Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset for the paper "Large Language Models for Structuring and Integration of Heterogeneous Data" (add DOI).
It contains:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The main contributions of this paper are threefold.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset can be used to test deep learning models that combine structured data (stock prices) with unstructured data (stock bar posts). It is therefore a multi-source heterogeneous dataset.
Sound event detection (SED) task with heterogeneous datasets, including Domestic Environment Sound Event Detection (DESED) and Multi-Annotator Estimated STROng labels (MAESTRO)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the data used for “Heterogeneous Multi-Source Data Fusion Through Input Mapping And Latent Variable Gaussian Process” paper by Yigitcan Comlek, Sandipp Krishnan Ravi, Piyush Pandita, Sayan Ghosh, Liping Wang, and Wei Chen. For all correspondence, please contact Dr. Wei Chen (weichen@northwestern.edu) or Dr. Sandipp Krishnan Ravi (sandippk@umich.edu).
Please use the BibTeX entry below to cite this work:
@article{comlek2024heterogenous,
title={Heterogenous Multi-Source Data Fusion Through Input Mapping and Latent Variable Gaussian Process},
author={Comlek, Yigitcan and Ravi, Sandipp Krishnan and Pandita, Piyush and Ghosh, Sayan and Wang, Liping and Chen, Wei},
journal={arXiv preprint arXiv:2407.11268},
year={2024}
}
The repository consists of data used in three case studies. All data are provided in .csv format, and each CSV file contains the data for one source used in a case study. The files for each of the three case studies are summarized below.
Case Study 1 (Cantilever Beam)
· Source1_RectangularBeam.csv
· Source2_RectangularHollowBeam.csv
· Source3_CircularHollowBeam.csv
Case Study 2 (Ellipsoidal Void)
· Source1_2DEllipse.csv
· Source2_3DEllipse.csv
· Source3_3DEllipseRot.csv
Case Study 3 (Ti6Al4V Alloys)
· Source1_LBPF.csv [1,2]
· Source2_EBM.csv [3]
· Source3_FSW.csv [4]
The data for this case study were collected from the following papers:
[1] Q. Luo, L. Yin, T. W. Simpson, and A. M. Beese, “Effect of processing parameters on pore structures, grain features, and mechanical properties in Ti-6Al-4V by laser powder bed fusion,” Additive Manufacturing, vol. 56, p. 102915, 2022.
[2] Q. Luo, L. Yin, T. W. Simpson, and A. M. Beese, “Dataset of process-structure-property feature relationship for laser powder bed fusion additive manufactured Ti-6Al-4V material,” Data in Brief, vol. 46, p. 108911, 2023.
[3] J. Ran, F. Jiang, X. Sun, Z. Chen, C. Tian, and H. Zhao, “Microstructure and mechanical properties of Ti-6Al-4V fabricated by electron beam melting,” Crystals, vol. 10, no. 11, p. 972, 2020.
[4] A. Fall, M. Jahazi, A. Khdabandeh, and M. Fesharaki, “Effect of process parameters on microstructure and mechanical properties of friction stir-welded Ti–6Al–4V joints,” The International Journal of Advanced Manufacturing Technology, vol. 91, pp. 2919–2931, 2017.
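As a minimal sketch of how the nine files listed for this repository might be loaded, the snippet below groups them by case study number using only the Python standard library. The flat directory layout, the presence of a header row in every file, and the helper name `load_case_study` are assumptions made for illustration; the repository description does not document column names, so none are assumed here.

```python
import csv
from pathlib import Path

# File names as listed in the repository summary, keyed by case study number.
CASE_STUDIES = {
    1: [  # Cantilever beam
        "Source1_RectangularBeam.csv",
        "Source2_RectangularHollowBeam.csv",
        "Source3_CircularHollowBeam.csv",
    ],
    2: [  # Ellipsoidal void
        "Source1_2DEllipse.csv",
        "Source2_3DEllipse.csv",
        "Source3_3DEllipseRot.csv",
    ],
    3: [  # Titanium alloys
        "Source1_LBPF.csv",
        "Source2_EBM.csv",
        "Source3_FSW.csv",
    ],
}


def load_case_study(data_dir, case_study):
    """Return {file name: list of row dicts} for one case study.

    Hypothetical helper: assumes every CSV has a header row and that all
    files sit directly inside data_dir.
    """
    sources = {}
    for name in CASE_STUDIES[case_study]:
        with open(Path(data_dir) / name, newline="") as handle:
            sources[name] = list(csv.DictReader(handle))
    return sources
```

Calling `load_case_study("path/to/data", 1)` would return one list of rows per beam source. Values are read as strings, so numeric columns need explicit conversion before any modelling.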
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Utilizing Richly Attributed Graphs to Reason from Heterogeneous Data - Part 1
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and code to reproduce work in Learn2Link: Linking the Social and Academic Profiles of Researchers.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Recent advances in Computer Science and the spread of internet connection have allowed specialists to virtualize complex environments on the web and offer further information with realistic exploration experiences. At the same time, the fruition of complex geospatial datasets (point clouds, Building Information Modelling (BIM) models, 2D and 3D models) on the web is still a challenge, because usually it involves the usage of different proprietary software solutions, and the input data need further simplification for computational effort reduction. Moreover, integrating geospatial datasets acquired in different ways with various sensors remains a challenge. An interesting question, in that respect, is how to integrate 3D information in a 3D GIS (Geographic Information System) environment and manage different scales of information in the same application. Integrating a multiscale level of information is currently the first step when it comes to digital twinning. It is needed to properly manage complex urban datasets in digital twins related to the management of the buildings (cadastral management, prevention of natural and anthropogenic hazards, structure monitoring, etc.). Therefore, the current research shows the development of a freely accessible 3D Web navigation model based on open-source technology that allows the visualization of heterogeneous complex geospatial datasets in the same virtual environment. This solution employs JavaScript libraries based on WebGL technology. The model is accessible through web browsers and does not need software installation from the user side. The case study is the new building of the University of Twente-Faculty of Geo-Information (ITC), located in Enschede (the Netherlands). The developed solution allows switching between heterogeneous datasets (point clouds, BIM, 2D and 3D models) at different scales and visualization (indoor first-person navigation, outdoor navigation, urban navigation). 
This solution could be employed by governmental stakeholders or the private sector to remotely visualize complex datasets on the web in a single visualization, and to make decisions based solely on open-source solutions. Furthermore, this system can incorporate underground data or real-time sensor data from the IoT (Internet of Things) for digital twinning tasks.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains document data for a Systematic Literature Review (SLR) titled "Composition of Heterogeneous Web Services: A Systematic Review". Inclusion/exclusion decision and extracted data from the documents are included.
Three main types of Web services can be identified on the Web and on corporate networks: SOAP services, which use the protocol of the same name and well-established technologies such as WSDL; RESTful services, which employ HTTP directly and conform to the constraints of the REST architectural style; and event-oriented services, which take the initiative in notifying their clients about relevant facts. The coexistence of these service types has generated considerable research interest in service type heterogeneity in Web Service composition. The research question of the SLR is "How are services of heterogeneous types (SOAP, RESTful and event-oriented services) composed?".
Documents that may answer this question were searched in Scopus and IEEE Xplore, from conference and journal sources, without a time limit. Search results were last updated on July 22, 2018 and yielded 63 relevant documents published from 2005 to 2018.
Most works (48) target SOAP and RESTful service heterogeneity, followed by a smaller group of 18 targeting SOAP and event-oriented heterogeneity. The other two combinations, RESTful/event-oriented and SOAP/RESTful/event-oriented, account for 5 documents. Among these documents, RESTful support was found to be incipient, with most documents violating constraints of the REST architectural style. The methods used for heterogeneity support were classified into 7 archetypes:
1. Common description
2. Proxy
3. Middleware
4. Workflow language
5. Event processor
6. Automatic composition
7. Direct implementation
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Current methods often rely on single-modal data and fail to effectively integrate multimodal information when representing node attributes.
We present a methodology for subtyping of persons with a common clinical symptom complex by integrating heterogeneous continuous and categorical data. We illustrate it by clustering women with lower urinary tract symptoms (LUTS), who represent a heterogeneous cohort with overlapping symptoms and multifactorial etiology. Data collected in the Symptoms of Lower Urinary Tract Dysfunction Research Network (LURN), a multi-center observational study, included self-reported urinary and non-urinary symptoms, bladder diaries, and physical examination data for 545 women. Heterogeneity in these multidimensional data required thorough and non-trivial preprocessing, including scaling by controls and weighting to mitigate data redundancy, while the various data types (continuous and categorical) required novel methodology using a weighted Tanimoto indices approach. Data domains only available on a subset of the cohort were integrated using a semi-supervised clustering approach. Novel contrast criteri...
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
c1_ wear.csv, c4_ Wear.csv and c6_ wear.csv are the open data sets from the tool wear experiment. The micro-groove milling experiment was carried out on an HSM600U high-speed machining center. The tool used in the experiment is a vertical flat end milling cutter with a diameter of 800 microns. The workpiece material is T4 steel. Each cutting segment lasts two seconds, and 100 cutting segments are performed for each group of experiments. The cutting force signal is measured with a Kistler dynamometer at a sampling frequency of 50 kHz. Six different cutting conditions were used in the experiment. Experiments 1, 3 and 5 are used as training samples, and Experiments 2, 4 and 6 are used as test samples. After each milling pass, the wear of each tooth is obtained by offline measurement. t_ tl_ 022619.fig and t_ RUL_ 022619.fig are the predicted tool life and the predicted effective residual tool life, respectively, as functions of milling time.
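The acquisition figures above imply the size of the force record. The short sketch below works through that arithmetic and the stated train/test split. It assumes a single force channel sampled continuously for the full two seconds of every segment, which the description does not state explicitly; all variable names are illustrative.

```python
SAMPLING_HZ = 50_000           # 50 kHz dynamometer sampling frequency
SEGMENT_SECONDS = 2            # each cutting segment lasts two seconds
SEGMENTS_PER_EXPERIMENT = 100  # cutting segments per group of experiments
EXPERIMENTS = range(1, 7)      # six cutting conditions

# Samples recorded per segment and per experiment (one channel assumed).
samples_per_segment = SAMPLING_HZ * SEGMENT_SECONDS
samples_per_experiment = samples_per_segment * SEGMENTS_PER_EXPERIMENT

# Odd-numbered experiments are training samples; even-numbered are test samples.
train_experiments = [e for e in EXPERIMENTS if e % 2 == 1]
test_experiments = [e for e in EXPERIMENTS if e % 2 == 0]

print(samples_per_segment)     # 100000
print(samples_per_experiment)  # 10000000
print(train_experiments, test_experiments)  # [1, 3, 5] [2, 4, 6]
```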
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
As phylogenetic datasets have increased in size, site-heterogeneous substitution models such as CAT-F81 and CAT-GTR have been advocated in favor of other models because they purportedly suppress long-branch attraction (LBA). These models are two of the most commonly used models in phylogenomics, and they have been applied to a variety of taxa ranging from Drosophila to land plants. However, many arguments in favor of CAT models have been based on tenuous assumptions about the true phylogeny rather than rigorous testing with known trees via simulation. Moreover, CAT models have not been compared to other approaches for handling substitutional heterogeneity such as data partitioning with site-homogeneous substitution models. We simulated amino acid sequence datasets with substitutional heterogeneity on a variety of tree shapes including those susceptible to LBA. Data were analyzed with both CAT models and partitioning to explore model performance; in total over 670,000 CPU hours were used, of which over 97% was spent running analyses with CAT models. In many cases, all models recovered branching patterns that were identical to the known tree. However, CAT-F81 consistently performed worse than other models in inferring the correct branching patterns, and both CAT models often overestimated substitutional heterogeneity. Additionally, reanalysis of two empirical metazoan datasets supports the notion that CAT-F81 tends to recover less accurate trees than data partitioning and CAT-GTR. Given these results, we conclude that partitioning and CAT-GTR perform similarly in recovering accurate branching patterns. However, computation time can be orders of magnitude less for data partitioning, with commonly used implementations of CAT-GTR often failing to reach completion in a reasonable time frame (i.e., for Bayesian analyses to converge). 
Practices such as removing constant sites and parsimony uninformative characters, or using CAT-F81 when CAT-GTR is deemed too computationally expensive, cannot be logically justified. Given clear problems with CAT-F81, phylogenies previously inferred with this model should be reassessed.
Java application implementing dominance decomposition map matching. Traces and graph data used in experiments in the paper submitted to the ECML-PKDD 2016 conference. Datasets are under the OpenStreetMap licence ODbL. © OpenStreetMap contributors. The Guava library is under the Apache licence.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We develop a universal econometric formulation of empirical power laws possibly driven by parameter heterogeneity. Our approach extends classical extreme value theory to specifying the tail behavior of the empirical distribution of a general data set with possibly heterogeneous marginal distributions. We discuss several model examples that satisfy our conditions and demonstrate in simulations how heterogeneity may generate empirical power laws. We observe a cross-sectional power law for US stock losses and show that this tail behavior is largely driven by the heterogeneous volatilities of the individual assets.
The data are population sizes of the yeast Saccharomyces cerevisiae grown in laboratory cultures over a period of several days with different levels of the growth inhibitor cycloheximide. Our results provide rigorous experimental tests of new and old theory, demonstrating how the traditional notion of carrying capacity is ambiguous for populations diffusing in spatially heterogeneous environments.
Adaptation to contrasting environments across a heterogeneous landscape favors the formation of ecotypes by promoting ecological divergence. Patterns of fitness variation in the field can show whether natural selection drives local adaptation and ecotype formation. However, to demonstrate a link between ecological divergence and speciation, local adaptation must have consequences for reproductive isolation. Using contrasting ecotypes of an Australian wildflower, Senecio lautus, in common garden experiments, hybridization experiments, and reciprocal transplants, we assessed how the environment shapes patterns of adaptation and the consequences of adaptive divergence for reproductive isolation. Local adaptation was strong between ecotypes, but weaker between populations of the same ecotype. F1 hybrids exhibited heterosis, but crosses involving one native parent performed better than those with two foreign parents. In a common garden experiment, F2 hybrids exhibited reduced fitness compared...
Areas of heterogeneous integrity, present in Table 3 of the PTRC approved in 1992
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This provides the replication code and data for the paper "Estimating Heterogeneous Treatment Effects and the Effects of Heterogeneous Treatments with Ensemble Methods".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was generated during the Heterogeneous Wave Energy Converter (HetWECs) experimental campaign conducted at the O.H. Hinsdale Directional Wave Basin at Oregon State University. Experiments include system identification, hydrodynamics, and power take-off (PTO) tests. The experiments feature 4- and 5-body heterogeneous WEC arrays consisting of both oscillating surge WECs and heaving point absorbers. Data were collected using Qualisys motion capture of the device motion, resistive wave gauges to capture wave height data at 20 locations throughout the basin, S-shaped load cells to measure wave excitation force and radiation force, and a Vesc 6 75 to measure motor current, motor RPM, and FOC current. The submission includes post-processing MATLAB code to support data handling and figure generation, as well as test matrices detailing the sea state conditions for each experimental run.