100+ datasets found
  1. A multi-source heterogeneous data monitoring method based on latent subspace...

    • ieee-dataport.org
    Updated Oct 7, 2020
    Cite
    Yunpeng Fan (2020). A multi-source heterogeneous data monitoring method based on latent subspace [Dataset]. https://ieee-dataport.org/documents/multi-source-heterogeneous-data-monitoring-method-based-latent-subspace
    Dataset updated
    Oct 7, 2020
    Authors
    Yunpeng Fan
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The main contributions of this paper are threefold.

  2. Dataset for "Large Language Models for Structuring and Integration of...

    • zenodo.org
    pdf
    Updated Jan 31, 2025
    Cite
    Henrik Bongertmann; Benjamin Nast; Leon Griesch; Henry Rotzoll; Kurt Sandkuhl (2025). Dataset for "Large Language Models for Structuring and Integration of Heterogeneous Data" [Dataset]. http://doi.org/10.5281/zenodo.14779110
    Available download formats: pdf
    Dataset updated
    Jan 31, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Henrik Bongertmann; Benjamin Nast; Leon Griesch; Henry Rotzoll; Kurt Sandkuhl
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset for the paper "Large Language Models for Structuring and Integration of Heterogeneous Data" (add DOI).

    It contains:

    • Example documents (anonymized)
    • Comparison results of open-source LLMs
    • Additional material employed in the case study (e.g., prompt or JSON template)
    • Results of the case study
  3. Enhanced Stock Price Prediction with Optimized Ensemble Modeling Using...

    • figshare.com
    xlsx
    Updated Nov 5, 2024
    Cite
    Hongjiu Liu (2024). Enhanced Stock Price Prediction with Optimized Ensemble Modeling Using Multi-source Heterogeneous Data [Dataset]. http://doi.org/10.6084/m9.figshare.27328590.v2
    Available download formats: xlsx
    Dataset updated
    Nov 5, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Hongjiu Liu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset can be used to test deep learning models that combine structured data (stock prices) with unstructured data (stock bar posts); it is therefore multi-source heterogeneous data.

  4. Data from: Heterogeneous Multi-Source Data Fusion Through Input Mapping And...

    • zenodo.org
    bin, csv
    Updated Jan 23, 2025
    Cite
    Yigitcan Comlek; Sandipp Krishnan Ravi; Piyush Pandita; Sayan Ghosh; Liping Wang; Wei Chen (2025). Heterogeneous Multi-Source Data Fusion Through Input Mapping And Latent Variable Gaussian Process [Dataset]. http://doi.org/10.5281/zenodo.14681801
    Available download formats: csv, bin
    Dataset updated
    Jan 23, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yigitcan Comlek; Sandipp Krishnan Ravi; Piyush Pandita; Sayan Ghosh; Liping Wang; Wei Chen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the data used for “Heterogeneous Multi-Source Data Fusion Through Input Mapping And Latent Variable Gaussian Process” paper by Yigitcan Comlek, Sandipp Krishnan Ravi, Piyush Pandita, Sayan Ghosh, Liping Wang, and Wei Chen. For all correspondence, please contact Dr. Wei Chen (weichen@northwestern.edu) or Dr. Sandipp Krishnan Ravi (sandippk@umich.edu).

    Please use the below BibTex format to cite this work:

    @article{comlek2024heterogenous,
     title={Heterogenous Multi-Source Data Fusion Through Input Mapping and Latent Variable Gaussian Process},
     author={Comlek, Yigitcan and Ravi, Sandipp Krishnan and Pandita, Piyush and Ghosh, Sayan and Wang, Liping and Chen, Wei},
     journal={arXiv preprint arXiv:2407.11268},
     year={2024}
    }

    The repository consists of data used in three case studies. All the data available is in .csv format. Each csv file contains the data for the specific source used in the case study. Below is a summary of the files for each of the three case studies.

    Case Study 1 (Cantilever Beam)

    · Source1_RectangularBeam.csv

    · Source2_RectangularHollowBeam.csv

    · Source3_CircularHollowBeam.csv

    Case Study 2 (Ellipsoidal Void)

    · Source1_2DEllipse.csv

    · Source2_3DEllipse.csv

    · Source3_3DEllipseRot.csv

    Case Study 3 (Ti6Al4V Alloys)

    · Source1_LBPF.csv [1,2]

    · Source2_EBM.csv [3]

    · Source3_FSW.csv [4]

    For this case study the data is collected from the below papers:

    [1] Q. Luo, L. Yin, T. W. Simpson, and A. M. Beese, “Effect of processing parameters on pore structures, grain features, and mechanical properties in Ti-6Al-4V by laser powder bed fusion,” Additive Manufacturing, vol. 56, p. 102915, 2022.

    [2] Q. Luo, L. Yin, T. W. Simpson, and A. M. Beese, “Dataset of process-structure-property feature relationship for laser powder bed fusion additive manufactured Ti-6Al-4V material,” Data in Brief, vol. 46, p. 108911, 2023.

    [3] J. Ran, F. Jiang, X. Sun, Z. Chen, C. Tian, and H. Zhao, “Microstructure and mechanical properties of Ti-6Al-4V fabricated by electron beam melting,” Crystals, vol. 10, no. 11, p. 972, 2020.

    [4] A. Fall, M. Jahazi, A. Khdabandeh, and M. Fesharaki, “Effect of process parameters on microstructure and mechanical properties of friction stir-welded Ti–6Al–4V joints,” The International Journal of Advanced Manufacturing Technology, vol. 91, pp. 2919–2931, 2017.
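
Because each case study provides one CSV file per source, a natural first step is to read the files and tag every row with the source it came from. The sketch below is a minimal illustration using only Python's standard library. The helper name `load_sources`, the extra `source` key, and the directory that holds the files are assumptions made for this example; the column names are whatever the CSV headers provide.

```python
import csv
from pathlib import Path

# File names for Case Study 1 as listed above; the directory that holds
# them is an assumption and depends on where the data was downloaded.
CASE_STUDY_1 = [
    "Source1_RectangularBeam.csv",
    "Source2_RectangularHollowBeam.csv",
    "Source3_CircularHollowBeam.csv",
]


def load_sources(data_dir, file_names):
    """Read each source CSV and tag every row with its source file stem."""
    rows = []
    for name in file_names:
        path = Path(data_dir) / name
        with path.open(newline="", encoding="utf-8") as handle:
            for record in csv.DictReader(handle):
                tagged = dict(record)
                tagged["source"] = path.stem  # hypothetical extra key
                rows.append(tagged)
    return rows
```

Keeping the source label on every row makes it possible to pool the three sources later without losing track of where a sample originated.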

  5. Data from: Exploiting heterogeneous publicly available data sources for drug...

    • tandf.figshare.com
    docx
    Updated Jun 1, 2023
    Cite
    Vassilis G. Koutkias; Agnès Lillo-Le Louët; Marie-Christine Jaulent (2023). Exploiting heterogeneous publicly available data sources for drug safety surveillance: computational framework and case studies [Dataset]. http://doi.org/10.6084/m9.figshare.4286348.v2
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Vassilis G. Koutkias; Agnès Lillo-Le Louët; Marie-Christine Jaulent
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objective: Driven by the need of pharmacovigilance centres and companies to routinely collect and review all available data about adverse drug reactions (ADRs) and adverse events of interest, we introduce and validate a computational framework exploiting dominant as well as emerging publicly available data sources for drug safety surveillance.

    Methods: Our approach relies on appropriate query formulation for data acquisition and subsequent filtering, transformation and joint visualization of the obtained data. We acquired data from the FDA Adverse Event Reporting System (FAERS), PubMed and Twitter. In order to assess the validity and the robustness of the approach, we elaborated on two important case studies, namely, clozapine-induced cardiomyopathy/myocarditis versus haloperidol-induced cardiomyopathy/myocarditis, and apixaban-induced cerebral hemorrhage.

    Results: The analysis of the obtained data provided interesting insights (identification of potential patient and health-care professional experiences regarding ADRs in Twitter, information/arguments against an ADR existence across all sources), while illustrating the benefits (complementing data from multiple sources to strengthen/confirm evidence) and the underlying challenges (selecting search terms, data presentation) of exploiting heterogeneous information sources, thereby advocating the need for the proposed framework.

    Conclusions: This work contributes to establishing a continuous learning system for drug safety surveillance by exploiting heterogeneous publicly available data sources via appropriate support tools.

  6. Data And Software Associated With Phenostruct: Prediction Of Human Phenotype...

    • explore.openaire.eu
    • zenodo.org
    Updated Jun 19, 2015
    Cite
    Indika Kahanda; Christopher Funk; Karin Verspoor; Asa Ben-Hur (2015). Data And Software Associated With Phenostruct: Prediction Of Human Phenotype Ontology Terms Using Heterogeneous Data Sources [Dataset]. http://doi.org/10.5281/zenodo.18764
    Dataset updated
    Jun 19, 2015
    Authors
    Indika Kahanda; Christopher Funk; Karin Verspoor; Asa Ben-Hur
    Description

    Data and software associated with the paper: PHENOstruct: Prediction of human phenotype ontology terms using heterogeneous data sources

  7. Data sources’ characteristics*.

    • plos.figshare.com
    xls
    Updated Jun 2, 2023
    Cite
    Giuseppe Roberto; Ingrid Leal; Naveed Sattar; A. Katrina Loomis; Paul Avillach; Peter Egger; Rients van Wijngaarden; David Ansell; Sulev Reisberg; Mari-Liis Tammesoo; Helene Alavere; Alessandro Pasqua; Lars Pedersen; James Cunningham; Lara Tramontan; Miguel A. Mayer; Ron Herings; Preciosa Coloma; Francesco Lapi; Miriam Sturkenboom; Johan van der Lei; Martijn J. Schuemie; Peter Rijnbeek; Rosa Gini (2023). Data sources’ characteristics*. [Dataset]. http://doi.org/10.1371/journal.pone.0160648.t001
    Available download formats: xls
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Giuseppe Roberto; Ingrid Leal; Naveed Sattar; A. Katrina Loomis; Paul Avillach; Peter Egger; Rients van Wijngaarden; David Ansell; Sulev Reisberg; Mari-Liis Tammesoo; Helene Alavere; Alessandro Pasqua; Lars Pedersen; James Cunningham; Lara Tramontan; Miguel A. Mayer; Ron Herings; Preciosa Coloma; Francesco Lapi; Miriam Sturkenboom; Johan van der Lei; Martijn J. Schuemie; Peter Rijnbeek; Rosa Gini
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data sources’ characteristics*.

  8. Data from: PANACEA dataset - Heterogeneous COVID-19 Claims

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1 more
    Updated Jul 15, 2022
    Cite
    Kochkina, Elena (2022). PANACEA dataset - Heterogeneous COVID-19 Claims [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6493846
    Dataset updated
    Jul 15, 2022
    Dataset provided by
    Liakata, Maria
    Zubiaga, Arkaitz
    Arana-Catania, Miguel
    He, Yulan
    Procter, Rob
    Kochkina, Elena
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The peer-reviewed publication for this dataset was presented at the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) and can be accessed here: https://arxiv.org/abs/2205.02596. Please cite it when using the dataset.

    This dataset contains a heterogeneous set of True and False COVID claims and online sources of information for each claim.

    The claims have been obtained from online fact-checking sources, existing datasets and research challenges. It combines different data sources with different foci, thus enabling a comprehensive approach that combines different media (Twitter, Facebook, general websites, academia), information domains (health, scholar, media), information types (news, claims) and applications (information retrieval, veracity evaluation).

    The processing of the claims included an extensive de-duplication process eliminating repeated or very similar claims. The dataset is presented in a LARGE and a SMALL version, accounting for different degrees of similarity between the remaining claims (excluding respectively claims with a 90% and 99% probability of being similar, as obtained through the MonoT5 model). The similarity of claims was analysed using BM25 (Robertson et al., 1995; Crestani et al., 1998; Robertson and Zaragoza, 2009) with MonoT5 re-ranking (Nogueira et al., 2020), and BERTScore (Zhang et al., 2019).
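
To make the BM25 step more concrete, the following sketch scores a set of claims against a query claim with the Okapi BM25 formula. It is not the authors' pipeline: the MonoT5 re-ranking and BERTScore stages are not shown, and the function name `bm25_scores`, the whitespace tokenization, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for illustration. The IDF term uses the common non-negative variant that adds 1 inside the logarithm.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    if not documents:
        return []
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Average document length; fall back to 1.0 if every document is empty.
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            df = doc_freq[term]
            # Non-negative IDF variant (adds 1 inside the logarithm).
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            tf = term_freq[term]
            # Length normalization: longer documents are penalized via b.
            norm = tf + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores.append(score)
    return scores
```

In a de-duplication setting, high-scoring pairs would be candidates for the more expensive re-ranking stage rather than being removed outright.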

    The processing of the content also involved removing claims making only a direct reference to existing content in other media (audio, video, photos); automatically obtained content not representing claims; and entries with claims or fact-checking sources in languages other than English.

    The claims were analysed to identify types of claims that may be of particular interest, either for inclusion or exclusion depending on the type of analysis. The following types were identified: (1) Multimodal; (2) Social media references; (3) Claims including questions; (4) Claims including numerical content; (5) Named entities, including: PERSON − People, including fictional; ORGANIZATION − Companies, agencies, institutions, etc.; GPE − Countries, cities, states; FACILITY − Buildings, highways, etc. These entities have been detected using a RoBERTa base English model (Liu et al., 2019) trained on the OntoNotes Release 5.0 dataset (Weischedel et al., 2013) using Spacy.

    The original labels for the claims have been reviewed and homogenised from the different criteria used by each original fact-checker into the final True and False labels.

    The data sources used are:

    The LARGE dataset contains 5,143 claims (1,810 False and 3,333 True), and the SMALL version 1,709 claims (477 False and 1,232 True).

    The entries in the dataset contain the following information:

    • Claim. Text of the claim.

    • Claim label. The labels are: False, and True.

    • Claim source. The sources include mostly fact-checking websites, health information websites, health clinics, public institutions sites, and peer-reviewed scientific journals.

    • Original information source. Information about which general information source was used to obtain the claim.

    • Claim type. The different types, previously explained, are: Multimodal, Social Media, Questions, Numerical, and Named Entities.
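
The entry fields listed above can be mirrored in a small record type. This is only a sketch: the class name `ClaimEntry`, the attribute names, the boolean label, and the choice to allow several claim types per entry are assumptions for illustration and may not match the column headers or encoding in the distributed files.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClaimEntry:
    """One dataset entry; attribute names are illustrative, not the file's headers."""

    claim: str                        # text of the claim
    label: bool                       # True or False, the two homogenised labels
    claim_source: str                 # e.g. a fact-checking website
    original_information_source: str  # general source the claim was obtained from
    claim_types: List[str] = field(default_factory=list)  # e.g. "Numerical"


def label_counts(entries):
    """Count how many entries carry each label."""
    return Counter(entry.label for entry in entries)
```

Applied to the full LARGE or SMALL version, `label_counts` offers a quick check against the True/False totals reported above.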

    Funding. This work was supported by the UK Engineering and Physical Sciences Research Council (grant no. EP/V048597/1, EP/T017112/1). ML and YH are supported by Turing AI Fellowships funded by the UK Research and Innovation (grant no. EP/V030302/1, EP/V020579/1).

    References

    • Arana-Catania M., Kochkina E., Zubiaga A., Liakata M., Procter R., He Y. Natural Language Inference with Self-Attention for Veracity Assessment of Pandemic Claims. NAACL 2022. https://arxiv.org/abs/2205.02596

    • Stephen E Robertson, Steve Walker, Susan Jones, Micheline M Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at TREC-3. NIST Special Publication SP, 109:109.

    • Fabio Crestani, Mounia Lalmas, Cornelis J Van Rijsbergen, and Iain Campbell. 1998. “Is this document relevant? ... probably”: a survey of probabilistic models in information retrieval. ACM Computing Surveys (CSUR), 30(4):528–552.

    • Stephen Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Now Publishers Inc.

    • Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin. 2020. Document ranking with a pre-trained sequence-to-sequence model. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pages 708–718.

    • Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. 2019. Bertscore: Evaluating text generation with bert. In International Conference on Learning Representations.

    • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.

    • Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, et al. 2013. Ontonotes release 5.0 ldc2013t19. Linguistic Data Consortium, Philadelphia, PA, 23.

    • Limeng Cui and Dongwon Lee. 2020. Coaid: Covid-19 healthcare misinformation dataset. arXiv preprint arXiv:2006.00885.

    • Yichuan Li, Bohan Jiang, Kai Shu, and Huan Liu. 2020. Mm-covid: A multilingual and multimodal data repository for combating covid-19 disinformation.

    • Tamanna Hossain, Robert L. Logan IV, Arjuna Ugarte, Yoshitomo Matsubara, Sean Young, and Sameer Singh. 2020. COVIDLies: Detecting COVID-19 misinformation on social media. In Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020, Online. Association for Computational Linguistics.

    • Ellen Voorhees, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, William R Hersh, Kyle Lo, Kirk Roberts, Ian Soboroff, and Lucy Lu Wang. 2021. Trec-covid: constructing a pandemic information retrieval test collection. In ACM SIGIR Forum, volume 54, pages 1–12. ACM New York, NY, USA.

  9. Towards Digital Twinning on the Web: Heterogeneous 3D Data Fusion Based on...

    • eleonasrepo.getmap.gr
    Updated Jan 26, 2023
    Cite
    (2023). Towards Digital Twinning on the Web: Heterogeneous 3D Data Fusion Based on Open-Source Structure - Datasets - eLeonas Data Hub [Dataset]. https://eleonasrepo.getmap.gr/dataset/towards-digital-twinning-on-the-web-heterogeneous-3d-data-fusion-based-on-open-source-structure
    Dataset updated
    Jan 26, 2023
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recent advances in Computer Science and the spread of internet connection have allowed specialists to virtualize complex environments on the web and offer further information with realistic exploration experiences. At the same time, the fruition of complex geospatial datasets (point clouds, Building Information Modelling (BIM) models, 2D and 3D models) on the web is still a challenge, because usually it involves the usage of different proprietary software solutions, and the input data need further simplification for computational effort reduction. Moreover, integrating geospatial datasets acquired in different ways with various sensors remains a challenge. An interesting question, in that respect, is how to integrate 3D information in a 3D GIS (Geographic Information System) environment and manage different scales of information in the same application. Integrating a multiscale level of information is currently the first step when it comes to digital twinning. It is needed to properly manage complex urban datasets in digital twins related to the management of the buildings (cadastral management, prevention of natural and anthropogenic hazards, structure monitoring, etc.). Therefore, the current research shows the development of a freely accessible 3D Web navigation model based on open-source technology that allows the visualization of heterogeneous complex geospatial datasets in the same virtual environment. This solution employs JavaScript libraries based on WebGL technology. The model is accessible through web browsers and does not need software installation from the user side. The case study is the new building of the University of Twente-Faculty of Geo-Information (ITC), located in Enschede (the Netherlands). The developed solution allows switching between heterogeneous datasets (point clouds, BIM, 2D and 3D models) at different scales and visualization (indoor first-person navigation, outdoor navigation, urban navigation). 
    This solution could be employed by governmental stakeholders or the private sector to remotely visualize complex datasets on the web in a single view, and to make decisions based solely on open-source solutions. Furthermore, this system can incorporate underground data or real-time sensor data from the IoT (Internet of Things) for digital twinning tasks.

  10. Data from: Integration and harmonization of trait data from plant...

    • live.european-language-grid.eu
    • zenodo.org
    • +1 more
    csv
    Updated Dec 13, 2023
    Cite
    (2023). Data from: Integration and harmonization of trait data from plant individuals across heterogeneous sources [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/7662
    Available download formats: csv
    Dataset updated
    Dec 13, 2023
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Trait data represent the basis for ecological and evolutionary research and have relevance for biodiversity conservation, ecosystem management and earth system modelling. The collection and mobilization of trait data has strongly increased over the last decade, but many trait databases still provide only species-level, aggregated trait values (e.g. ranges, means) and lack the direct observations on which those data are based. Thus, the vast majority of trait data measured directly from individuals remains hidden and highly heterogeneous, impeding their discoverability, semantic interoperability, digital accessibility and (re-)use. Here, we integrate quantitative measurements of verbatim trait information from plant individuals (e.g. lengths, widths, counts and angles of stems, leaves, fruits and inflorescence parts) from multiple sources such as field observations and herbarium collections. We develop a workflow to harmonize heterogeneous trait measurements (e.g. trait names and their values and units) as well as additional information related to taxonomy, measurement or fact and occurrence. This data integration and harmonization builds on vocabularies and terminology from existing metadata standards and ontologies such as the Ecological Trait-data Standard (ETS), the Darwin Core (DwC), the Thesaurus Of Plant characteristics (TOP) and the Plant Trait Ontology (TO). A metadata form filled out by data providers enables the automated integration of trait information from heterogeneous datasets. We illustrate our tools with data from palms (family Arecaceae), a globally distributed (pantropical), diverse plant family that is considered a good model system for understanding the ecology and evolution of tropical rainforests. We mobilize nearly 140,000 individual palm trait measurements in an interoperable format, identify semantic gaps in existing plant trait terminology and provide suggestions for the future development of a thesaurus of plant characteristics. 
Our work thereby promotes the semantic integration of plant trait data in a machine-readable way and shows how large amounts of small trait data sets and their metadata can be integrated into standardized data products.
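
The harmonization of trait names, values and units described above can be pictured with a very small sketch. It does not reproduce the authors' workflow or the vocabularies of ETS, DwC, TOP or TO: the synonym table, the harmonized names, the unit list, and the function name `harmonize` are all hypothetical and serve only to show the idea of mapping verbatim measurements onto one name and one unit.

```python
# Hypothetical synonym table: verbatim trait names -> harmonized name.
TRAIT_SYNONYMS = {
    "leaf length": "leaf_length",
    "length of leaf": "leaf_length",
    "fruit width": "fruit_width",
}

# Conversion factors to a common unit (metres).
UNIT_TO_METRES = {"mm": 0.001, "cm": 0.01, "m": 1.0}


def harmonize(trait_name, value, unit):
    """Return (harmonized_name, value_in_metres) for one verbatim measurement."""
    key = trait_name.strip().lower()
    if key not in TRAIT_SYNONYMS:
        raise KeyError(f"no mapping for trait name: {trait_name!r}")
    if unit not in UNIT_TO_METRES:
        raise KeyError(f"no conversion for unit: {unit!r}")
    return TRAIT_SYNONYMS[key], float(value) * UNIT_TO_METRES[unit]
```

Raising an error for unmapped names or units, rather than passing them through, is one way to surface the kind of semantic gaps the description mentions.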

  11. Data from: Heterogeneous Integrated Dataset for Maritime Intelligence,...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    DRÉO, Richard (2020). Heterogeneous Integrated Dataset for Maritime Intelligence, Surveillance, and Reconnaissance [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1167594
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    DRÉO, Richard
    RAY, Cyril
    CAMOSSI, Elena
    JOUSSELME, Anne-Laure
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Facing an increasing number of movements at sea and their daily impact on ships, crews and our global ecosystem, many research centers, international organizations and industrial actors have developed sensors and detection techniques for the monitoring, analysis and visualization of sea movements. The Automatic Identification System (AIS) is one of these electronic systems; it enables ships to broadcast their dynamic (position, speed, destination...) and static (name, type, international identifier...) information via radio communications.

    Having a spatially and temporally aligned maritime dataset that relies not only on ships' positions but also on a variety of complementary data sources is of great interest for understanding maritime activities and their impact on the environment.

    This dataset contains ships' information collected through the Automatic Identification System, integrated with a set of complementary data whose spatial and temporal dimensions are aligned. The dataset contains four categories of data: navigation data, vessel-oriented data, geographic data, and environmental data. It covers a time span of six months, from October 1st, 2015 to March 31st, 2016, and provides ship positions within the Celtic Sea, the Channel and the Bay of Biscay (France). The dataset comes with predefined integration and querying principles for relational databases. These rely on the widespread and free relational database management system PostgreSQL, with the addition of the PostGIS extension for the treatment of all spatial features in the dataset.

  12. Data for "An Open Source, Heterogeneous, Nonlinear Optics Simulation"

    • edmond.mpg.de
    • b2find.eudat.eu
    zip
    Updated Sep 15, 2023
    Cite
    Nicholas Karpowicz (2023). Data for "An Open Source, Heterogeneous, Nonlinear Optics Simulation" [Dataset]. http://doi.org/10.17617/3.HB2GJI
    Available download formats: zip (7723892), zip (506240933)
    Dataset updated
    Sep 15, 2023
    Dataset provided by
    Edmond
    Authors
    Nicholas Karpowicz
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains the simulation inputs and outputs for the manuscript "An Open Source, Heterogeneous, Nonlinear Optics Simulation" in Optics Continuum.

  13. A Dataset for 3D Geo-Modeling Framework for Multisource Heterogeneous Data...

    • scidb.cn
    Updated Sep 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hengguang Liu; Shaohong Xia; Chaoyan Fan; Changrong Zhang (2024). A Dataset for 3D Geo-Modeling Framework for Multisource Heterogeneous Data Fusion Based on Multimodal Deep Learning and Multipoint Statistics: A case study in South China Sea [Dataset]. http://doi.org/10.57760/sciencedb.12953
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Science Data Bank
    Authors
    Hengguang Liu; Shaohong Xia; Chaoyan Fan; Changrong Zhang
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    China, South China Sea
    Description

    Relying on geological data to construct 3D models can provide a more intuitive and easily comprehensible spatial perspective. This process aids in exploring underground spatial structures and geological evolutionary processes, providing essential data and assistance for the exploration of geological resources, energy development, engineering decision-making, and various other applications. As one of the methods for 3D geological modeling, multipoint statistics can effectively describe and reconstruct the intricate geometric shapes of nonlinear geological bodies. However, existing multipoint statistics algorithms still face challenges in efficiently extracting and reconstructing the global spatial distribution characteristics of geological objects. Moreover, they lack a data-driven modeling framework that integrates diverse sources of heterogeneous data. This research introduces a novel approach that combines multipoint statistics with multimodal deep artificial neural networks and constructs the 3D crustal P-wave velocity structure model of the South China Sea by using 44 OBS forward profiles, gravity anomalies, magnetic anomalies and topographic relief data. The experimental results demonstrate that the new approach surpasses multipoint statistics and Kriging interpolation methods, and can generate a more accurate 3D geological model through the integration of multiple geophysical data. Furthermore, the reliability of the 3D crustal P-wave velocity structure model, established using the novel method, was corroborated through visual and statistical analyses. This model intuitively delineates the spatial distribution characteristics of the crustal velocity structure in the South China Sea, thereby offering a foundational data basis for researchers to gain a more comprehensive understanding of the geological evolution process within this region.

  14. Learning from Heterogeneous Data Sources: An Application in Spatial...

    • plos.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Lisa M. Breckels; Sean B. Holden; David Wojnar; Claire M. Mulvey; Andy Christoforou; Arnoud Groen; Matthew W. B. Trotter; Oliver Kohlbacher; Kathryn S. Lilley; Laurent Gatto (2023). Learning from Heterogeneous Data Sources: An Application in Spatial Proteomics [Dataset]. http://doi.org/10.1371/journal.pcbi.1004920
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Lisa M. Breckels; Sean B. Holden; David Wojnar; Claire M. Mulvey; Andy Christoforou; Arnoud Groen; Matthew W. B. Trotter; Oliver Kohlbacher; Kathryn S. Lilley; Laurent Gatto
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sub-cellular localisation of proteins is an essential post-translational regulatory mechanism that can be assayed using high-throughput mass spectrometry (MS). These MS-based spatial proteomics experiments enable us to pinpoint the sub-cellular distribution of thousands of proteins in a specific system under controlled conditions. Recent advances in high-throughput MS methods have yielded a plethora of experimental spatial proteomics data for the cell biology community. Yet, there are many third-party data sources, such as immunofluorescence microscopy or protein annotations and sequences, which represent a rich and vast source of complementary information. We present a unique transfer learning classification framework that utilises a nearest-neighbour or support vector machine system, to integrate heterogeneous data sources to considerably improve on the quantity and quality of sub-cellular protein assignment. We demonstrate the utility of our algorithms through evaluation of five experimental datasets, from four different species in conjunction with four different auxiliary data sources to classify proteins to tens of sub-cellular compartments with high generalisation accuracy. We further apply the method to an experiment on pluripotent mouse embryonic stem cells to classify a set of previously unknown proteins, and validate our findings against a recent high resolution map of the mouse stem cell proteome. The methodology is distributed as part of the open-source Bioconductor pRoloc suite for spatial proteomics data analysis.

  15. Resources of IncRML: Incremental Knowledge Graph Construction from...

    • explore.openaire.eu
    • zenodo.org
    Updated Mar 18, 2024
    Cite
    Dylan Van Assche; Julian Andres Rojas Melendez; Ben De Meester; Pieter Colpaert (2024). Resources of IncRML: Incremental Knowledge Graph Construction from Heterogeneous Data Sources [Dataset]. http://doi.org/10.5281/zenodo.10171157
    Explore at:
    Dataset updated
    Mar 18, 2024
    Authors
    Dylan Van Assche; Julian Andres Rojas Melendez; Ben De Meester; Pieter Colpaert
    Description

    IncRML resources

    This Zenodo dataset contains all the resources of the paper 'IncRML: Incremental Knowledge Graph Construction from Heterogeneous Data Sources' submitted to the Semantic Web Journal's Special Issue on Knowledge Graph Construction. This resource aims to make the paper's experiments fully reproducible through our experiment tool written in Python, which was previously used in the Knowledge Graph Construction Challenge by the ESWC 2023 Workshop on Knowledge Graph Construction. The exact Java JAR file of the RMLMapper (rmlmapper.jar) used to execute the experiments is also provided in this dataset. This JAR file was executed with Java OpenJDK 11.0.20.1 on Ubuntu 22.04.1 LTS (Linux 5.15.0-53-generic). Each experiment was executed 5 times, and the median values are reported together with the standard deviation of the measurements.

    Datasets

    We provide dataset dumps of both the GTFS-Madrid-Benchmark and real-life use cases from Open Data in Belgium. The GTFS-Madrid-Benchmark dumps are used to analyze the impact on execution time and resources, while the real-life use cases aim to verify the approach on different types of datasets, since the GTFS-Madrid-Benchmark is a single type of dataset which does not advertise changes at all.

    Benchmarks

    • GTFS-Madrid-Benchmark: change types with fixed data size and amount of changes: additions-only, modifications-only, deletions-only (11 versions)
    • GTFS-Madrid-Benchmark: amount of changes with fixed data size: 0%, 25%, 50%, 75%, and 100% changes (11 versions)
    • GTFS-Madrid-Benchmark: data size with fixed amount of changes: scales 1, 10, 100 (11 versions)

    Real-life use cases

    • Traffic control center Vlaams Verkeerscentrum (Belgium): traffic board messages data (1 day, 28760 versions)
    • Meteorological institute KMI (Belgium): weather sensor data (1 day, 144 versions)
    • Public transport agency NMBS (Belgium): train schedule data (1 week, 7 versions)
    • Public transport agency De Lijn (Belgium): bus schedule data (1 week, 7 versions)
    • Bike-sharing company BlueBike (Belgium): bike-sharing availability data (1 day, 1440 versions)
    • Bike-sharing company JCDecaux (EU): bike-sharing availability data (1 day, 1440 versions)
    • OpenStreetMap (World): geographical map data (1 day, 1440 versions)

    Remarks

    • The first version of each dataset is always used as a baseline. All subsequent versions are applied as an update on the existing version. The reported results focus only on the updates, since these are the actual incremental generation.
    • The GTFS-Change-50_percent-{ALL, CHANGE}.tar.xz datasets are not uploaded separately because they share the same parameters (50% changes, scale 100) as GTFS-Madrid-Benchmark scale 100. Please use GTFS-Scale-100-{ALL, CHANGE}.tar.xz for GTFS-Change-50_percent-{ALL, CHANGE}.tar.xz.
    • All datasets are compressed with XZ and provided as a TAR archive. Be aware that you need sufficient space to decompress these archives! 2 TB of free space is advised to decompress all benchmarks and use cases. The expected output is provided as a ZIP file in each TAR archive; decompressing these requires even more space (4 TB).

    Reproducing

    By using our experiment tool, you can easily reproduce the experiments as follows:

    1. Download one of the TAR.XZ archives and unpack it.
    2. Clone the GitHub repository of our experiment tool and install the Python dependencies with 'pip install -r requirements.txt'.
    3. Download the rmlmapper.jar JAR file from this Zenodo dataset and place it inside the experiment tool root folder.
    4. Execute the tool by running: './exectool --root=/path/to/the/root/of/the/tarxz/archive --runs=5 run'. The argument '--runs=5' is used to perform the experiment 5 times.
    5. Once executed, you can generate the statistics by running: './exectool --root=/path/to/the/root/of/the/tarxz/archive stats'.

    Testcases

    Testcases to verify the integration of RML and LDES with IncRML, see https://doi.org/10.5281/zenodo.10171394
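
The description above states that each experiment was executed 5 times and that the median is reported together with the standard deviation. As a rough illustration of that kind of summary (not the experiment tool's own code), the sketch below uses Python's standard `statistics` module. The function name `summarize_runs` and the sample timings are hypothetical, and the use of the sample standard deviation rather than the population one is an assumption, since the description does not say which was used.

```python
from statistics import median, stdev


def summarize_runs(measurements):
    """Return (median, sample standard deviation) of repeated measurements."""
    if len(measurements) < 2:
        # stdev() needs at least two data points.
        raise ValueError("need at least two runs to compute a standard deviation")
    return median(measurements), stdev(measurements)


# Hypothetical execution times (seconds) for 5 runs of a single experiment.
example_runs = [12.0, 11.5, 12.5, 11.0, 13.0]
run_median, run_stdev = summarize_runs(example_runs)
print(f"median={run_median:.2f}s stdev={run_stdev:.2f}s")
```

If the population standard deviation is wanted instead, `statistics.pstdev` can be swapped in for `stdev`.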

  16. Component algorithms description.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Giuseppe Roberto; Ingrid Leal; Naveed Sattar; A. Katrina Loomis; Paul Avillach; Peter Egger; Rients van Wijngaarden; David Ansell; Sulev Reisberg; Mari-Liis Tammesoo; Helene Alavere; Alessandro Pasqua; Lars Pedersen; James Cunningham; Lara Tramontan; Miguel A. Mayer; Ron Herings; Preciosa Coloma; Francesco Lapi; Miriam Sturkenboom; Johan van der Lei; Martijn J. Schuemie; Peter Rijnbeek; Rosa Gini (2023). Component algorithms description. [Dataset]. http://doi.org/10.1371/journal.pone.0160648.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Giuseppe Roberto; Ingrid Leal; Naveed Sattar; A. Katrina Loomis; Paul Avillach; Peter Egger; Rients van Wijngaarden; David Ansell; Sulev Reisberg; Mari-Liis Tammesoo; Helene Alavere; Alessandro Pasqua; Lars Pedersen; James Cunningham; Lara Tramontan; Miguel A. Mayer; Ron Herings; Preciosa Coloma; Francesco Lapi; Miriam Sturkenboom; Johan van der Lei; Martijn J. Schuemie; Peter Rijnbeek; Rosa Gini
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Component algorithms description.

  17. Compilation of climate data from heterogeneous networks across the Hawaiian...

    • agdatacommons.nal.usda.gov
    bin
    Updated May 6, 2025
    Cite
    Ryan J. Longman; Thomas W. Giambelluca; Michael A. Nullet; Abby G. Frazier; Kevin Kodama; Shelley D. Crausbay; Paul D. Krushelnycky; Susan Cordell; Martyn P. Clark; Andy J. Newman; Jeffrey R. Arnold (2025). Compilation of climate data from heterogeneous networks across the Hawaiian Islands [Dataset]. http://doi.org/10.1038/sdata.2018.12
    Explore at:
    Available download formats: bin
    Dataset updated
    May 6, 2025
    Dataset provided by
    Scientific Data
    Authors
    Ryan J. Longman; Thomas W. Giambelluca; Michael A. Nullet; Abby G. Frazier; Kevin Kodama; Shelley D. Crausbay; Paul D. Krushelnycky; Susan Cordell; Martyn P. Clark; Andy J. Newman; Jeffrey R. Arnold
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Hawaiian Islands, Hawaii
    Description

    Long-term, accurate observations of atmospheric phenomena are essential for a myriad of applications, including historic and future climate assessments, resource management, and infrastructure planning. In Hawai‘i, climate data are available from individual researchers, local, State, and Federal agencies, and from large electronic repositories such as the National Centers for Environmental Information (NCEI). Researchers attempting to make use of available data are faced with a series of challenges that include: (1) identifying potential data sources; (2) acquiring data; (3) establishing data quality assurance and quality control (QA/QC) protocols; and (4) implementing robust gap filling techniques. This paper addresses these challenges by providing: (1) a summary of the available climate data in Hawai‘i including a detailed description of the various meteorological observation networks and data accessibility, and (2) a quality-controlled meteorological dataset across the Hawaiian Islands for the 25-year period 1990-2014. The dataset draws on observations from 471 climate stations and includes rainfall, maximum and minimum surface air temperature, relative humidity, wind speed, downward shortwave and longwave radiation data.

    Resources in this dataset:

    Resource Title: Compilation of climate data from heterogeneous networks across the Hawaiian Islands. File Name: Web Page, url: https://figshare.com/collections/Compilation_of_climate_data_from_heterogeneous_networks_across_the_Hawaiian_Islands/3858208 https://doi.org/10.6084/m9.figshare.c.3858208 includes the following 12 datasets:

    List of Active and Discontinued Climate Stations in Hawaii

    Daily Downwelling Longwave Radiation in Hawaii

    Daily Incoming Solar Radiation in Hawaii

    Daily Wind Speed in Hawaii

    Daily Relative Humidity Data in Hawaii

    Daily Minimum Temperature Data in Hawaii

    Daily Minimum Temperature Data in Hawaii (partially gap filled)

    Daily Maximum Temperature in Hawaii

    Daily Maximum Temperature Data in Hawaii (partially gap filled)

    Daily Rainfall Data in Hawaii

    Daily Rainfall Data in Hawaii (partially gap filled)

    Column Headers for all Data Files

  18. Data from: Monitoring System of the Mar Menor Coastal Lagoon (Spain) and Its...

    • pre.iepnb.es
    Updated Nov 5, 2024
    Cite
    (2024). Monitoring System of the Mar Menor Coastal Lagoon (Spain) and Its Watershed Basin Using the Integration of Massive Heterogeneous Data [Dataset]. https://pre.iepnb.es/catalogo/dataset/monitoring-system-of-the-mar-menor-coastal-lagoon-spain-and-its-watershed-basin-using-the-integ
    Explore at:
    Dataset updated
    Nov 5, 2024
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    Mar Menor, Spain
    Description

    The tool is designed for the environmental monitoring of the Mar Menor coastal lagoon (Spain) and for monitoring land use in its watershed. It integrates heterogeneous data sources ranging from ecological data obtained from a multiparametric oceanographic sonde to agro-meteorological data from IMIDA's network of stations and hydrological data from the SAIH network, as well as multispectral satellite images from the Sentinel and Landsat space missions. The system is based on free and open source software and has been designed to guarantee maximum levels of flexibility and scalability and minimum coupling, so that the incorporation of new components does not affect the existing ones. The platform is designed to handle a data volume of more than 12 million records, which has grown exponentially in the last six months. The tool transforms a large volume of data into information and offers it through microservices with optimal response times. As practical applications, the platform makes it possible to know the ecological state of the Mar Menor at a very high level of detail, at both the biophysical and nutrient levels, and to detect periods of oxygen deficit and delimit the affected area. In addition, it facilitates detailed monitoring of the cultivated areas of the watershed, detecting agricultural use and crop cycles at the plot level. It also makes it possible to calculate the amount of water precipitated on the watershed and to monitor the runoff produced and the amount of water entering the Mar Menor in extreme events. The information is offered in different ways depending on the user profile: a very high level of detail for research or data analysis profiles, concrete and direct information to support decision-making for users with managerial profiles, and validated and concise information for citizens. It is an integrated and distributed system that will provide data and services for the Mar Menor Observatory.

  19. Heterogeneous/Homogeneous Change Detection dataset

    • data.niaid.nih.gov
    Updated Nov 21, 2023
    Cite
    David Alejandro Jimenez Sierra (2023). Heterogeneous/Homogeneous Change Detection dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8269854
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Hernán Darío Benítez Restrepo
    Jocelyn Chanussot
    Behnood Rasti
    David Alejandro Jimenez Sierra
    Juan Felipe Florez Ospina
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    "If you use this dataset, we would appreciate it if you reference this repository and cite the related works that made the generation of this dataset possible."

    This change detection dataset covers different events, satellites, and resolutions, and includes both homogeneous and heterogeneous cases. The main idea of the dataset is to provide a benchmark for semantic change detection in the remote sensing field. This dataset is the outcome of the following publications:

    @article{JimenezSierra2022graph,
      author={Jimenez-Sierra, David Alejandro and Quintero-Olaya, David Alfredo and Alvear-Mu{\~n}oz, Juan Carlos and Ben{\'i}tez-Restrepo, Hern{\'a}n Dar{\'i}o and Florez-Ospina, Juan Felipe and Chanussot, Jocelyn},
      journal={IEEE Transactions on Geoscience and Remote Sensing},
      title={Graph Learning Based on Signal Smoothness Representation for Homogeneous and Heterogeneous Change Detection},
      year={2022}, volume={60}, number={}, pages={1-16},
      doi={10.1109/TGRS.2022.3168126}}

    @article{JimenezSierra2020graph,
      title={Graph-Based Data Fusion Applied to: Change Detection and Biomass Estimation in Rice Crops},
      author={Jimenez-Sierra, David Alejandro and Ben{\'i}tez-Restrepo, Hern{\'a}n Dar{\'i}o and Vargas-Cardona, Hern{\'a}n Dar{\'i}o and Chanussot, Jocelyn},
      journal={Remote Sensing},
      volume={12}, number={17}, pages={2683}, year={2020},
      publisher={Multidisciplinary Digital Publishing Institute},
      doi={10.3390/rs12172683}}

    @inproceedings{jimenez2021blue,
      title={Blue noise sampling and Nystrom extension for graph based change detection},
      author={Jimenez-Sierra, David Alejandro and Ben{\'\i}tez-Restrepo, Hern{\'a}n Dar{\'\i}o and Arce, Gonzalo R and Florez-Ospina, Juan F},
      booktitle={2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS},
      pages={2895--2898}, year={2021},
      organization={IEEE},
      doi={10.1109/IGARSS47720.2021.9555107}}

    @article{florez2023exploiting,
      title={Exploiting variational inequalities for generalized change detection on graphs},
      author={Florez-Ospina, Juan F and Jimenez Sierra, David A and Benitez-Restrepo, Hernan D and Arce, Gonzalo},
      journal={IEEE Transactions on Geoscience and Remote Sensing},
      year={2023}, volume={61}, number={}, pages={1-16},
      doi={10.1109/TGRS.2023.3322377}}

    @article{florez2023exploitingxiv,
      title={Exploiting variational inequalities for generalized change detection on graphs},
      author={Florez-Ospina, Juan F. and Jimenez-Sierra, David A. and Benitez-Restrepo, Hernan D. and Arce, Gonzalo R},
      year={2023},
      publisher={TechRxiv},
      doi={10.36227/techrxiv.23295866.v1}}

    The table in the HTML file (dataset_table.html) lists all the metadata and details for each case in the dataset. Cases with a link were gathered from those sources and authors, so you should refer to their work as well. The remaining cases or events (without a link) were obtained from open sources such as:

    Copernicus European Space Agency Alaska Satellite Facility (Vertex) Earth Data

    In addition, we carried out all the image processing using the SNAP toolbox from the European Space Agency. This processing involves the following:

    • Data co-registration
    • Cropping
    • Apply Orbit (for SAR data)
    • Calibration (for SAR data)
    • Speckle Filter (for SAR data)
    • Terrain Correction (for SAR data)

    Lastly, the ground truth was obtained from homogeneous images for pre/post events by drawing polygons to highlight the areas where a visible change was present. The images were laid out and synchronized so that they could be zoomed over the same area for a better view of the changes. This was exhaustive work, done to be as precise as possible. Feel free to improve and contribute to this dataset.

  20. Data from: Depletion of heterogeneous source species pools predicts future...

    • datadryad.org
    zip
    Updated Feb 23, 2018
    Cite
    Andrew M. Liebhold; Eckehard G. Brockerhoff; Mark Kimberley (2018). Depletion of heterogeneous source species pools predicts future invasion rates [Dataset]. http://doi.org/10.5061/dryad.v01m0
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 23, 2018
    Dataset provided by
    Dryad
    Authors
    Andrew M. Liebhold; Eckehard G. Brockerhoff; Mark Kimberley
    Time period covered
    Feb 22, 2017
    Area covered
    Asia, USA, Europe
    Description

    Europe_Asia_establishments_Dryad: Historical numbers of European and Asian Scolytinae established in USA by decade
