One of the key problems that arises in many areas is to estimate a potentially nonlinear function [tex]G(x, \theta)[/tex] given input and output samples [tex](x, y)[/tex] so that [tex]y \approx G(x, \theta)[/tex]. There are many approaches to this regression problem: neural networks, regression trees, and many other methods have been developed to estimate [tex]G[/tex] from the input-output pairs. One method that I have worked with is called Gaussian process regression. There are many good texts and papers on the subject; for more technical information on the method and its applications see: http://www.gaussianprocess.org/ A key problem that arises in developing these models on very large data sets is that training requires an [tex]O(N^3)[/tex] computation, where N is the number of data points in the training sample. Obviously this becomes very problematic when N is large. I discussed this problem with Leslie Foster, a mathematics professor at San Jose State University. He, along with some of his students, developed a method to address this problem based on Cholesky decomposition and pivoting. He also shows that this leads to a numerically stable result. If you're interested in some light reading, I'd suggest you take a look at his recent paper (which was accepted in the Journal of Machine Learning Research) posted on dashlink. We've also posted code for you to try it out. Let us know how it goes. If you are interested in applications of this method in the area of prognostics, check out our new paper on the subject, which was published in IEEE Transactions on Systems, Man, and Cybernetics.
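To make the cost concrete, here is a minimal NumPy sketch of standard Gaussian process regression via a Cholesky factorization of the kernel matrix, which is where the O(N^3) term comes from. It is a generic illustration with an arbitrary squared-exponential kernel and made-up data, not Foster's pivoted low-rank method.

```python
# Minimal sketch of Gaussian process regression using a Cholesky factorization
# of the kernel matrix. The O(N^3) cost discussed above comes from this
# factorization. Kernel choice and hyperparameters are illustrative only.
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between rows of A and rows of B."""
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return signal_var * np.exp(-0.5 * sq_dists / length_scale**2)

def gp_predict(X_train, y_train, X_test, noise_var=1e-2):
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    L = np.linalg.cholesky(K)                      # the O(N^3) step
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf_kernel(X_train, X_test)
    mean = K_s.T @ alpha                           # predictive mean
    v = np.linalg.solve(L, K_s)
    cov = rbf_kernel(X_test, X_test) - v.T @ v     # predictive covariance
    return mean, np.diag(cov)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
X_new = np.linspace(-3, 3, 5).reshape(-1, 1)
mean, var = gp_predict(X, y, X_new)
print(mean, var)
```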
The OECD has initiated PISA for Development (PISA-D) in response to the rising need of developing countries to collect data about their education systems and the capacity of their student bodies. This report aims to compare and contrast approaches regarding the instruments that are used to collect data on (a) component skills and cognitive instruments, (b) contextual frameworks, and (c) the implementation of the different international assessments, as well as approaches to include children who are not at school, and the ways in which data are used. It then seeks to identify assessment practices in these three areas that will be useful for developing countries. This report reviews the major international and regional large-scale educational assessments: large-scale international surveys, school-based surveys and household-based surveys. For each of the issues discussed, there is a description of the prevailing international situation, followed by a consideration of the issue for developing countries and then a description of the relevance of the issue to PISA for Development.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The goal of this study was to adapt a recently proposed linear large-scale support vector machine to large-scale binary cheminformatics classification problems and to assess its performance on various benchmarks using virtual screening performance measures. We extended the large-scale linear support vector machine library LIBLINEAR with state-of-the-art virtual high-throughput screening metrics to train classifiers on whole large and unbalanced data sets. The formulation of this linear support vector machine performs excellently when applied to high-dimensional sparse feature vectors. An additional advantage is the average linear complexity of a prediction in the number of non-zero features. Nevertheless, the approach assumes that a problem is linearly separable. Therefore, we conducted extensive benchmarking to evaluate the performance on large-scale problems up to a size of 175,000 samples. To examine the virtual screening performance, we determined the chemotype clusters using Feature Trees and integrated this information to compute weighted AUC-based performance measures and a leave-cluster-out cross-validation. We also considered the BEDROC score, a metric that was suggested to tackle the early enrichment problem. The performance on each problem was evaluated by a nested cross-validation and a nested leave-cluster-out cross-validation. We compared LIBLINEAR against a Naïve Bayes classifier, a random decision forest classifier, and a maximum similarity ranking approach; LIBLINEAR outperformed these reference approaches in a direct comparison. A comparison to literature results showed that the LIBLINEAR performance is competitive, although it does not reach the results of the top-ranked nonlinear machines on these benchmarks. However, considering the overall convincing performance and computation time of the large-scale support vector machine, the approach provides an excellent alternative to established large-scale classification approaches.
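To illustrate the general workflow, and assuming scikit-learn's LinearSVC as a stand-in interface to LIBLINEAR, here is a minimal sketch that trains a linear SVM on a synthetic, unbalanced, sparse problem and ranks samples by decision value for a ROC AUC score; the chemotype-weighted metrics, BEDROC score, and nested cross-validation from the study are not reproduced.

```python
# Minimal sketch: train a LIBLINEAR-backed linear SVM on an unbalanced,
# sparse binary classification problem and evaluate with ROC AUC.
# Synthetic data; the chemotype-weighted metrics and BEDROC score used
# in the study are not reproduced here.
from scipy import sparse
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=20000, n_features=1000, n_informative=50,
                           weights=[0.98, 0.02], random_state=0)
X = sparse.csr_matrix(X)  # high-dimensional sparse feature vectors

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = LinearSVC(C=1.0, class_weight="balanced", dual=True, max_iter=5000)
clf.fit(X_tr, y_tr)

scores = clf.decision_function(X_te)  # ranking scores, as used in virtual screening
print("ROC AUC:", roc_auc_score(y_te, scores))
```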
In this paper, we investigate the use of Bayesian networks to construct large-scale diagnostic systems. In particular, we consider the development of large-scale Bayesian networks by composition. This compositional approach reflects how (often redundant) subsystems are architected to form systems such as electrical power systems. We develop high-level specifications, Bayesian networks, clique trees, and arithmetic circuits representing 24 different electrical power systems. The largest among these 24 Bayesian networks contains over 1,000 random variables. Another BN represents the real-world electrical power system ADAPT, which is representative of electrical power systems deployed in aerospace vehicles. In addition to demonstrating the scalability of the compositional approach, we briefly report on experimental results from the diagnostic competition DXC, where the ProADAPT team, using techniques discussed here, obtained the highest scores in both Tier 1 (among 9 international competitors) and Tier 2 (among 6 international competitors) of the industrial track. While we consider diagnosis of power systems specifically, we believe this work is relevant to other system health management problems, in particular in dependable systems such as aircraft and spacecraft. Reference: O. J. Mengshoel, S. Poll, and T. Kurtoglu. "Developing Large-Scale Bayesian Networks by Composition: Fault Diagnosis of Electrical Power Systems in Aircraft and Spacecraft." Proc. of the IJCAI-09 Workshop on Self-* and Autonomous Systems (SAS): Reasoning and Integration Challenges, 2009. BibTeX Reference: @inproceedings{mengshoel09developing, title = {Developing Large-Scale {Bayesian} Networks by Composition: Fault Diagnosis of Electrical Power Systems in Aircraft and Spacecraft}, author = {Mengshoel, O. J. and Poll, S. and Kurtoglu, T.}, booktitle = {Proc. of the IJCAI-09 Workshop on Self-$\star$ and Autonomous Systems (SAS): Reasoning and Integration Challenges}, year = {2009} }
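To make the compositional idea concrete, the following plain-Python sketch merges hypothetical subsystem fragments (nodes plus directed edges) into a single system-level network structure; the fragment names and variables are illustrative and this is not the high-level specification language used in the paper.

```python
# Hypothetical sketch of building a system-level Bayesian network structure
# by composing subsystem fragments, in the spirit of the compositional
# approach described above. Node names and fragments are illustrative only.
def compose(fragments):
    """Merge subsystem fragments {name: (nodes, edges)} into one DAG spec."""
    nodes, edges = set(), set()
    for _name, (frag_nodes, frag_edges) in fragments.items():
        nodes.update(frag_nodes)
        edges.update(frag_edges)
    return {"nodes": sorted(nodes), "edges": sorted(edges)}

# Two (redundant) power subsystems sharing a common bus variable.
battery_a = ({"BatteryA", "RelayA", "Bus"}, {("BatteryA", "RelayA"), ("RelayA", "Bus")})
battery_b = ({"BatteryB", "RelayB", "Bus"}, {("BatteryB", "RelayB"), ("RelayB", "Bus")})
sensor    = ({"Bus", "VoltageSensor"}, {("Bus", "VoltageSensor")})

system_bn = compose({"A": battery_a, "B": battery_b, "sensor": sensor})
print(system_bn["nodes"])
print(system_bn["edges"])
```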
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This experiment is motivated by the need to preprocess large-scale CAD models for assembly-by-disassembly approaches. Assembly-by-disassembly is only suitable for assemblies with a small number of parts (n_{parts} < 22). When dealing with large-scale products of high complexity, however, the CAD models may not contain feasible subassemblies (e.g. with connected and interference-free parts) and have too many parts to be processed with assembly-by-disassembly. Product designers' preferences during the design phase might not be ideal for assembly-by-disassembly processing because they do not explicitly consider subassembly feasibility or the number of parts per subassembly. An automated preprocessing approach is proposed to address this issue by splitting the model into manageable partitions using community detection (see the sketch after Hypothesis 2 below). This will allow for parallelised, efficient and accurate assembly-by-disassembly of large-scale CAD models. However, applying community detection methods to automatically split CAD models into smaller subassemblies is a new concept, and research on its suitability for ASP needs to be conducted. Therefore, the following underlying research question will be answered in these experiments:
Underlying research question 2: Can automated preprocessing increase the suitability of CAD-based assembly-by-disassembly for large-scale products?
A hypothesis is formulated to answer this research question, which will be utilised to design experiments for hypothesis testing.
Hypothesis 2: Community detection algorithms can be applied to automatically split large-scale assemblies into suitable candidates for CAD-based AND/OR graph generation.
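The sketch below illustrates, under simplifying assumptions, how community detection could partition a part-connectivity (liaison) graph into candidate subassemblies using NetworkX's greedy modularity algorithm; the parts and contact relations are invented and no CAD geometry is parsed.

```python
# Hypothetical sketch: split a part-connectivity (liaison) graph of a large
# assembly into candidate subassemblies via community detection.
# The parts and contacts below are invented; no CAD model is parsed.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

liaison_graph = nx.Graph()
contacts = [
    ("bolt1", "bracket"), ("bracket", "frame"), ("frame", "panel"),
    ("panel", "hinge"), ("hinge", "door"), ("door", "handle"),
    ("frame", "axle"), ("axle", "wheel1"), ("axle", "wheel2"),
]
liaison_graph.add_edges_from(contacts)

# Each detected community becomes a candidate subassembly small enough for
# assembly-by-disassembly / AND-OR graph generation.
communities = greedy_modularity_communities(liaison_graph)
for i, parts in enumerate(communities):
    print(f"candidate subassembly {i}: {sorted(parts)}")
```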
The high performance computing (HPC) and big data (BD) communities traditionally have pursued independent trajectories in the world of computational science. HPC has been synonymous with modeling and simulation, and BD with ingesting and analyzing data from diverse sources, including from simulations. However, both communities are evolving in response to changing user needs and technological landscapes. Researchers are increasingly using machine learning (ML) not only for data analytics but also for modeling and simulation; science-based simulations are increasingly relying on embedded ML models not only to interpret results from massive data outputs but also to steer computations. Science-based models are being combined with data-driven models to represent complex systems and phenomena. There also is an increasing need for real-time data analytics, which requires large-scale computations to be performed closer to the data and data infrastructures, to adapt to HPC-like modes of operation. These new use cases create a vital need for HPC and BD systems to deal with simulations and data analytics in a more unified fashion. To explore this need, the NITRD Big Data and High-End Computing R&D Interagency Working Groups held a workshop, The Convergence of High-Performance Computing, Big Data, and Machine Learning, on October 29-30, 2018, in Bethesda, Maryland. The purposes of the workshop were to bring together representatives from the public, private, and academic sectors to share their knowledge and insights on integrating HPC, BD, and ML systems and approaches and to identify key research challenges and opportunities. The 58 workshop participants represented a balanced cross-section of stakeholders involved in or impacted by this area of research. Additional workshop information, including a webcast, is available at https://www.nitrd.gov/nitrdgroups/index.php?title=HPC-BD-Convergence.
https://www.technavio.com/content/privacy-notice
The cloud-based project portfolio management market share is expected to increase by USD 4.83 billion from 2020 to 2025, and the market’s growth momentum will accelerate at a CAGR of 18.26%.
This cloud-based project portfolio management market research report provides valuable insights on the post COVID-19 impact on the market, which will help companies evaluate their business approaches. Furthermore, this report extensively covers cloud-based project portfolio management market segmentations by end user (manufacturing, ICT, healthcare, BFSI, and others) and geography (North America, Europe, APAC, MEA, and South America). The cloud-based project portfolio management market report also offers information on several market vendors, including Atlassian Corp. Plc, Broadcom Inc., Mavenlink Inc., Micro Focus International Plc, Microsoft Corp., Oracle Corp., Planview Inc., SAP SE, ServiceNow Inc., and Upland Software, Inc. among others.
What will the Cloud-based Project Portfolio Management Market Size be During the Forecast Period?
Download the Free Report Sample to Unlock the Cloud-based Project Portfolio Management Market Size for the Forecast Period and Other Important Statistics
Cloud-based Project Portfolio Management Market: Key Drivers, Trends, and Challenges
The increasing requirements for large-scale project portfolio management are notably driving the cloud-based project portfolio management market growth, although factors such as challenges from open-source platforms may impede market growth. Our research analysts have studied the historical data and deduced the key market drivers and the COVID-19 pandemic impact on the cloud-based project portfolio management industry. The holistic analysis of the drivers will help in deducing end goals and refining marketing strategies to gain a competitive edge.
Key Cloud-based Project Portfolio Management Market Driver
The increasing requirements for large-scale project portfolio management are a major factor driving the global cloud-based project portfolio management market share growth. Currently, organizations are focusing on cultivating and managing the resources necessary for efficient product outputs, which increases the requirements for efficient solutions for large-scale project portfolio management. The primary purpose of cloud-based project portfolio management software is to automate processes to ensure maximum output by managing resources and maintaining regular follow-up. The main benefit of employing cloud-based project portfolio management software in large-scale project portfolio management is that automated services increase connectivity so that organizations can handle project-related inquiries easily and effectively. Also, automation decreases response time and increases productivity, which ensures efficient process management. Additionally, by using cloud-based project portfolio management software, revenue possibilities can be rapidly increased by calculating conversion ratios and running reports to track metrics as per customer demand. These features decrease operating time. Due to such reasons, the demand for the market will grow significantly during the forecast period.
Key Cloud-based Project Portfolio Management Market Trend
The interlinking of software with project portfolio management is another factor supporting the global cloud-based project portfolio management market share growth. Since the demand for project portfolio management software is rising in the market, stakeholders in several businesses are demanding new features in the software to increase their productivity. One of the main trends identified in the global cloud-based project portfolio management market is the interlinking of multiple software products to match the requirements of the business. Currently, cloud-based project portfolio management software is deployed by several enterprises to give people access to documents, data, and reports from multiple devices at multiple locations. With all the data accessible centrally by numerous users, the accountability of the system will increase, which will provide enterprises with an instant overview of what everyone is working on. Additionally, interlinked project portfolio management software will enable users to update data in real time and will end the complication of sending endless email attachments of the same document. Moreover, the implementation of cloud-based project portfolio management will enhance the company's assurance of up-to-date data. Therefore, all such factors will contribute to the growth of the market.
Key Cloud-based Project Portfolio Management Market Challenge
The rising challenges from open-source platforms will be a major challenge for the global cloud-based project portfolio management market share growth during the forecast period. With the rising demand for digitalization in the current market s
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This data set contains the coordinates of the plots in the thesis "Solving Large-Scale Dynamic Collaborative Vehicle Routing Problems - An Auction-Based Multi-Agent Approach" by Johan Los. It represents the results of various computational experiments in collaborative vehicle routing that were conducted to investigate to what extent an auction-based multi-agent system can be applied to solve dynamic large-scale collaborative vehicle routing problems. The data set indicates, among other things, the value of information sharing, the profits that can be obtained through cooperation under different circumstances, and the individual profits that can be obtained when strategic bidding is applied.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Geodetic imaging is revolutionizing geophysics, but the scope of discovery has been limited by labor-intensive technological implementation of the analyses. The Advanced Rapid Imaging and Analysis (ARIA) project has proven capability to automate SAR image analysis, having processed thousands of COSMO-SkyMed (CSK) scenes collected over California in the last year as part of a JPL/Caltech collaboration with the Italian Space Agency (ASI). The successful analysis of large volumes of SAR data has brought to the forefront the need for analytical tools for SAR quality assessment (QA) on large volumes of images, a critical step before higher-level time series and velocity products can be reliably generated. While single interferograms are useful for imaging episodic events such as earthquakes, in order to fully exploit the tsunami of SAR imagery that will be generated by current and future missions, we need to develop more agile and flexible methods for evaluating interferograms and coherence maps.
Our AIST-2011 Advanced Rapid Imaging & Analysis for Monitoring Hazards (ARIA-MH) data system has been providing data products to researchers working on a variety of earth science problems including glacial dynamics, tectonics, volcano dynamics, landslides and disaster response. A data system with agile analytics capability could reduce the amount of time researchers currently spend on analysis, quality assessment, and re-analysis of interferograms and time series analysis from months to hours. A key stage in analytics for SAR is the quality assessment stage, which is a necessary step before researchers can reliably use results for their interpretations and models, and we propose to develop machine learning tools to enable more automated quality assessment of complex imagery like interferograms, which will in turn enable greater science return by expanding the amount of data that can be applied to research problems.
Objectives: We will develop an advanced hybrid-cloud computing science data system for easily performing massive-scale analytics of geodetic data products, improving the quality of the InSAR and GPS products that are used for disaster monitoring and response. We will focus our analysis on Big Data-scale analytics that are needed to quickly and efficiently assess the quality of the increasing collections of geodetic data products being generated by existing and future missions.
Technology Innovations: Science is an iterative process that requires repeated exploration of the data through various what-if scenarios. By enabling faster turn-around of analytics and analysis processing of the increasing amount of geodetic data, we will enable new science that cannot currently be done. We will adapt machine learning approaches to quality assessment in order to improve the quality of geodetic data products. Furthermore, analytics such as assessing coherence measures of the InSAR data will be used to improve the quality of the data products that are already being used for disaster response. We will develop new approaches enabling users to quickly develop, deploy, run, and analyze their own custom analysis code across entire InSAR and GPS collections.
Expected Significance: To improve the impact of our generated data products for both the science and monitoring user communities, quality assessment (QA) techniques and metrics are needed to automatically analyze PB-scale data volumes and identify both problems and changes in the deformation and coherence time series. Automated QA techniques are currently underdeveloped within the InSAR analysis community, but they have already become strategically important for supporting the expected high data volumes of upcoming missions such as Sentinel, ALOS-2, and NASA-ISRO SAR (NISAR), and for delivering high-quality science and applications. The science data system technology will also enable NASA to support the high data volume needs of NISAR in addition to the analysis of the data products.
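As a rough illustration of the kind of features an automated QA step might derive from a coherence map, here is a hedged NumPy sketch that computes a few scalar statistics and applies a simple threshold rule; the thresholds and features are placeholders, not the machine learning approach proposed here.

```python
# Illustrative sketch only: summarize an interferogram coherence map with a
# few scalar features and flag low-quality scenes with a simple rule.
# Thresholds and features are placeholders, not the proposed ML-based QA.
import numpy as np

def coherence_features(coh):
    """coh: 2-D array of coherence values in [0, 1] (NaN = no data)."""
    valid = coh[np.isfinite(coh)]
    return {
        "mean_coherence": float(valid.mean()),
        "frac_low": float((valid < 0.3).mean()),   # fraction of decorrelated pixels
        "frac_valid": float(valid.size / coh.size),
    }

def passes_qa(features, min_mean=0.4, max_low=0.5):
    return features["mean_coherence"] >= min_mean and features["frac_low"] <= max_low

rng = np.random.default_rng(1)
coh_map = np.clip(rng.beta(2, 2, size=(512, 512)), 0, 1)  # stand-in coherence map
feats = coherence_features(coh_map)
print(feats, "->", "pass" if passes_qa(feats) else "flag for review")
```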
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset and replication package of the study "A continuous open source data collection platform for architectural technical debt assessment".
Abstract
Architectural decisions are the most important source of technical debt. In recent years, researchers have spent an increasing amount of effort investigating this specific category of technical debt, with quantitative methods, and in particular static analysis, being the most common approach.
However, quantitative studies are susceptible, to varying degrees, to external validity threats, which hinder the generalisation of their findings.
In response to this concern, researchers strive to expand the scope of their study by incorporating a larger number of projects into their analyses. This practice is typically executed on a case-by-case basis, necessitating substantial data collection efforts that have to be repeated for each new study.
To address this issue, this paper presents our initial attempt at enabling researchers to study architectural smells, a well-known indicator of architectural technical debt, at large scale. Specifically, we introduce a novel data collection pipeline that leverages Apache Airflow to continuously generate up-to-date, large-scale datasets using Arcan, a tool for architectural smell detection (or any other tool).
Finally, we present the publicly available dataset resulting from the first three months of execution of the pipeline, which includes over 30,000 analysed commits and releases from over 10,000 open source GitHub projects written in 5 different programming languages and amounting to over a billion lines of code analysed.
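As a rough illustration of how such a continuously scheduled pipeline can be expressed, here is a hedged sketch using the Airflow 2.4+ TaskFlow API with dynamic task mapping; the project list, the analysis step, and the storage step are hypothetical placeholders and do not reflect the actual DAGs or the Arcan invocation used by the authors.

```python
# Hypothetical sketch of a continuously scheduled collection DAG using the
# Airflow 2.4+ TaskFlow API. The project list, analyse() body, and storage
# step are placeholders, not the actual pipeline described in the paper.
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2023, 1, 1), catchup=False, tags=["atd"])
def arcan_collection():

    @task
    def list_projects() -> list[str]:
        # Placeholder: in practice this would query GitHub for candidate repos.
        return ["org/project-a", "org/project-b"]

    @task
    def analyse(repo: str) -> dict:
        # Placeholder for invoking an architectural-smell analysis on `repo`
        # and parsing its report.
        return {"repo": repo, "smells": 0}

    @task
    def store(results: list[dict]) -> None:
        # Placeholder: append results to the published dataset.
        print(results)

    store(analyse.expand(repo=list_projects()))

arcan_collection()
```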
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Two classical multivariate statistical problems, testing of multivariate normality and the k-sample problem, are explored by a novel analysis on several resolutions simultaneously. The presented methods do not invert any estimated covariance matrix. Thereby, the methods work in the High Dimension Low Sample Size situation, i.e. when n ≤ p. The output, a significance map, is produced by doing a one-dimensional test for all possible resolution/position pairs. The significance map shows for which resolution/position pairs the null hypothesis is rejected. For the testing of multinormality, the Anderson-Darling test is utilized to detect potential departures from multinormality at different combinations of resolutions and positions. In the k-sample case, it is tested whether k data sets can be said to originate from the same unspecified discrete or continuous multivariate distribution. This is done by testing the k vectors corresponding to the same resolution/position pair of the k different data sets through the k-sample Anderson-Darling test. Successful demonstrations of the new methodology on artificial and real data sets are presented, and a feature selection scheme is demonstrated.
https://dataintelo.com/privacy-and-policy
The global market size for Big Data Analytics in the BFSI sector was valued at approximately USD 20 billion in 2023 and is expected to reach nearly USD 60 billion by 2032, growing at a robust CAGR of 12.5% during the forecast period. This significant growth can be attributed to the increasing adoption of advanced data analytics techniques in the banking, financial services, and insurance (BFSI) sector to enhance decision-making processes, optimize operations, and improve customer experiences.
One of the primary growth factors for the Big Data Analytics market in the BFSI sector is the growing need for risk management and fraud detection. Financial institutions are increasingly harnessing big data analytics to detect anomalies and patterns that could indicate fraudulent activities, thereby protecting themselves and their customers from significant financial losses. With cyber threats becoming more sophisticated, the demand for advanced analytics solutions that can provide real-time insights and predictive analytics is on the rise.
Another critical driver of market growth is the increasing regulatory requirements and compliance standards that financial institutions must adhere to. Governments and regulatory bodies worldwide are imposing stricter regulations to ensure the stability and security of financial systems. Big data analytics solutions help organizations ensure compliance with these regulations by providing comprehensive data analysis and reporting capabilities, which can identify potential compliance issues before they become critical problems.
Customer analytics is also a significant growth factor, as financial institutions strive to understand their customers better and offer personalized services. By leveraging big data analytics, banks and insurers can analyze customer behavior, preferences, and transaction history to develop tailored products and services, thereby enhancing customer satisfaction and loyalty. This customer-centric approach not only helps in retaining existing customers but also attracts new ones, further driving market growth.
Regionally, North America holds the largest market share due to the early adoption of advanced technologies and the presence of major financial institutions that are keen on investing in big data analytics solutions. The region's strong technological infrastructure and supportive regulatory environment also contribute to market growth. Asia Pacific is expected to witness the highest growth rate during the forecast period, driven by the rapid digital transformation in emerging economies such as China and India, and increasing investments in big data analytics by regional BFSI players.
The Big Data Analytics market in the BFSI sector can be segmented by components into software and services. The software segment encompasses various analytics tools and platforms that enable financial institutions to collect, process, and analyze large volumes of data. This segment is expected to witness substantial growth owing to the increasing demand for sophisticated analytics software that can handle the complexity and scale of financial data.
Within the software segment, solutions for data visualization, predictive analytics, and machine learning are gaining significant traction. These technologies empower organizations to uncover hidden patterns, predict future trends, and make data-driven decisions. For instance, predictive analytics can help banks forecast credit risk and optimize loan portfolios, while machine learning algorithms can enhance fraud detection systems by identifying unusual transaction patterns.
The services segment includes consulting, implementation, and maintenance services offered by vendors to help BFSI institutions effectively deploy and manage big data analytics solutions. As the adoption of big data analytics grows, the demand for professional services to support the implementation and ongoing management of these solutions is also expected to rise. Consulting services are particularly important as they enable financial institutions to develop tailored analytics strategies that align with their specific business goals and regulatory requirements.
Furthermore, managed services are becoming increasingly popular, as they allow organizations to outsource the management of their analytics infrastructure to specialized vendors. This not only reduces the burden on internal IT teams but also ensures that the analytics systems are maintained and updated regularly to
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Seven problems from previous literature are employed to test the performance of the proposed approach and to show its advantages by comparing the results with existing approaches. The problems are arranged from small to large scale. Their incidence matrices, data sets, and solutions can be found in the dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bug datasets play a vital role in advancing software engineering tasks, including bug detection, fault localization, and automated program repair. These datasets enable the development of more accurate algorithms, facilitate efficient fault identification, and drive the creation of reliable automated repair tools. However, the manual collection and curation of such data are labor-intensive and prone to inconsistency, which limits scalability and reliability. Current datasets often fail to provide detailed and accurate information, particularly regarding bug types, descriptions, and classifications, reducing their utility in diverse research and practical applications. To address these challenges, we introduce BugCatcher, a comprehensive approach for constructing large-scale, high-quality bug datasets. BugCatcher begins by enhancing PR-Issue linking mechanisms, extending data collection to 12 programming languages over a decade, and ensuring accurate linkage between pull requests and issues. It employs a two-stage filtering process, BugCurator, to refine data quality, and utilizes large language models with Zero-shot Chain-of-Thought prompting to generate precise bug types and detailed descriptions. Furthermore, BugCatcher incorporates a robust classification framework, fine-tuning models for improved categorization. The resulting dataset, BugCatcher-Data, includes 243,265 bug-fix entries with comprehensive fields such as code diffs, bug locations, detailed descriptions, and classifications, serving as a substantial resource for advancing software engineering research and practices.
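As a hedged illustration of zero-shot chain-of-thought prompting for bug typing, the sketch below formats a prompt for a bug-fix diff and sends it through an OpenAI-compatible chat client; the prompt wording, taxonomy, and model name are illustrative assumptions and are not the prompts or models used by BugCatcher.

```python
# Hypothetical sketch of zero-shot chain-of-thought prompting to assign a bug
# type and short description to a bug-fix diff. Prompt wording, taxonomy, and
# model name are illustrative; they are not the BugCatcher prompts.
from openai import OpenAI

PROMPT = """You are given a bug-fix code diff and the linked issue title.
Think step by step about what the defect was, then answer with:
BUG_TYPE: <one of: logic error, null/None handling, concurrency, resource leak, other>
DESCRIPTION: <one sentence describing the bug>

Issue title: {title}
Diff:
{diff}
"""

def classify_bug_fix(title: str, diff: str, model: str = "gpt-4o-mini") -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(title=title, diff=diff)}],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(classify_bug_fix("Crash on empty list",
                           "- return items[0]\n+ return items[0] if items else None"))
```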
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The "ensemble-first" strategy, while a popular heuristic for tabular regression, lacks a formal framework and fails on specific data challenges. This thesis introduces the Efficiency-Based Model Selection Framework (EMSF), a new methodology that aligns model architecture with a dataset's primary structural challenge. We benchmarked over 20 models across 100 real-world datasets, categorized into four novel cohorts: high row-to-size (computational efficiency), wide data (parameter efficiency), and messy data (data efficiency). This large-scale empirical study establishes three fundamental laws of applied regression. The Law of Ensemble Dominance confirms that ensembles are the most efficient choice in over 70% of standard cases. The Law of Anomaly Supremacy proves the critical exceptions: we provide the first large-scale evidence that K-Nearest Neighbors (KNN) excels on high-dimensional data, and that robust models like the Huber Regressor are "silver bullet" solutions for datasets with hidden outliers, winning with performance margins exceeding 1500%. Finally, the Law of Predictive Futility reframes benchmarking as a diagnostic tool for identifying datasets that lack predictive signal. The EMSF provides a practical, evidence-based playbook for practitioners to move beyond a one-size-fits-all approach to model selection.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Computational hydrology and real-world decision-making increasingly rely on simulation-based, multi-scenario analyses. Enabling scientists to align their research with national-scale efforts is necessary to facilitate knowledge transfer and sharing between operational applications and those focused on local or regional water issues. Leveraging existing large-domain datasets with new and innovative modeling practices is vital for improving operational prediction systems. The scale of these large-domain datasets presents significant challenges when applying them at smaller spatial scales, specifically in data collection, pre-processing, post-processing, and reproducibly disseminating findings. Given these challenges, we propose a cloud-based data processing and modeling pipeline, leveraging existing open source tools and cloud technologies, to support common hydrologic data analysis and modeling procedures. Through this work we establish a scalable and flexible pattern for enabling efficient data processing and modeling in the cloud using workflows containing both publicly accessible and privately maintained cloud stores. By leveraging modern cloud computing technologies such as Kubernetes, Dask, Argo, and Analysis Ready Cloud Optimized data, we establish a computationally scalable solution that can be deployed for specific scientific studies, research projects, or communities. We present an approach for applying large-domain meteorological and hydrologic modeling datasets to local and regional applications using the NOAA National Water Model, the NOAA NextGen Hydrological Modeling Framework, and ParFlow. We discuss how this approach can be used to advance our collective understanding of hydrologic processes, create reusable workflows, and operate on large-scale data in the cloud.
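As a hedged sketch of the chunked, lazy access pattern such a pipeline relies on, the example below writes a small synthetic forcing cube to a local Zarr store, re-opens it lazily with xarray and Dask, and reduces it to a regional daily mean; in the real workflow the store would be an analysis-ready cloud-optimized dataset in object storage and the cluster would be Kubernetes-backed, and all names here are illustrative.

```python
# Hedged sketch of the lazy, chunked access pattern used in such pipelines:
# write a small synthetic "precipitation" cube to a local Zarr store, re-open
# it lazily with xarray + Dask, and compute a regional daily mean. Variable
# names, coordinates, and the store path are illustrative only.
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2023-01-01", periods=24 * 10, freq="h")
lat = np.linspace(35.0, 45.0, 50)
lon = np.linspace(-110.0, -100.0, 50)
ds = xr.Dataset(
    {"precipitation": (("time", "lat", "lon"),
                       np.random.default_rng(0).random((len(time), 50, 50)))},
    coords={"time": time, "lat": lat, "lon": lon},
)
ds.chunk({"time": 24}).to_zarr("forcing.zarr", mode="w")

# Lazily re-open the store and reduce a sub-region without loading the full cube.
lazy = xr.open_zarr("forcing.zarr")
subset = lazy["precipitation"].sel(lat=slice(39.0, 41.0), lon=slice(-106.0, -104.0))
daily_mean = subset.resample(time="1D").mean().compute()
print(daily_mean)
```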
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The forest height mosaics for the northeastern parts of China and the U.S. are generated with a global-to-local inversion approach proposed in (Yu et al., 2023), making use of spaceborne repeat-pass InSAR and spaceborne GEDI data. The sparsely but extensively distributed LiDAR samples provided by NASA's GEDI mission are used to parametrize the semi-empirical repeat-pass InSAR scattering model (Lei et al., 2017) and to obtain forest height estimates. Compared to our previous efforts (Lei et al., 2018; Lei and Siqueira, 2022), this work removes the assumptions that were made given the limited availability of calibration samples at that time and develops a new inversion approach based on a global-to-local two-stage inversion scheme. This approach allows better use of local GEDI samples to achieve finer characterization of the temporal decorrelation pattern and thus higher accuracy of forest height inversion, and it is fully automated to enable large-scale forest mapping. Two forest height mosaic maps were generated for the entire northeastern regions of the U.S. and China, with total areas of 18 million hectares and 112 million hectares, respectively. Validation of the forest height estimates demonstrates much improved accuracy compared to the previous efforts, i.e., reducing the RMSE from 3-4 m at a 3-6-hectare aggregated pixel size to 3-4 m at a 0.81-hectare pixel size. The proposed fusion approach not only addresses the sparse spatial sampling problem inherent to the GEDI mission, but also improves the accuracy of forest height estimates compared to GEDI-interpolated maps by 20% at 30-m resolution. Extensive evaluation of the forest height inversion against LVIS LiDAR data indicates an accuracy of 3-4 m at a 0.81-hectare pixel size over smooth areas and 4-5 m over hilly areas in the U.S., whereas the forest height estimates over northeastern China compare well with small-footprint LiDAR validation data, with an accuracy below 3.5 m and R2 mostly above 0.6. Such forest height inversion accuracy at sub-hectare pixel size holds promise for existing and future spaceborne LiDAR (JAXA's MOLI, NASA's GEDI, China's TECIS) and InSAR missions (NASA-ISRO's NISAR, JAXA's ALOS-4, and China's LuTan-1). This fusion prototype can serve as a cost-effective solution for public users to obtain wall-to-wall forest height maps at large scale when only spaceborne repeat-pass InSAR data are available and freely accessible.
https://www.law.cornell.edu/uscode/text/17/106
With the development of science and technology, large-scale optimization tasks have become integral to cutting-edge engineering. The challenges of solving these problems arise from ever-growing system sizes, intricate physical spaces, and the computational cost required to accurately model and optimize target objectives. Taking the design of advanced functional materials as an example, the high-dimensional parameter space and high-fidelity physical simulations can demand immense computational resources for searching and iteration. Although emerging machine learning techniques have been combined with conventional experimental and simulation approaches to explore the design space and identify high-performance solutions, these methods are still limited to a small part of the design space around materials that have already been well investigated.
Over the past several decades, continuous development of both hardware and algorithms has addressed some of these challenges. High-performance computing (HPC) architectures and heterogeneous systems have greatly expanded the capacity to perform large-scale calculations and optimizations; on the other hand, the emergence of machine learning frameworks and algorithms has dramatically facilitated the development of advanced models and enabled the integration of AI-driven techniques into traditional experiments and simulations more seamlessly. In recent years, quantum computing (QC) has received widespread attention due to its strong performance in finding global optima and is regarded as a promising solution to large-scale and non-linear optimization problems in the future; in the meantime, quantum computing principles also expand the capacity of classical algorithms to explore high-dimensional combinatorial spaces. In this dissertation, we show the power of integrating machine learning algorithms, quantum algorithms, and HPC architectures to tackle the challenges of solving large-scale optimization problems.
In the first part of this dissertation, we introduced an optimization algorithm based on a Quantum-inspired Genetic Algorithm (QGA) to design planar multilayer (PML) structures for transparent radiative cooler (TRC) applications. Numerical experiments showed that our QGA-facilitated optimization algorithm converges to solutions comparable to quantum annealing (QA), and that the QGA outperformed the classical genetic algorithm (CGA) in both convergence speed and global search capacity. Our work shows that quantum heuristic algorithms will become powerful tools for addressing the challenges traditional optimization algorithms face when solving large-scale optimization problems with complex search spaces.
In the second part of the dissertation, we proposed a quantum annealing-assisted lattice optimization (QALO) algorithm for high-entropy alloy (HEA) systems. The algorithm builds on an active learning framework that integrates the field-aware factorization machine (FFM), quantum annealing (QA), and a machine learning potential (MLP). When applied to optimizing the bulk grain configuration of the NbMoTaW alloy system, our algorithm quickly obtains low-energy microstructures, and the results successfully reproduce the thermodynamically driven Nb segregation and W enrichment in the bulk phase that are usually observed in experiments and MC/MD simulations. This work highlights the potential of quantum computing for exploring the large design space of HEA systems.
In the third part of the dissertation, we employed the Distributed Quantum Approximate Optimization Algorithm (DQAOA) to address large-scale combinatorial optimization problems that exceed the limits of conventional computational resources. This was achieved through a divide-and-conquer strategy, in which the original problem is decomposed into smaller sub-tasks that are solved in parallel on a high-performance computing (HPC) system. To further enhance convergence efficiency, we introduced an Impact Factor Directed (IFD) decomposition method. By calculating impact factors and leveraging a targeted traversal strategy, IFD captures local structural features of the problem, making it effective for both dense and sparse instances. Finally, we explored the integration of DQAOA with the Quantum Framework (QFw) on the Frontier HPC system, demonstrating the potential for efficient management of large-scale circuit execution workloads across CPUs and GPUs.
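The following heavily simplified sketch illustrates the divide-and-conquer idea: a QUBO matrix is partitioned into blocks, each block is solved independently (brute force here stands in for a QAOA call on a quantum backend), and the block solutions are concatenated; the impact-factor-directed decomposition and the QFw/HPC integration are not reproduced.

```python
# Heavily simplified sketch of the divide-and-conquer idea behind DQAOA:
# partition a QUBO into sub-problems, solve each independently (brute force
# stands in for a QAOA call on a quantum backend), and concatenate the
# sub-solutions. The impact-factor-directed decomposition and the HPC/QFw
# integration described above are not reproduced.
import itertools
import numpy as np

def solve_qubo_brute_force(Q):
    """Return the bit vector minimizing x^T Q x for a small QUBO matrix Q."""
    n = Q.shape[0]
    best_x, best_val = None, np.inf
    for bits in itertools.product((0, 1), repeat=n):
        x = np.array(bits)
        val = x @ Q @ x
        if val < best_val:
            best_x, best_val = x, val
    return best_x

def divide_and_conquer(Q, block_size=4):
    """Split variables into contiguous blocks, ignoring inter-block couplings."""
    n = Q.shape[0]
    solution = np.empty(n, dtype=int)
    for start in range(0, n, block_size):
        idx = np.arange(start, min(start + block_size, n))
        solution[idx] = solve_qubo_brute_force(Q[np.ix_(idx, idx)])
    return solution

rng = np.random.default_rng(0)
Q = rng.normal(size=(12, 12))
Q = (Q + Q.T) / 2                      # symmetric QUBO matrix
x = divide_and_conquer(Q)
print("approximate solution:", x, "objective:", x @ Q @ x)
```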
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The set of Knowledge Graphs (KGs) generated with automatic and manual approaches is constantly growing.
For an integrated view and usage, an alignment between these KGs is necessary on the schema as well as instance level.
There are already approaches which try to tackle this multi source knowledge graph matching problem,
but large gold standards are missing to evaluate their effectiveness and scalability.
In particular, most existing gold standards are fairly small and can be solved by matchers that match exactly two KGs (1:1), which constitute the majority of existing matching systems.
We close this gap by presenting Gollum -- a gold standard for large-scale multi source knowledge graph matching with over 275,000 correspondences between 4,149 different KGs.
They originate from knowledge graphs derived by applying the DBpedia extraction framework to a large wiki farm.
Three variations of the gold standard are made available:
(1) a version with all correspondences for evaluating unsupervised matching approaches, and two versions for evaluating supervised matching: (2) one where each KG is contained both in the train and test set, and (3) one where each KG is exclusively contained in the train or the test set.
We plan to extend our KG track at the Ontology Alignment Evaluation Initiative (OAEI) to allow for matching systems
which are specifically designed to solve the multi KG matching problem.
As a first step towards this direction, we evaluate multi source matching approaches which reuse two-KG (1:1) matchers from the past OAEI.
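As a hedged sketch of how a matcher's output could be scored against such a gold standard, the snippet below treats both as sets of (source entity, target entity) pairs and computes precision, recall, and F1; the tuples and identifiers are simplified placeholders, not the Gollum file format.

```python
# Hedged sketch: score a matcher's output against a gold standard, treating
# both as sets of (source entity, target entity) correspondences. The tuples
# below are simplified placeholders, not the Gollum file format.
def precision_recall_f1(predicted: set, gold: set):
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

gold = {("kgA:Hobbit", "kgB:Hobbit"), ("kgA:Shire", "kgB:The_Shire"), ("kgA:Gandalf", "kgC:Gandalf")}
pred = {("kgA:Hobbit", "kgB:Hobbit"), ("kgA:Gandalf", "kgC:Gandalf"), ("kgA:Mordor", "kgB:Mordor")}

p, r, f1 = precision_recall_f1(pred, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```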
Due to the size of the KG files, they are hosted at the institute:
http://data.dws.informatik.uni-mannheim.de/dbkwik/gollum/40K.tar (50.3 GB)
http://data.dws.informatik.uni-mannheim.de/dbkwik/gollum/all.tar (74.7 GB)
http://data.dws.informatik.uni-mannheim.de/dbkwik/gollum/gold.tar (25.3 GB)