Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 7 rows and is filtered where the book is "Assessing and improving prediction and classification : theory and algorithms in C++". It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
These are the algorithms needed in the paper "Finite orbits of the pure braid group on the monodromy of the 2-variable Garnier system" by P. Calligaris and M. Mazzocco. More information on using these algorithms can be found in the file "README.pdf". An updated version of the algorithms was uploaded on 8 September 2017.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The code and datasets for the SVR-FCM algorithm.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
In this study, we introduce the count-based Morgan fingerprint (C-MF) to represent chemical structures of contaminants and develop machine learning (ML)-based predictive models for their activities and properties. Compared with the binary Morgan fingerprint (B-MF), C-MF not only qualifies the presence or absence of an atom group but also quantifies its counts in a molecule. We employ six different ML algorithms (ridge regression, SVM, KNN, RF, XGBoost, and CatBoost) to develop models on 10 contaminant-related data sets based on C-MF and B-MF to compare them in terms of the model’s predictive performance, interpretation, and applicability domain (AD). Our results show that C-MF outperforms B-MF in nine of 10 data sets in terms of model predictive performance. The advantage of C-MF over B-MF is dependent on the ML algorithm, and the performance enhancements are proportional to the difference in the chemical diversity of data sets calculated by B-MF and C-MF. Model interpretation results show that the C-MF-based model can elucidate the effect of atom group counts on the target and have a wider range of SHAP values. AD analysis shows that C-MF-based models have an AD similar to that of B-MF-based ones. Finally, we developed a “ContaminaNET” platform to deploy these C-MF-based models for free use.
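For readers unfamiliar with the two fingerprint types, here is a minimal sketch (not code from the study) contrasting them with RDKit; the molecule, radius, and bit-vector length are arbitrary choices, and the newer generator-style RDKit API would work equally well.

from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, used only as an example

# Binary Morgan fingerprint (B-MF): presence/absence of each hashed atom environment
b_mf = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

# Count-based Morgan fingerprint (C-MF): how many times each environment occurs
c_mf = AllChem.GetHashedMorganFingerprint(mol, 2, nBits=2048)

print("bits set in B-MF:", b_mf.GetNumOnBits())
print("nonzero counts in C-MF:", c_mf.GetNonzeroElements())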
The PHM08 Challenge Dataset is now publicly available at the NASA Prognostics Repository.
INTRODUCTION - WHY SIMULATE DEGRADATION DATA?
Of the various challenges encountered in prognostics algorithm development, the non-availability of suitable validation data is most often the bottleneck in the technology certification process. Prognostics imposes several requirements on the training data beyond what is commonly available from various applications. It not only requires data containing fault signatures but also data that contains fault evolution trends with corresponding time indexes (in number of hours or number of operational cycles). In general, there are three sources from which data is usually available: fielded applications, experimental test-beds, and computer simulations (see Figure 1). From a prognostics point of view, data collection paradoxically suffers from the situation that the systems that do run to failure often did not have warning instrumentation installed, hence there is little or no record of what went wrong. In the other situation, systems that are continuously monitored are prevented from running to failure or are subject to maintenance that eliminates the signatures of fault evolution. Conducting experiments that replicate real-world situations is extremely expensive in terms of the time required for a healthy system to run to failure, and is often dangerous. Accelerated ageing may be useful to some extent but may not emulate normal wear patterns. Furthermore, to manage uncertainty, multiple datasets must be collected to quantify variations resulting from multiple sources, which makes it all the more unattainable. Simulations can be fast, inexpensive, and provide a number of options to design experiments, but their usefulness is contingent on the availability of high-fidelity models that represent the real systems fairly well. However, once such a model is available, simulations offer the flexibility to rerun various experiments with added knowledge from the system as it becomes available. While availability of real fault evolution data from fielded systems would be more desirable, generating data using a high-fidelity model and integrating it with the knowledge gathered from the partial data obtained from real systems is by far the most practical approach for prognostics algorithm development, validation, and verification. In this presentation we discuss some key elements that must be kept in mind while generating datasets suitable for prognostics. Furthermore, with the help of an example, we show how a dynamical system model can be supported with suitable degradation models available from the respective domain knowledge to create suitable data. The example is discussed next.
APPLICATION DOMAIN
Tracking and predicting the progression of damage in aircraft turbomachinery has been an active area of study within the Condition Based Maintenance (CBM) community. A general approach has been to correlate flow and efficiency losses to degradation signatures in various components of the engine. Once such a mapping is available, the next task is to estimate this loss of flow and efficiency by inferring information from measurable sensor outputs, which ultimately is used to assess the level of degradation in the system.
SYSTEM MODEL: C-MAPSS
C-MAPSS (Commercial Modular Aero-Propulsion System Simulation) is a recently released tool for simulating a realistic large commercial turbofan engine.
C-MAPSS (Commercial Modular Aero-Propulsion System Simulation) simulates a realistic large (~90,000 lb thrust class) commercial turbofan engine. It allows the user to choose and design operational profiles, controllers, environmental conditions, thrust levels, etc., to simulate a scenario of interest. An extensive list of output variables...
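As a toy illustration of the degradation-data idea sketched above (this is not C-MAPSS and not the PHM08 generator; the health-index model and all parameter ranges below are assumptions), a simple parametric degradation model can be run to a failure threshold to obtain run-to-failure trajectories indexed by operational cycle:

import numpy as np

def simulate_run_to_failure(rng, threshold=0.3, noise=0.01):
    a = rng.uniform(0.0005, 0.002)   # unit-specific degradation rate (assumed)
    b = rng.uniform(1.3, 1.6)        # growth exponent (assumed)
    health, cycle, trajectory = 1.0, 0, []
    while health > threshold:
        cycle += 1
        health = 1.0 - a * cycle ** b                        # underlying degradation trend
        trajectory.append(health + rng.normal(0.0, noise))   # noisy "sensor" reading
    return np.array(trajectory)                              # index = operational cycle

rng = np.random.default_rng(0)
runs = [simulate_run_to_failure(rng) for _ in range(5)]
print([len(r) for r in runs])  # lifetimes in cycles differ from unit to unit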
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset comprises the C++ GPU implementation of the Boolean matrix factorization algorithm C-Salt. The included ReadMe file describes how the implementation can be set up and used. Two of the files, the IJulia notebooks CSaltEvalRealWorld.ipynb and CSaltGenerateSynthData.ipynb, can be used to generate synthetic data as proposed in the paper and to evaluate the quality measurements for results on the submitted text data. The data includes two IPython (IPYNB) notebook files, six tab-separated value (.tsv) files, two .hpp files and two .cu files. The IPYNB files can be exported to .HTML, .PDF, reStructuredText, and LaTeX formats. .TSV files can be opened using open-source text editors. .HPP is a header format used by C++, and .cu files are associated with the NVIDIA CUDA Toolkit.
Abstract: Given labelled data represented by a binary matrix, we consider the task of deriving a Boolean matrix factorization which identifies commonalities and specifications among the classes. While existing works focus on rank-one factorizations which are either specific or common to the classes, we derive class-specific alterations from common factorizations as well. Therewith, we broaden the applicability of our new method to datasets whose class-dependencies have a more complex structure. On the basis of synthetic and real-world datasets, we show on the one hand that our method is able to alter structure which corresponds to our model assumption, and on the other hand that our model assumption is justified in real-world applications. Our method is parameter-free.
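As a toy illustration of the Boolean matrix factorization model the abstract refers to (not the C-Salt GPU implementation itself; matrix sizes are arbitrary), a binary data matrix is approximated by the Boolean OR of rank-one products of binary factor matrices:

import numpy as np

rng = np.random.default_rng(1)
n, m, k = 8, 6, 2
U = rng.integers(0, 2, size=(n, k))   # binary row-factor matrix
V = rng.integers(0, 2, size=(m, k))   # binary column-factor matrix

# Boolean product: X[i, j] = OR over r of (U[i, r] AND V[j, r])
X = (U @ V.T) > 0

print(X.astype(int))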
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Many real-world systems can be modeled by multistate flow networks (MFNs), and their reliability evaluation features in the design and control of these systems. Considering a cost constraint makes the problem of reliability evaluation of an MFN more realistic. For a given demand value d and a given cost limit c, the reliability of an MFN at level (d, c) is the probability of transmitting at least d units from the source node to the sink node through the network within the cost of c. This article addresses the so-called (d, c)-MC problem, i.e., the problem of reliability evaluation of an MFN with a cost constraint in terms of minimal cuts. It presents new results on which a new algorithm is based. This algorithm finds all (d, c)-MC candidates without duplicates and verifies them more efficiently than existing ones. Complexity results for this algorithm and an example of its use are provided. Finally, numerical experiments with R-language implementations of the presented algorithm and other competitive algorithms are considered. Both the time complexity analysis and the numerical experiments demonstrate that the presented algorithm is more efficient than other existing ones in the majority of cases.
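To make the (d, c) reliability definition concrete, the following brute-force Monte Carlo sketch (in Python with networkx rather than R, and emphatically not the article's minimal-cut algorithm) estimates the probability that at least d units can be shipped from source to sink at a total cost of at most c; the toy network and arc-state distributions are assumptions.

import networkx as nx
import numpy as np

def sample_reliability(d, c, n_samples=5000, seed=0):
    rng = np.random.default_rng(seed)
    # each arc: list of (capacity, probability) pairs and a per-unit transmission cost
    arcs = {
        ("s", "a"): ([(0, 0.1), (1, 0.3), (2, 0.6)], 1),
        ("s", "b"): ([(0, 0.1), (2, 0.9)], 2),
        ("a", "t"): ([(0, 0.2), (2, 0.8)], 1),
        ("b", "t"): ([(0, 0.1), (1, 0.4), (2, 0.5)], 1),
    }
    success = 0
    for _ in range(n_samples):
        G = nx.DiGraph()
        G.add_node("s", demand=-d)
        G.add_node("t", demand=d)
        for (u, v), (states, cost) in arcs.items():
            caps, probs = zip(*states)
            cap = rng.choice(caps, p=probs)         # sample this arc's state
            G.add_edge(u, v, capacity=int(cap), weight=cost)
        try:
            flow = nx.min_cost_flow(G)              # cheapest way to ship d units, if feasible
        except nx.NetworkXUnfeasible:
            continue                                # cannot ship d units in this network state
        if nx.cost_of_flow(G, flow) <= c:           # within the cost limit c
            success += 1
    return success / n_samples

print(sample_reliability(d=2, c=6))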
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset contains complementary data to the paper "The Least Cost Directed Perfect Awareness Problem: Complexity, Algorithms and Computations" [1]. Here, we make available two sets of instances of the combinatorial optimization problem studied in that paper, which deals with the spread of information on social networks. We also provide the best known solutions and bounds obtained through computational experiments for each instance.
The first input set includes 300 synthetic instances composed of graphs that resemble real-world social networks. These graphs were produced with a generator proposed in [2]. The second set consists of 14 instances built from graphs obtained by crawling Twitter [3].
The directories "synthetic_instances" and "twitter_instances" contain files that describe both sets of instances, all of which follow the format: the first two lines correspond to:
where
where
where and
The directories "solutions_for_synthetic_instances" and "solutions_for_twitter_instances" contain files that describe the best known solutions for both sets of instances, all of which follow the format: the first line corresponds to:
where is the number of vertices in the solution. Each of the next lines contains:
where
where
Lastly, two files, namely, "bounds_for_synthetic_instances.csv" and "bounds_for_twitter_instances.csv", enumerate the values of the best known lower and upper bounds for both sets of instances.
This work was supported by grants from Santander Bank, Brazil, Brazilian National Council for Scientific and Technological Development (CNPq), Brazil, São Paulo Research Foundation (FAPESP), Brazil.
Caveat: the opinions, hypotheses and conclusions or recommendations expressed in this material are the responsibility of the authors and do not necessarily reflect the views of Santander, CNPq, or FAPESP.
References
[1] F. C. Pereira, P. J. de Rezende. The Least Cost Directed Perfect Awareness Problem: Complexity, Algorithms and Computations. Submitted. 2023.
[2] B. Bollobás, C. Borgs, J. Chayes, and O. Riordan. Directed scale-free graphs. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’03, pages 132–139, 2003.
[3] C. Schweimer, C. Gfrerer, F. Lugstein, D. Pape, J. A. Velimsky, R. Elsässer, and B. C. Geiger. Generating simple directed social network graphs for information spreading. In Proceedings of the ACM Web Conference 2022, WWW ’22, pages 1475–1485, 2022.
Abstract
C++QED is a versatile framework for simulating open quantum dynamics. It allows the user to build arbitrarily complex quantum systems from elementary free subsystems and interactions, and to simulate their time evolution with the available time-evolution drivers. Through this framework, we introduce a design which should be generic for high-level representations of composite quantum systems. It relies heavily on the object-oriented and generic programming paradigms on one hand, and on the other hand, com...
Title of program: C++QED
Catalogue Id: AELU_v1_0
Nature of problem
Definition of (open) composite quantum systems out of elementary building blocks [1]. Manipulation of such systems, with emphasis on dynamical simulations such as Master-equation evolution [2] and Monte Carlo wave-function simulation [3].
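As an unrelated, minimal illustration of the Monte Carlo wave-function method mentioned above (using Python/QuTiP rather than C++QED; the operators, decay rate, and trajectory count are arbitrary), a damped cavity mode relaxing from a Fock state can be simulated as:

import numpy as np
from qutip import destroy, basis, mcsolve

N = 10                      # Fock-space truncation
a = destroy(N)              # cavity annihilation operator
H = a.dag() * a             # free cavity Hamiltonian (units of the cavity frequency)
kappa = 0.2                 # cavity decay rate (assumed)
c_ops = [np.sqrt(kappa) * a]

psi0 = basis(N, 5)          # start with 5 photons
tlist = np.linspace(0, 20, 101)

result = mcsolve(H, psi0, tlist, c_ops, e_ops=[a.dag() * a], ntraj=200)
print(result.expect[0][-1])  # mean photon number at the final time, averaged over trajectories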
Versions of this program held in the CPC repository in Mendeley Data:
AELU_v1_0; C++QED; 10.1016/j.cpc.2012.02.004
AELU_v2_0; C++QED; 10.1016/j.cpc.2014.04.011
This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)
Studies applying Community Level Models (CLMs_review-Dryad-Data.xlsx): summary table of empirical studies applying Community Level Models (CLMs), grouped by CLM algorithm.
Community Level Modeling vignette (CLMs_review-Dryad-Code.Rmd): R Markdown file with code for the Community Level Modeling tutorial, "Fitting multispecies models in R".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a collection of scholarly articles focused on the implementation of active learning techniques in data structures courses, with a particular emphasis on Java programming and its application in enhancing student learning in STEM (Science, Technology, Engineering, and Mathematics) disciplines. This collection provides a comprehensive view of various teaching strategies that promote deeper and more meaningful learning through active methods. Each included article has been selected for its relevance, accessibility (Open Access), and contribution to educational practice in programming and data structures.
Keywords: Active learning, data structures, Java programming, STEM, education, teaching strategies, student engagement.
This dataset provides a solid foundation for research and implementation of active learning techniques in data structures and programming courses, benefiting educators and students in the STEM field.
Dataset Contents:
Learning more about active learning Author: Graeme Stemp-Morlock DOI: 10.1145/1498765.1498771 Publication Date: April 1, 2009 Abstract: Discusses how active learning algorithms can reduce label complexity compared to passive methods.
A Compendium of Rationales and Techniques for Active Learning Author: C. Reiness DOI: 10.1187/CBE.20-08-0177 Publication Date: October 1, 2020 Abstract: Provides a collection of strategies for promoting active learning.
Defining Active Learning: A Restricted Systemic Review Authors: Peter Doolittle, Krista Wojdak, Amanda Walters DOI: 10.20343/teachlearninqu.11.25 Publication Date: September 22, 2023 Abstract: Defines active learning as a student-centered approach to knowledge construction focusing on higher-order thinking.
The Curious Construct of Active Learning Authors: D. Lombardi, T. Shipley DOI: 10.1177/1529100620973974 Publication Date: April 1, 2021 Abstract: Discusses the different interpretations of active learning in STEM domains.
Active Learning to Classify Macromolecular Structures in situ for Less Supervision in Cryo-Electron Tomography Authors: Xuefeng Du, Haohan Wang, Zhenxi Zhu, Xiangrui Zeng, Yi-Wei Chang, Jing Zhang, E. Xing, Min Xu DOI: 10.1093/bioinformatics/btab123 Publication Date: February 23, 2021 Abstract: Proposes a hybrid active learning framework to reduce labeling burden in cryo-ET tasks.
LSPC is the Loading Simulation Program in C++, a watershed modeling system that includes streamlined Hydrologic Simulation Program Fortran (HSPF) algorithms for simulating hydrology, sediment, and general water quality.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These files contain all of the necessary code to re-create the results in the manuscript entitled "Novel, provable algorithms for efficient ensemble-based computational protein design and their application to the redesign of the c-Raf-RBD:KRas protein-protein interface" along with all of the results described therein.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Geodemographic analysis involves clustering geographic areas into socio-demographically homogeneous groups. However, most existing methods prioritize overall effectiveness, measured by minimizing total costs, potentially misrepresenting specific subgroups within the data. Although there is a growing literature on fair clustering, it focuses almost exclusively on crisp clustering, failing to address the inherent fuzziness of the real world. This study addresses these gaps by introducing a socially-fair geodemographic clustering (SFGC) framework, which modifies classical fuzzy c-means (FCM) by incorporating a new cost function that, instead of minimizing total costs, minimizes the maximum average cost across all subgroups. SFGC also introduces a gradient descent-based algorithm to optimize this new cost function. In addition, SFGC can be directly adapted to crisp clustering, facilitating practical implementation and comparison of clustering algorithms.
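A minimal sketch of the fairness criterion described above (not the authors' SFGC implementation; the variable names, fuzzifier m, and toy data are assumptions) computes the group-wise average fuzzy cost and takes its maximum, which is the quantity SFGC minimizes in place of the usual total FCM cost:

import numpy as np

def fcm_cost_per_point(X, C, U, m=2.0):
    # squared distances of every point to every centroid, weighted by memberships**m
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)   # (n_points, n_clusters)
    return (U ** m * d2).sum(axis=1)                          # per-point fuzzy cost

def socially_fair_cost(X, C, U, groups, m=2.0):
    costs = fcm_cost_per_point(X, C, U, m)
    group_means = [costs[groups == g].mean() for g in np.unique(groups)]
    return max(group_means)        # SFGC minimizes this instead of costs.sum()

# toy data: two demographic subgroups, two clusters
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
groups = rng.integers(0, 2, size=100)
C = rng.normal(size=(2, 3))
U = rng.dirichlet(np.ones(2), size=100)   # rows sum to one, as FCM memberships do
print(socially_fair_cost(X, C, U, groups))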
We were interested here in particular in conditions where un-modeled effects are present as manifested by the different degradation curve at 45°C. Although all algorithms were given the same amount of information to the degree practical, there were considerable differences in performance. Specifically, the combined Bayesian regression-estimation approach implemented as a RVM-PF framework has significant advantages over conventional methods of RUL estimation like ARIMA and EKF. ARIMA, being a purely data-driven method, does not incorporate any physics of the process into the computation, and hence ends up with wide uncertainty margins that make it unsuitable for long-term predictions. Additionally, it may not be possible to eliminate all non-stationarity from a dataset even after repeated differencing, thus adding to prediction inaccuracy. EKF, though robust against non-stationarity, suffers from the inability to accommodate un-modeled effects and can diverge quickly as shown. We did not explore other variations of the Kalman Filter that might provide better performance such as the unscented Kalman Filter. The Bayesian statistical approach, on the other hand, appears to be well suited to handle various sources of uncertainties since it defines probability distributions over both parameters and variables and integrates out the nuisance terms. Also, it does not simply provide a mean estimate of the time-to-failure; rather it generates a probability distribution over time that best encapsulates the uncertainties inherent in the system model and measurements and in the core concept of failure prediction.
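As a generic illustration of why a particle-filter-based Bayesian approach yields a distribution over time-to-failure rather than a single point estimate (this is not the RVM-PF of the study; the degradation model, noise levels, and threshold are assumptions):

import numpy as np

rng = np.random.default_rng(0)
n_particles, threshold, meas_noise = 2000, 0.4, 0.02

# hidden truth, used here only to synthesize measurements
true_rate = 0.03
t_obs = np.arange(1, 21)
y = np.exp(-true_rate * t_obs) + rng.normal(0, meas_noise, size=t_obs.size)

# particles: (current health, degradation rate)
health = np.ones(n_particles)
rate = rng.uniform(0.005, 0.08, size=n_particles)

for yk in y:
    health *= np.exp(-rate)                                   # propagate one step
    w = np.exp(-0.5 * ((yk - health) / meas_noise) ** 2)      # measurement likelihood
    w /= w.sum()
    idx = rng.choice(n_particles, size=n_particles, p=w)      # multinomial resampling
    health, rate = health[idx], rate[idx]
    rate = np.clip(rate + rng.normal(0, 0.001, size=n_particles), 1e-4, None)  # roughening

# remaining useful life: steps until each particle's health crosses the threshold
rul = np.maximum(np.ceil(np.log(health / threshold) / rate), 0.0)
print("median RUL:", np.median(rul), "90% interval:", np.percentile(rul, [5, 95]))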
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Middle confidence interval!
In this repository, we share all of the Human Connectome Project results maps used in the manuscript "Spatial Confidence Sets for Standardized Effect Size Images" (Bowring, Telschow, Schwartzman, Nichols; 2020).
Images are named using the following format: the 'Algorithm 1 LowerConfidenceInterval c05' image is the (blue) lower CS map obtained for the targeted Cohen's d effect size c = 0.5 using Algorithm 1 as described in the manuscript; the 'Algorithm 3 MiddleConfidenceInterval c12' image is the (yellow) point estimate map obtained for the targeted Cohen's d effect size c = 1.2 using Algorithm 3, and so on.
Finally, the 'SnPM filtered' image is the thresholded (p < 0.05 FWE; obtained via permutation) statistical results map from applying a group-level one sample t-test to the 80 subjects' data.
homo sapiens
R
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository houses the Python preprocessing scripts used to generate the metadata for the García-Lee et al. (2024) dataset. With these files and scripts, you gain access to the algorithm and examples for generating gridded products in netCDF format, specifically featuring the 0°C isotherm field.
File type | Description
Python script | 0°C Isotherm Detection Algorithm
Python script | Calculation of Daily Mean
netCDF | 0°C Isotherm Data at 6-Hour Intervals (1959-2021), in meters above sea level (m a.s.l.)
netCDF | Raw ERA5 data example for 1959: Temperature (K) and Geopotential (m² s⁻²)
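As a minimal sketch of the detection step (not the repository's script; the profile ordering and variable handling are assumptions), the freezing-level height for one atmospheric column can be obtained by locating the layer where temperature crosses 273.15 K and linearly interpolating the geopotential height:

import numpy as np

def freezing_level_height(temp_k, geopot, g=9.80665):
    """temp_k, geopot: 1-D profiles for one column, ordered from the surface upward."""
    height = geopot / g                      # geopotential (m² s⁻²) to metres a.s.l.
    for k in range(len(temp_k) - 1):
        t0, t1 = temp_k[k], temp_k[k + 1]
        if (t0 - 273.15) * (t1 - 273.15) <= 0:    # sign change: 0°C lies in this layer
            frac = (273.15 - t0) / (t1 - t0)
            return height[k] + frac * (height[k + 1] - height[k])
    return np.nan                            # no crossing found in this column

# toy column: temperature decreasing with height
temp = np.array([285.0, 279.0, 273.0, 266.0])          # K
geop = np.array([500.0, 10000.0, 20000.0, 30000.0])    # m² s⁻²
print(freezing_level_height(temp, geop))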
The file 'Observations and Charts.pdf' shows averages, standard deviations, bias, and trends of the 0°C isotherm for Puerto Montt, Río Gallegos, Comodoro Rivadavia, and Punta Arenas. These values were estimated using both observations and reanalysis ERA5 data.
García-Lee, N., Bravo, C., Gónzalez-Reyes, Á., and Mardones, P.: Spatial and temporal variability of the freezing level in Patagonia's atmosphere, Weather Clim. Dynam., 5, 1137–1151, https://doi.org/10.5194/wcd-5-1137-2024, 2024.
Matlab has a reputation for running slowly. Here are some pointers on how to speed computations, to an often unexpected degree. Subjects currently covered:
- Matrix Coding
- Implicit Multithreading on a Multicore Machine
- Sparse Matrices
- Sub-Block Computation to Avoid Memory Overflow

Matrix Coding - 1
Matlab documentation notes that efficient computation depends on using the matrix facilities, and that mathematically identical algorithms can have very different runtimes, but they are a bit coy about just what these differences are. A simple but telling example: the following is the core of the GD-CLS algorithm of Berry et al., copied from Fig. 1 of Shahnaz et al., 2006, "Document clustering using nonnegative matrix factorization":

for jj = 1:maxiter
    A = W'*W + lambda*eye(k);
    for ii = 1:n                      % column-by-column update of H
        b = W'*V(:,ii);
        H(:,ii) = A \ b;
    end
    H = H .* (H>0);
    W = W .* (V*H') ./ (W*(H*H') + 1e-9);
end

Replacing the column-wise update of H with a matrix update gives:

for jj = 1:maxiter
    A = W'*W + lambda*eye(k);
    B = W'*V;                         % single matrix solve replaces the inner loop
    H = A \ B;
    H = H .* (H>0);
    W = W .* (V*H') ./ (W*(H*H') + 1e-9);
end

These were tested on an 8049 x 8660 sparse bag-of-words matrix V (0.0083 non-zeros), with W of size 8049 x 50, H 50 x 8660, maxiter = 50, lambda = 0.1, and identical initial W. They were run consecutively, multithreaded on an 8-processor Sun server, starting at ~7:30 PM, and tic-toc timing was recorded. Runtimes were respectively 6586.2 and 70.5 seconds, a 93:1 difference. The maximum absolute pairwise difference between W matrix values was 6.6e-14. Similar speedups have been consistently observed in other cases. In one algorithm, combining matrix operations with efficient use of the sparse matrix facilities gave a 3600:1 speedup. For speed alone, C-style iterative programming should be avoided wherever possible. In addition, when a couple of lines of matrix code can substitute for an entire C-style function, program clarity is much improved.

Matrix Coding - 2
Applied to integration, the speed gains are not so great, largely due to the time taken to set up and deal with the boundaries. The anonymous function setup time is negligible. I demonstrate on a simple uniform-step, linearly interpolated 1-D integration of cos() from 0 to pi, which should yield zero:

tic;
step = .00001;
fun = @cos;
start = 0;
endit = pi;
enda = floor((endit - start)/step)*step + start;
delta = (endit - enda)/step;
intF = fun(start)/2;
intF = intF + fun(endit)*delta/2;
intF = intF + fun(enda)*(delta+1)/2;
for ii = start+step:step:enda-step
    intF = intF + fun(ii);
end
intF = intF*step
toc;

intF = -2.910164109692914e-14
Elapsed time is 4.091038 seconds.

Replacing the inner summation loop with the matrix equivalent speeds things up a bit:

tic;
step = .00001;
fun = @cos;
start = 0;
endit = pi;
enda = floor((endit - start)/step)*step + start;
delta = (endit - enda)/step;
intF = fun(start)/2;
intF = intF + fun(endit)*delta/2;
intF = intF + fun(enda)*(delta+1)/2;
intF = intF + sum(fun(start+step:step:enda-step));
intF = intF*step
toc;

intF = -2.868419946011613e-14
Elapsed time is 0.141564 seconds.

The core computation take...
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This article contains data related to the research article "G. Auzias, L. Brun, C. Deruelle, O. Coulon, Deep sulcal landmarks: Algorithmic and conceptual improvements in the definition and extraction of sulcal pits, Neuroimage. 111 (2015) 12–25. doi:10.1016/j.neuroimage.2015.02.008". This data can be used as a benchmark for quantitative evaluation of sulcal pits extraction algorithms. In particular, it allows a quantitative comparison with our method and an assessment of the consistency of the sulcal pits extraction across two well-matched populations.
This software implements the autonomous control of a robot using a fuzzy logic controller tuned by a genetic algorithm. The software was written in the C programming language for Windows (SDK). A description of the software can be found in the research publication: Arsene, C.T.C., & Zalzala, A.M.S., "Control of autonomous robots using fuzzy logic controllers tuned by genetic algorithms", in Proc. Congress on Evolutionary Computation, Vol. 1, pp. 428-35, Washington DC, 1999, IEEE Computer Science Press, ISBN 0-7803-5536-9. The software could possibly also be used for the simulation of nano-robots.
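As a compact, purely illustrative sketch of the fuzzy-controller-plus-genetic-algorithm combination (written in Python rather than the original C, with an assumed one-dimensional plant, fitness function, and GA settings):

import numpy as np

rng = np.random.default_rng(0)

def fuzzy_steer(error, widths):
    # Gaussian-shaped memberships for "error is negative" and "error is positive",
    # defuzzified into a corrective action of the opposite sign
    mu_neg = np.exp(-((error + 1.0) / widths[0]) ** 2)
    mu_pos = np.exp(-((error - 1.0) / widths[1]) ** 2)
    return (mu_neg * 1.0 + mu_pos * -1.0) / (mu_neg + mu_pos + 1e-9)

def fitness(widths):
    # simulate a simple heading-error plant; good controllers drive the error to zero quickly
    error, cost = 2.0, 0.0
    for _ in range(50):
        error = error + 0.5 * fuzzy_steer(error, widths)
        cost += abs(error)
    return -cost

# tiny generational GA: truncation selection of the best individuals plus Gaussian mutation
pop = rng.uniform(0.2, 3.0, size=(30, 2))
for _ in range(40):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]
    children = parents[rng.integers(0, 10, size=20)] + rng.normal(0, 0.1, size=(20, 2))
    pop = np.vstack([parents, np.clip(children, 0.05, 5.0)])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("tuned membership widths:", best)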