Model-based prognostics approaches use domain knowledge about a system and its failure modes through the use of physics-based models. Model-based prognosis is generally divided into two sequential problems: a joint state-parameter estimation problem, in which, using the model, the health of a system or component is determined based on the observations; and a prediction problem, in which, using the model, the state-parameter distribution is simulated forward in time to compute end of life and remaining useful life. The first problem is typically solved through the use of a state observer, or filter. The choice of filter depends on the assumptions that may be made about the system, and on the desired algorithm performance. In this paper, we review three separate filters for the solution to the first problem: the Daum filter, an exact nonlinear filter; the unscented Kalman filter, which approximates nonlinearities through the use of a deterministic sampling method known as the unscented transform; and the particle filter, which approximates the state distribution using a finite set of discrete, weighted samples, called particles. Using a centrifugal pump as a case study, we conduct a number of simulation-based experiments investigating the performance of the different algorithms as applied to prognostics.
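As a concrete illustration of the sigma-point idea behind the unscented Kalman filter mentioned above, here is a minimal unscented transform in numpy; the nonlinear map f, the dimensions, and the scaling parameters are illustrative assumptions, not the paper's pump model.

```python
import numpy as np

def unscented_transform(mean, cov, f, alpha=1.0, beta=2.0, kappa=2.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear map f
    using 2n+1 deterministically chosen sigma points."""
    n = len(mean)
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * cov)            # matrix square root
    sigma = np.vstack([mean, mean + S.T, mean - S.T])  # 2n+1 sigma points
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))     # mean weights
    wc = wm.copy()                                     # covariance weights
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    Y = np.array([f(s) for s in sigma])                # push points through f
    y_mean = wm @ Y
    d = Y - y_mean
    y_cov = (wc[:, None] * d).T @ d
    return y_mean, y_cov

# Illustrative use: a mildly nonlinear 2-D map (assumed, not the pump model)
f = lambda x: np.array([x[0] + 0.1 * x[1] ** 2, 0.9 * x[1]])
m, P = unscented_transform(np.zeros(2), np.eye(2), f)
```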
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises sea surface height (SSH) and velocity data at the ocean surface in two small regions near the Agulhas retroflection. The unfiltered SSH and a horizontal velocity field are provided, along with the same fields after various kinds of filtering, as described in the accompanying manuscript, Using Lagrangian filtering to remove waves from the ocean surface velocity field (https://doi.org/10.31223/X5D352). The code repository for this work is https://github.com/cspencerjones/separating-balanced.
Two time-resolutions are provided: two weeks of hourly data and 70 days of daily data.
Seventy_daysA.nc contains daily data for region A and Seventy_daysB.nc contains daily data for region B, including unfiltered, Lagrangian-filtered, and omega-filtered velocity and sea-surface height.
two_weeksA.nc contains hourly data for region A and two_weeksB.nc contains hourly data for region B, including unfiltered and Lagrangian-filtered velocity and sea-surface height.
Note that region A has been moved in version 2 of this dataset.
See the manuscript and code repository for more information.
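A minimal sketch of opening these files with xarray; the file names are taken from the list above, but the variable names inside the netCDF files are not documented here, so the snippet prints them rather than assuming them.

```python
import xarray as xr

# Daily and hourly files for region A, named as in the description above
daily = xr.open_dataset("Seventy_daysA.nc")
hourly = xr.open_dataset("two_weeksA.nc")

# The names of the unfiltered, Lagrangian-filtered, and omega-filtered
# variables are not listed here, so inspect them before going further.
print(daily.data_vars)
print(hourly.data_vars)
```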
This work was supported by NASA award 80NSSC20K1142.
Contains scans of a bin filled with different parts (screws, nuts, rods, spheres, sprockets). For each part type, an RGB image and an organized 3D point cloud obtained with a structured-light sensor are provided. In addition, an unorganized 3D point cloud representing an empty bin and a small Matlab script to read the files are also provided. The 3D data contain many outliers, and the data were used to demonstrate a new filtering technique.
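The new filtering technique itself is not described in this summary; as a generic stand-in, the sketch below applies a standard statistical outlier removal (a k-nearest-neighbor distance test) to an unorganized point cloud with numpy and scipy.

```python
import numpy as np
from scipy.spatial import cKDTree

def remove_outliers(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbors is
    more than std_ratio standard deviations above the global mean."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)  # k+1: each point's nearest neighbor is itself
    mean_d = dists[:, 1:].mean(axis=1)
    keep = mean_d < mean_d.mean() + std_ratio * mean_d.std()
    return points[keep]

# Illustrative use on random points standing in for a bin scan
cloud = np.random.default_rng(0).random((10000, 3))
clean = remove_outliers(cloud)
```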
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bagging (i.e., bootstrap aggregating) involves combining an ensemble of bootstrap estimators. We consider bagging for inference from noisy or incomplete measurements on a collection of interacting stochastic dynamic systems. Each system is called a unit, and each unit is associated with a spatial location. A motivating example arises in epidemiology, where each unit is a city: the majority of transmission occurs within a city, with smaller yet epidemiologically important interactions arising from disease transmission between cities. Monte Carlo filtering methods used for inference on nonlinear non-Gaussian systems can suffer from a curse of dimensionality as the number of units increases. We introduce bagged filter (BF) methodology which combines an ensemble of Monte Carlo filters, using spatiotemporally localized weights to select successful filters at each unit and time. We obtain conditions under which likelihood evaluation using a BF algorithm can beat a curse of dimensionality, and we demonstrate applicability even when these conditions do not hold. BF can outperform an ensemble Kalman filter on a coupled population dynamics model describing infectious disease transmission. A block particle filter also performs well on this task, though the bagged filter respects smoothness and conservation laws that a block particle filter can violate.
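A toy sketch of the bagging idea, not the paper's BF algorithm: the replicates below are plain simulations of assumed toy dynamics rather than full Monte Carlo filters, and each unit's estimate reweights the replicates by that unit's local measurement likelihood, mirroring the spatiotemporally localized weights described above.

```python
import numpy as np

rng = np.random.default_rng(0)
U, T, B = 5, 50, 20                      # units, time points, bagged replicates

# Toy coupled linear-Gaussian dynamics with noisy per-unit observations
A = 0.85 * np.eye(U) + 0.02              # weak coupling between units
x_true, y = np.zeros((T, U)), np.zeros((T, U))
for t in range(1, T):
    x_true[t] = A @ x_true[t - 1] + rng.normal(0, 0.5, U)
    y[t] = x_true[t] + rng.normal(0, 1.0, U)

# Each replicate evolves independently; at each unit and time the replicates
# are reweighted by the local likelihood of that unit's observation only.
x_rep = np.zeros((B, T, U))
est = np.zeros((T, U))
for t in range(1, T):
    for b in range(B):
        x_rep[b, t] = A @ x_rep[b, t - 1] + rng.normal(0, 0.5, U)
    w = np.exp(-0.5 * (y[t] - x_rep[:, t]) ** 2)   # localized weights, (B, U)
    w /= w.sum(axis=0)
    est[t] = (w * x_rep[:, t]).sum(axis=0)         # per-unit bagged estimate
```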
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the streaming data setting, where data arrive continuously or in frequent batches and there is no pre-determined amount of total data, Bayesian models can employ recursive updates, incorporating each new batch of data into the model parameters’ posterior distribution. Filtering methods are currently used to perform these updates efficiently; however, they suffer from eventual degradation as the number of unique values within the filtered samples decreases. We propose Generative Filtering, a method for efficiently performing recursive Bayesian updates in the streaming setting. Generative Filtering retains the speed of a filtering method while using parallel updates to avoid degenerate distributions after repeated applications. We derive rates of convergence for Generative Filtering and conditions for the use of sufficient statistics instead of fully storing all past data. We investigate the alleviation of filtering degradation through simulation and an ecological time series of counts. Supplementary materials for this article are available online.
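Generative Filtering itself is not reproduced here; as a minimal illustration of recursive Bayesian updating in a stream, the conjugate sketch below updates a Beta posterior for a Bernoulli rate batch by batch, each posterior serving as the next batch's prior.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 1.0, 1.0                # Beta(1, 1) prior on the success rate

# Batches arrive one at a time; each update touches only the new batch,
# since (alpha, beta) are sufficient statistics for all past data.
for _ in range(10):
    batch = rng.binomial(1, 0.3, size=50)
    alpha += batch.sum()
    beta += len(batch) - batch.sum()
    print(f"posterior mean: {alpha / (alpha + beta):.3f}")
```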
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 4 rows and is filtered to the book An Introduction to Wavelets and Other Filtering Methods in Finance and Economics. It features 10 columns, including number of authors, number of books, earliest publication date, and latest publication date.
Dataset Card for No Language Left Behind (NLLB - 200vo)
Dataset Summary
This dataset was created based on metadata for mined bitext released by Meta AI. It contains bitext for 148 English-centric and 1465 non-English-centric language pairs using the stopes mining library and the LASER3 encoders (Heffernan et al., 2022). The complete dataset is ~450GB. CCMatrix contains previous versions of mined instructions.
How to use the data
There are two ways… See the full description on the dataset page: https://huggingface.co/datasets/yaya-sy/nllb-filtering.
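One way to load the data is with the Hugging Face datasets library; given the ~450GB size, streaming is the safer default. Whether a configuration (e.g., a language pair) must be named is an assumption to verify on the dataset page.

```python
from datasets import load_dataset

# Repository id from the description above; a config name may be required,
# so check the dataset page if this call asks for one.
ds = load_dataset("yaya-sy/nllb-filtering", split="train", streaming=True)
print(next(iter(ds)))
```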
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Phylogenetic inference is generally performed on the basis of multiple sequence alignments (MSA). Because errors in an alignment can lead to errors in tree estimation, there is a strong interest in identifying and removing unreliable parts of the alignment. In recent years several automated filtering approaches have been proposed, but despite their popularity, a systematic and comprehensive comparison of different alignment filtering methods on real data has been lacking. Here, we extend and apply recently introduced phylogenetic tests of alignment accuracy on a large number of gene families and contrast the performance of unfiltered versus filtered alignments in the context of single-gene phylogeny reconstruction. Based on multiple genome-wide empirical and simulated data sets, we show that the trees obtained from filtered MSAs are on average worse than those obtained from unfiltered MSAs. Furthermore, alignment filtering often leads to an increase in the proportion of well-supported branches that are actually wrong. We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms.
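The filtering methods compared in the study are not reproduced here; as a generic illustration of what alignment filtering does, the sketch below drops MSA columns whose gap fraction exceeds a threshold, the kind of light filtering of alignment positions discussed above.

```python
import numpy as np

def filter_columns(msa, max_gap_frac=0.5):
    """Remove alignment columns whose gap fraction exceeds max_gap_frac.
    msa: list of equal-length aligned sequences, with '-' for gaps."""
    arr = np.array([list(s) for s in msa])
    gap_frac = (arr == "-").mean(axis=0)
    keep = gap_frac <= max_gap_frac
    return ["".join(row) for row in arr[:, keep]]

msa = ["ACG-TA", "AC--TA", "ACGCTA"]
print(filter_columns(msa))   # the column gapped in 2 of 3 sequences is dropped
```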
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This dataset is designed to evaluate the effectiveness of toxicity and bias filtering methods. The objective is to detect and filter a small subset of toxic or unsafe examples that have been injected into a larger, predominantly safe training set, using a reference set that exposes unsafe model behavior. All models are evaluated using the same training and reference sets. We provide two evaluation settings, denoted by the suffixes Hom (Homogeneous) and Het (Heterogeneous).… See the full description on the dataset page: https://huggingface.co/datasets/DataAttributionEval/Toxicity-Bias-Filtering.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
The two CSV files here are the train and test data from Kaggle's Ion Switching competition, with drift removed and filtered with a Kalman filter to reduce noise.
These ideas were posted by @cdeotte and @teejmahal20; I just ran the filter and the feature engineering (FE) and saved the data.
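Not the notebook code itself, but a generic sketch of the denoising step described above: a 1-D random-walk Kalman filter, with illustrative (untuned) noise variances q and r.

```python
import numpy as np

def kalman_denoise(signal, q=1e-5, r=0.01):
    """1-D random-walk Kalman filter; q is process-noise variance,
    r is measurement-noise variance."""
    x, p = float(signal[0]), 1.0
    out = np.empty(len(signal))
    for i, z in enumerate(signal):
        p = p + q              # predict
        k = p / (p + r)        # Kalman gain
        x = x + k * (z - x)    # update with measurement z
        p = (1 - k) * p
        out[i] = x
    return out

# Illustrative use on a noisy step signal
t = np.linspace(0, 1, 1000)
noisy = (t > 0.5).astype(float) + np.random.default_rng(3).normal(0, 0.1, t.size)
smooth = kalman_denoise(noisy)
```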
In this paper, we propose a novel approach to reduce the noise in Synthetic Aperture Radar (SAR) images using particle filters. Interpretation of SAR images is a difficult problem, since they are contaminated with a multiplicative noise known as “speckle noise”. In the literature, the general approach for removing speckle is to use local statistics computed in a square window. Here, we propose to use particle filters, a sequential Bayesian technique. The proposed method also uses local statistics to denoise the images; since this is a Bayesian approach, the computed statistics of the window can be exploited as a priori information. Moreover, particle filters are sequential methods, which are more appropriate for handling the heterogeneous structure of the image. Computer simulations show that the proposed method provides better edge-preserving results with satisfactory speckle removal when compared to the results obtained by the Gamma Maximum a Posteriori (MAP) filter.
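The particle-filter method itself is not reproduced in this abstract; as a sketch of the local-statistics baseline it is contrasted against, here is a basic Lee-style despeckling filter computed in a square window (the window size and noise variance are illustrative).

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lee_filter(img, size=7, noise_var=0.25):
    """Local-statistics despeckling: shrink each pixel toward the window
    mean in proportion to the estimated local signal variance."""
    mean = uniform_filter(img, size)
    mean_sq = uniform_filter(img ** 2, size)
    var = mean_sq - mean ** 2
    gain = np.clip((var - noise_var) / np.maximum(var, 1e-12), 0.0, 1.0)
    return mean + gain * (img - mean)

img = np.abs(np.random.default_rng(4).normal(1.0, 0.5, (128, 128)))  # stand-in tile
despeckled = lee_filter(img)
```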
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This package contains evaluation supplementary files for the paper: Grid-Based Bayesian Filtering Methods for Pedestrian Dead Reckoning Indoor Positioning Using Smartphones by Miroslav Opiela and František Galčík.
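The paper's algorithms are not included in this summary; as a generic illustration of grid-based Bayesian filtering, the sketch below runs one predict/update cycle of a 1-D histogram filter (the motion model and sensor are assumptions, and np.roll wraps at the edges as a simplification).

```python
import numpy as np

def grid_bayes_step(belief, step_probs, likelihood):
    """One cycle of a 1-D histogram (grid) Bayes filter.
    belief: probability per grid cell
    step_probs: {cell offset: probability} motion model
    likelihood: per-cell measurement likelihood"""
    pred = np.zeros_like(belief)
    for offset, p in step_probs.items():   # predict: convolve with motion model
        pred += p * np.roll(belief, offset)
    post = pred * likelihood               # update: multiply by the likelihood
    return post / post.sum()               # normalize

belief = np.full(100, 1 / 100)                     # uniform over 100 cells
motion = {0: 0.1, 1: 0.8, 2: 0.1}                  # noisy "one step forward"
lik = np.exp(-0.5 * ((np.arange(100) - 42) / 3.0) ** 2)  # sensor peaked at cell 42
belief = grid_bayes_step(belief, motion, lik)
```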
Contents:
Venues
Data are recorded in three buildings:
Used datasets
A subset of the input data is derived from logfiles provided by the organizers of the IPIN 2018 and IPIN 2019 competitions:
Funding
The work was partially supported by the Slovak Grant Agency of the Ministry of Education and Academy of Science of the Slovak Republic under grant no. 1/0056/18 and by the Slovak Research and Development Agency under the contract no. APVV-15-0091.
Contact
For any further questions, please contact:
Miroslav Opiela, miroslav.opiela@upjs.sk Institute of Computer Science, Faculty of Science, P. J. Šafárik University (UPJS), Košice, Slovakia
Many diagnostic datasets suffer from the adverse effects of spikes that are embedded in data and noise. For example, this is true for electrical power system data where the switches, relays, and inverters are major contributors to these effects. Spikes are mostly harmful to the analysis of data in that they throw off real-time detection of abnormal conditions, and classification of faults. Since noise and spikes are mixed together and embedded within the data, removal of the unwanted signals from the data is not always easy and may result in losing the integrity of the information carried by the data. Additionally, in some applications noise and spikes need to be filtered independently. The proposed algorithm is a multi-resolution filtering approach based on Haar wavelets that is capable of removing spikes while incurring insignificant damage to other data. In particular, noise in the data, which is a useful indicator that a sensor is healthy and not stuck, can be preserved using our approach. Presented here is the theoretical background with some examples from a realistic testbed.
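The paper's multi-resolution algorithm is not given in the abstract; the sketch below illustrates the core idea with a single-level Haar decomposition in numpy, zeroing only spike-sized detail coefficients so that ordinary noise, the useful sensor-health indicator mentioned above, is preserved.

```python
import numpy as np

def haar_despike(x, spike_thresh=5.0):
    """One-level Haar transform: detail coefficients far above the noise
    level (by a MAD test) mark spikes and are zeroed; the rest are kept."""
    x = np.asarray(x, dtype=float)
    n = len(x) - len(x) % 2                  # even length for pairing
    a = (x[0:n:2] + x[1:n:2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0:n:2] - x[1:n:2]) / np.sqrt(2)   # detail coefficients
    mad = np.median(np.abs(d - np.median(d))) + 1e-12
    d[np.abs(d) > spike_thresh * mad] = 0.0  # remove spike-sized details only
    out = x.copy()
    out[0:n:2] = (a + d) / np.sqrt(2)        # inverse Haar transform
    out[1:n:2] = (a - d) / np.sqrt(2)
    return out

# Illustrative use: background noise is preserved, the isolated spike is not
sig = np.random.default_rng(5).normal(0, 0.1, 256)
sig[100] += 10.0
clean = haar_despike(sig)
```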
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Spatio-temporal datasets are rapidly growing in size. For example, environmental variables are measured with increasing resolution by increasing numbers of automated sensors mounted on satellites and aircraft. Using such data, which are typically noisy and incomplete, the goal is to obtain complete maps of the spatio-temporal process, together with uncertainty quantification. We focus here on real-time filtering inference in linear Gaussian state-space models. At each time point, the state is a spatial field evaluated on a very large spatial grid, making exact inference using the Kalman filter computationally infeasible. Instead, we propose a multi-resolution filter (MRF), a highly scalable and fully probabilistic filtering method that resolves spatial features at all scales. We prove that the MRF matrices exhibit a particular block-sparse multi-resolution structure that is preserved under filtering operations through time. We describe connections to existing methods, including hierarchical matrices from numerical mathematics. We also discuss inference on time-varying parameters using an approximate Rao-Blackwellized particle filter, in which the integrated likelihood is computed using the MRF. Using a simulation study and a real satellite-data application, we show that the MRF strongly outperforms competing approaches. Supplementary materials include Python code for reproducing the simulations, some detailed properties of the MRF and auxiliary theoretical results.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises sea surface height (SSH) and velocity data at the ocean surface in two small regions near the Agulhas retroflection. The unfiltered SSH and a horizontal velocity field are provided, along with the same fields after various kinds of filtering, as described in the accompanying manuscript, Separating balanced and unbalanced flow at the surface of the Agulhas region using Lagrangian filtering. The code repository for this work is https://github.com/cspencerjones/separating-balanced.
Two time-resolutions are provided: two weeks of hourly data and 70 days of daily data. See the manuscript for more information.
This work was supported by NASA award 80NSSC20K1142.
Particle filters (PF) have been established as the de facto state of the art in failure prognosis. They combine the rigor of Bayesian estimation with nonlinear prediction while also providing uncertainty estimates alongside a given solution. Within the context of particle filters, this paper introduces several novel methods for uncertainty representation and uncertainty management. The prediction uncertainty is modeled via a rescaled Epanechnikov kernel and is assisted with resampling techniques and regularization algorithms. Uncertainty management is accomplished through parametric adjustments in a feedback correction loop of the state model and its noise distributions. The correction loop provides the mechanism to incorporate information that can improve solution accuracy and reduce uncertainty bounds. In addition, this approach results in a reduction in computational burden. The scheme is illustrated with real vibration feature data from a fatigue-driven fault in a critical aircraft component.
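A minimal sketch of the regularization step described above: after resampling, particles are jittered with draws from a rescaled Epanechnikov kernel (sampled via Devroye's three-uniforms trick); the bandwidth and the 1-D toy setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

def epanechnikov(n):
    """Sample the Epanechnikov kernel on [-1, 1]: draw three uniforms and
    return u2 if |u3| is the largest, else u3 (Devroye's method)."""
    u = rng.uniform(-1, 1, (3, n))
    pick_u2 = (np.abs(u[2]) >= np.abs(u[1])) & (np.abs(u[2]) >= np.abs(u[0]))
    return np.where(pick_u2, u[1], u[2])

def regularized_resample(particles, weights, bandwidth=0.1):
    """Multinomial resampling followed by kernel jitter, restoring the
    diversity that plain resampling collapses."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx] + bandwidth * epanechnikov(len(particles))

# Illustrative 1-D use
parts = rng.normal(0, 1, 500)
w = np.exp(-0.5 * (parts - 0.7) ** 2)
parts = regularized_resample(parts, w / w.sum())
```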
https://www.marketreportanalytics.com/privacy-policy
The Internet Filtering Software market is experiencing robust growth, projected to reach a substantial size by 2033. A compound annual growth rate (CAGR) of 14% from 2025 to 2033 indicates a significant upward trajectory driven by several key factors. The increasing adoption of cloud-based solutions, coupled with heightened concerns surrounding cybersecurity threats and data privacy regulations, is fueling market expansion. Businesses across various sectors, including BFSI (Banking, Financial Services, and Insurance), IT & Telecom, Government, and Education, are actively investing in robust internet filtering software to protect their sensitive data and comply with regulatory mandates. The market is segmented by component (solution and services), deployment mode (cloud and on-premises), filtering type (DNS, keyword, URL, and other filtering methods), and industry vertical. The cloud deployment model is witnessing accelerated adoption due to its scalability, cost-effectiveness, and ease of management. Furthermore, the rising prevalence of sophisticated cyber threats, including malware and phishing attacks, necessitates advanced filtering capabilities, driving demand for comprehensive solutions that go beyond basic URL filtering. The competitive landscape comprises established players like Broadcom, Cisco, Palo Alto Networks, and McAfee, alongside emerging innovative companies. However, factors such as the high initial investment cost for implementing comprehensive solutions and the complexity of managing sophisticated filtering systems might pose challenges to market growth. Future growth will depend heavily on ongoing innovation in threat detection, seamless integration with existing IT infrastructure, and the increasing awareness of the need for robust internet security among organizations of all sizes. The increasing sophistication of cyberattacks and the evolving regulatory landscape are likely to continue driving demand for advanced internet filtering solutions over the forecast period.

The Asia Pacific region is expected to witness substantial growth due to increasing internet penetration and the rising adoption of internet-connected devices in developing economies. North America and Europe, while already relatively mature markets, are anticipated to continue showing moderate growth driven by continuous upgrades to existing systems and the adoption of advanced features. The continuous emergence of new and advanced threats will remain a pivotal driving force behind the sustained growth of this market. Competition is expected to remain high, with companies investing heavily in R&D to develop and deploy cutting-edge solutions. Strategic partnerships and acquisitions will likely play a crucial role in shaping the market landscape in the coming years.

Key drivers for this market are: Strict Government Regulations and the Need for Compliance; Growing BYOD Trend; Growing Online Malware and the Increasing Refinement Levels of Web Attacks. Potential restraints include: Strict Government Regulations and the Need for Compliance; Growing BYOD Trend; Growing Online Malware and the Increasing Refinement Levels of Web Attacks. Notable trends are: BFSI to Drive the Market Growth.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
RIP is a method for preference data filtering. The core idea is that low-quality input prompts lead to high variance and low-quality responses. By measuring the quality of rejected responses and the reward gap between chosen and rejected preference pairs, RIP effectively filters prompts to enhance dataset quality. We release 4k prompts filtered from 20k WildChat prompts. For each prompt, we provide 32 responses from Llama-3.3-70B-Instruct and their corresponding rewards obtained from ArmoRM.… See the full description on the dataset page: https://huggingface.co/datasets/facebook/Wildchat-RIP-Filtered-by-70b-Llama.
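A toy sketch of the filtering signal described above, not the released pipeline: given a reward per sampled response, each prompt is scored by its rejected-response reward and its chosen-rejected gap. The combination rule and its direction here (prefer high rejected reward, small gap) are assumptions to check against the paper.

```python
import numpy as np

def rip_style_filter(rewards, keep_frac=0.2):
    """rewards: (n_prompts, n_responses) array, e.g. 32 rewards per prompt.
    Rank prompts by rejected (minimum) reward and by chosen-rejected gap,
    then keep the top keep_frac."""
    rejected = rewards.min(axis=1)
    gap = rewards.max(axis=1) - rejected
    # rank-combine the two signals (illustrative weighting)
    score = rejected.argsort().argsort() - gap.argsort().argsort()
    n_keep = int(len(rewards) * keep_frac)
    return np.argsort(score)[-n_keep:]

rewards = np.random.default_rng(7).normal(size=(1000, 32))
kept = rip_style_filter(rewards)   # indices of retained prompts
```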
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: The accuracy of microbial community detection in 16S rRNA marker-gene and metagenomic studies suffers from contamination and sequencing errors that lead to either falsely identifying microbial taxa that were not in the sample or misclassifying the taxa of DNA fragment reads. Removing contaminants and filtering rare features are two common approaches to deal with this problem. While contaminant detection methods use auxiliary sequencing process information to identify known contaminants, filtering methods remove taxa that are present in a small number of samples and have small counts in the samples where they are observed. The latter approach reduces the extreme sparsity of microbiome data and has been shown to correctly remove contaminant taxa in cultured “mock” datasets, where the true taxa compositions are known. Although filtering is frequently used, careful evaluation of its effect on the data analysis and scientific conclusions remains unreported. Here, we assess the effect of filtering on the alpha and beta diversity estimation as well as its impact on identifying taxa that discriminate between disease states.

Results: The effect of filtering on microbiome data analysis is illustrated on four datasets: two mock quality control datasets where the same cultured samples with known microbial composition are processed at different labs, and two disease study datasets. Results show that in microbiome quality control datasets, filtering reduces the magnitude of differences in alpha diversity and alleviates technical variability between labs while preserving the between-samples similarity (beta diversity). In the disease study datasets, DESeq2 and linear discriminant analysis Effect Size (LEfSe) methods were used to identify taxa that are differentially abundant across groups of samples, and random forest models were used to rank features with the largest contribution toward disease classification. Results reveal that filtering retains significant taxa and preserves the model classification ability measured by the area under the receiver operating characteristic curve (AUC). The comparison between the filtering and the contaminant removal method shows that they have complementary effects and are advised to be used in conjunction.

Conclusions: Filtering reduces the complexity of microbiome data while preserving their integrity in downstream analysis. This leads to mitigation of the classification methods' sensitivity and reduction of technical variability, allowing researchers to generate more reproducible and comparable results in microbiome data analysis.
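A minimal pandas sketch of the rare-feature filtering described in the Background: taxa are dropped when they fail to reach a minimum count in a minimum number of samples (both thresholds illustrative).

```python
import pandas as pd

def filter_rare_taxa(counts, min_samples=3, min_count=10):
    """counts: DataFrame with samples as rows and taxa as columns.
    Keep a taxon only if it reaches min_count reads in at least
    min_samples samples."""
    keep = (counts >= min_count).sum(axis=0) >= min_samples
    return counts.loc[:, keep]

counts = pd.DataFrame(
    {"taxonA": [50, 0, 30, 12], "taxonB": [1, 0, 0, 2], "taxonC": [9, 8, 100, 40]}
)
filtered = filter_rare_taxa(counts, min_samples=2, min_count=10)
print(filtered.columns.tolist())   # ['taxonA', 'taxonC']; taxonB is dropped
```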