16 datasets found

Data from: A Diagnostic Procedure for Detecting Outliers in Linear...
tandf.figshare.com
figshare.com
txt
Updated Feb 9, 2024
Dongjun You; Michael Hunter; Meng Chen; Sy-Miin Chow (2024). A Diagnostic Procedure for Detecting Outliers in Linear State–Space Models [Dataset]. http://doi.org/10.6084/m9.figshare.12162075.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.12162075.v1
Dataset updated
Feb 9, 2024
Dataset provided by
Taylor & Francis
Authors
Dongjun You; Michael Hunter; Meng Chen; Sy-Miin Chow
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Outliers can be more problematic in longitudinal data than in independent observations due to the correlated nature of such data. It is common practice to discard outliers as they are typically regarded as a nuisance or an aberration in the data. However, outliers can also convey meaningful information concerning potential model misspecification, and ways to modify and improve the model. Moreover, outliers that occur among the latent variables (innovative outliers) have distinct characteristics compared to those impacting the observed variables (additive outliers), and are best evaluated with different test statistics and detection procedures. We demonstrate and evaluate the performance of an outlier detection approach for multi-subject state-space models in a Monte Carlo simulation study, with corresponding adaptations to improve power and reduce false detection rates. Furthermore, we demonstrate the empirical utility of the proposed approach using data from an ecological momentary assessment study of emotion regulation together with an open-source software implementation of the procedures.
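To make the idea of testing prediction errors in a state-space model concrete, the sketch below shows the simplest innovation-based screen for a single univariate series. It runs a Kalman filter for a local-level model, where a latent level follows a random walk and is observed with noise, and compares the standardized one-step-ahead prediction errors (innovations) with a cutoff. It assumes known process and measurement variances `q` and `r` and a cutoff of 3, all illustrative choices. It is not the authors' multi-subject procedure, and on its own it does not distinguish innovative from additive outliers, which, as noted above, call for different statistics.

```python
import numpy as np

def standardized_innovations(y, q, r, m0=0.0, p0=1e6):
    """Kalman filter for y_t = mu_t + e_t, mu_t = mu_{t-1} + w_t.

    q: variance of the latent (process) noise w_t.
    r: variance of the measurement noise e_t.
    m0, p0: initial state mean and (diffuse) variance.
    Returns each innovation divided by its predicted standard deviation.
    """
    m, p = m0, p0
    z = np.empty(len(y))
    for t, obs in enumerate(y):
        # Predict: random-walk state, so the mean carries over and variance grows.
        p_pred = p + q
        # Innovation (one-step-ahead prediction error) and its variance.
        v = obs - m
        f = p_pred + r
        z[t] = v / np.sqrt(f)
        # Update the state with the Kalman gain.
        k = p_pred / f
        m = m + k * v
        p = (1.0 - k) * p_pred
    return z

def flag_outliers(y, q, r, cutoff=3.0):
    """Indices whose absolute standardized innovation exceeds the cutoff."""
    z = standardized_innovations(np.asarray(y, dtype=float), q, r)
    return np.flatnonzero(np.abs(z) > cutoff)
```

For example, a single spike at index 50 in an otherwise flat series of 100 zeros is flagged only at index 50 with `q=0.001, r=1.0`. With a faster-adapting filter (larger `q` relative to `r`), the same spike also inflates the next few innovations in this sketch, which illustrates why correlated data make outliers harder to localize than in independent observations.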
Controlled Anomalies Time Series (CATS) Dataset
zenodo.org
bin
Updated Jul 12, 2024
Patrick Fleith (2024). Controlled Anomalies Time Series (CATS) Dataset [Dataset]. http://doi.org/10.5281/zenodo.7646897
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.7646897
Dataset updated
Jul 12, 2024
Dataset provided by
Solenix Engineering GmbH
Authors
Patrick Fleith
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Controlled Anomalies Time Series (CATS) Dataset consists of commands, external stimuli, and telemetry readings of a simulated complex dynamical system with 200 injected anomalies.

The CATS Dataset exhibits a set of desirable properties that make it very suitable for benchmarking Anomaly Detection Algorithms in Multivariate Time Series [1]:

Multivariate (17 variables), including sensor readings and control signals. It simulates the operational behaviour of an arbitrary complex system, including:

4 Deliberate Actuations / Control Commands sent by a simulated operator / controller, for instance, commands of an operator to turn ON/OFF some equipment.

3 Environmental Stimuli / External Forces acting on the system and affecting its behaviour, for instance, the wind affecting the orientation of a large ground antenna.

10 Telemetry Readings representing the observable states of the complex system by means of sensors, for instance, a position, a temperature, a pressure, a voltage, current, humidity, velocity, acceleration, etc.

5 million timestamps. Sensor readings are sampled at 1 Hz.

1 million nominal observations (the first 1 million datapoints). This is suitable for learning the "normal" behaviour.

4 million observations that include both nominal and anomalous segments. This is suitable for evaluating both semi-supervised approaches (novelty detection) and unsupervised approaches (outlier detection).

200 anomalous segments. One anomalous segment may contain several successive anomalous observations / timestamps. Only the last 4 million observations contain anomalous segments.

Different types of anomalies, to help understand which anomaly types different approaches can detect.

Fine control over ground truth. As this is a simulated system with deliberate anomaly injection, the start and end times of the anomalous behaviour are known precisely. In contrast to real-world datasets, where mislabelled segments are common, there is no risk that the ground truth contains mislabelled segments.

Obvious anomalies. The simulated anomalies have been designed to be "easy" for the human eye to detect (i.e., there are very large spikes or oscillations), and hence also detectable by most algorithms. This makes the synthetic dataset useful for screening tasks (i.e., to eliminate algorithms that are not capable of detecting these obvious anomalies). However, in our initial experiments, the dataset turned out to be challenging enough even for state-of-the-art anomaly detection approaches, making it suitable for regular benchmark studies as well.

Context provided. Some variables can only be considered anomalous in relation to other behaviours. A typical example is a light and switch pair. The light being either on or off is nominal, and the same goes for the switch, but having the switch on and the light off should be considered anomalous. In the CATS dataset, users can choose whether to use the available context and external stimuli, in order to test the usefulness of context for detecting anomalies in this simulation.

Pure signal ideal for robustness-to-noise analysis. The simulated signals are provided without noise. While this may seem unrealistic at first, it is an advantage: users of the dataset can add any type of noise, at any chosen amplitude, on top of the provided series. This makes the dataset well suited to testing how sensitive and robust detection algorithms are against various levels of noise.

No missing data. You can drop whatever data you want to assess the impact of missing values on your detector with respect to a clean baseline.
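Three of the properties listed above translate directly into preprocessing steps, sketched below under stated assumptions: holding out the first 1 million nominal rows for training, adding noise of a chosen amplitude to the clean signals, and expressing the light-and-switch context rule. The sketch assumes the 17 channels have already been loaded into a NumPy array `X` of shape (n_timestamps, 17) in time order; the file layout and loading step are not covered here. The function names, the zero-mean Gaussian noise model and the relative-amplitude parameter are illustrative choices, and `switch_on`/`light_on` are hypothetical boolean channels, not actual CATS variable names.

```python
import numpy as np

N_NOMINAL = 1_000_000  # the first 1 million rows contain no anomalies

def split_nominal(X, n_nominal=N_NOMINAL):
    """Return (train, evaluation): the nominal-only prefix and the remainder."""
    return X[:n_nominal], X[n_nominal:]

def add_relative_noise(X, rel_amplitude, seed=0):
    """Add zero-mean Gaussian noise with std = rel_amplitude * per-column std."""
    rng = np.random.default_rng(seed)
    scale = rel_amplitude * X.std(axis=0)  # one noise level per channel
    return X + rng.normal(0.0, 1.0, size=X.shape) * scale

def switch_light_anomaly(switch_on, light_on):
    """Contextual rule: the switch ON while the light is OFF is anomalous."""
    return np.logical_and(switch_on, np.logical_not(light_on))
```

Sweeping `rel_amplitude` over a grid (for example 0, 0.05, 0.1, 0.2) and re-running a detector on each noisy copy is one simple way to run the robustness-to-noise analysis the dataset is designed for.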

[1] Example benchmark of anomaly detection in time series: "Sebastian Schmidl, Phillip Wenig, and Thorsten Papenbrock. Anomaly Detection in Time Series: A Comprehensive Evaluation. PVLDB, 15(9): 1779-1797, 2022. doi:10.14778/3538598.3538602"

About Solenix

Solenix is an international company providing software engineering, consulting services and software products for the space market. Solenix is a dynamic company that brings innovative technologies and concepts to the aerospace market, keeping up to date with technical advancements and actively promoting spin-in and spin-out technology activities. We combine modern solutions which complement conventional practices. We aspire to achieve maximum customer satisfaction by fostering collaboration, constructivism, and flexibility.
Data from: Outlier detection in cylindrical data based on Mahalanobis...
tandf.figshare.com
text/x-tex
Updated Jan 2, 2025
Prashant S. Dhamale; Akanksha S. Kashikar (2025). Outlier detection in cylindrical data based on Mahalanobis distance [Dataset]. http://doi.org/10.6084/m9.figshare.24092089.v1
Available download formats: text/x-tex
Unique identifier
https://doi.org/10.6084/m9.figshare.24092089.v1
Dataset updated
Jan 2, 2025
Dataset provided by
Taylor & Francis
Authors
Prashant S. Dhamale; Akanksha S. Kashikar
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Cylindrical data are bivariate data formed from the combination of circular and linear variables. Identifying outliers is a crucial step in any data analysis work. This paper proposes a new distribution-free procedure to detect outliers in cylindrical data using the Mahalanobis distance concept. The use of Mahalanobis distance incorporates the correlation between the components of the cylindrical distribution, which had not been accounted for in the earlier papers on outlier detection in cylindrical data. The threshold for declaring an observation to be an outlier can be obtained via parametric or non-parametric bootstrap, depending on whether the underlying distribution is known or unknown. The performance of the proposed method is examined via extensive simulations from the Johnson-Wehrly distribution. The proposed method is applied to two real datasets, and the outliers are identified in those datasets.
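One simple way to apply the Mahalanobis idea to cylindrical observations (θ, x) is sketched below. It is an illustrative assumption, not necessarily the distance defined in the paper: the angle is embedded as (cos θ, sin θ) so that 0 and 2π coincide, the squared Mahalanobis distance is computed on (cos θ, sin θ, x) so that the circular-linear correlation enters through the covariance matrix, and the cutoff is a nonparametric bootstrap estimate of the (1 - α) quantile of those distances. Because the cutoff is a quantile, roughly an α fraction of points is flagged by construction, so treat it as a screening rule rather than a formal test.

```python
import numpy as np

def embed(theta, x):
    """Map cylindrical observations to R^3 so that angles wrap correctly."""
    return np.column_stack([np.cos(theta), np.sin(theta), x])

def mahalanobis_sq(Z, ref):
    """Squared Mahalanobis distance of rows of Z from the mean/covariance of ref."""
    mu = ref.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(ref, rowvar=False))
    d = Z - mu
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

def bootstrap_threshold(Z, alpha=0.05, n_boot=500, seed=0):
    """Average, over bootstrap resamples, of the (1 - alpha) quantile of distances."""
    rng = np.random.default_rng(seed)
    n = len(Z)
    qs = []
    for _ in range(n_boot):
        sample = Z[rng.integers(0, n, size=n)]  # resample rows with replacement
        qs.append(np.quantile(mahalanobis_sq(sample, sample), 1 - alpha))
    return float(np.mean(qs))

def cylindrical_outliers(theta, x, alpha=0.05, n_boot=500, seed=0):
    """Indices of observations whose distance exceeds the bootstrap cutoff."""
    Z = embed(np.asarray(theta, dtype=float), np.asarray(x, dtype=float))
    d2 = mahalanobis_sq(Z, Z)
    return np.flatnonzero(d2 > bootstrap_threshold(Z, alpha, n_boot, seed))
```

The embedding is the key design choice here: using the raw angle as a linear coordinate would make observations at 359° and 1° look far apart.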
Compilation of historic and ongoing N2O observations across Europe - Dataset...
b2find.eudat.eu
Updated Jun 1, 2024
(2024). Compilation of historic and ongoing N2O observations across Europe - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/51ecac80-097d-534e-979f-60d7b003c84a
Dataset updated
Jun 1, 2024
Area covered
Europe
Description
This dataset combines historic and ongoing N2O observations across Europe from several networks and individual institutions. Its main purpose is to provide input to atmospheric inversions of N2O fluxes; for this reason, only observations from the highest inlet were considered at tall-tower sites. The dataset comprises observations from traditional GC-ECD systems as well as modern laser spectroscopy instruments. In addition to continuous observations, flask observations from the NOAA network were included. A total of 50 time series from 43 observing sites are included, covering the period from 2005 to the end of January 2024. Of these sites, 28 were reporting continuous observations in 2023. The data and metadata were brought to a common format, including a common flagging system and reporting of uncertainties. Original network flags were maintained, and users may apply additional outlier flagging for further filtering. Different networks/institutions report different elements of measurement uncertainty. Depending on availability, three components of uncertainty were maintained in the data: the standard deviation of the ambient observation during the observation interval, the repeatability of working standards, and the combined uncertainty. N2O data are mostly reported on the WMO-X2006A calibration scale, with some exceptions reported on the SIO-98 and SIO-16 scales, which were considered equivalent. However, additional site-to-site bias correction (< 0.5 ppb) may be required when using the data in inverse modelling.
Collection of historic and ongoing N2O observations across Europe (release...
b2find.eudat.eu
Updated Jun 1, 2024
(2024). Collection of historic and ongoing N2O observations across Europe (release 2024-11) - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/2d92fd57-c70d-5266-a1e2-30d51aa8d875
Dataset updated
Jun 1, 2024
Area covered
Europe
Description
This dataset combines historic and ongoing N2O observations across Europe from several networks and individual institutions. Its main purpose is to provide input to atmospheric inversions of N2O fluxes; for this reason, only observations from the highest inlet were considered at tall-tower sites. The dataset comprises observations from traditional GC-ECD systems as well as modern laser spectroscopy instruments. In addition to continuous observations, flask observations from the NOAA network were included. A total of 50 time series from 43 observing sites are included, covering the period from 2005 to the end of January 2024. Of these sites, 28 were reporting continuous observations in 2023. The data and metadata were brought to a common format, including a common flagging system and reporting of uncertainties. Original network flags were maintained, and users may apply additional outlier flagging for further filtering. Different networks/institutions report different elements of measurement uncertainty. Depending on availability, three components of uncertainty were maintained in the data: the standard deviation of the ambient observation during the observation interval, the repeatability of working standards, and the combined uncertainty. N2O data are mostly reported on the WMO-X2006A calibration scale, with some exceptions reported on the SIO-98 and SIO-16 scales, which were considered equivalent. However, additional site-to-site bias correction (< 0.5 ppb) may be required when using the data in inverse modelling.
MODIS/Terra Land Surface Temperature/3-Band Emissivity Daily L3 Global 1km...
data.nasa.gov
Updated Apr 1, 2025
nasa.gov (2025). MODIS/Terra Land Surface Temperature/3-Band Emissivity Daily L3 Global 1km SIN Grid Night V061 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/modis-terra-land-surface-temperature-3-band-emissivity-daily-l3-global-1km-sin-grid-night--68e7d
Dataset updated
Apr 1, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
A suite of Moderate Resolution Imaging Spectroradiometer (MODIS) Land Surface Temperature and Emissivity (LST&E) products is available in Collection 6.1. The MOD21 Land Surface Temperature (LST) algorithm differs from that of the MOD11 LST products: the MOD21 algorithm is based on the ASTER Temperature/Emissivity Separation (TES) technique, whereas MOD11 uses the split-window technique. The MOD21 TES algorithm uses a physics-based approach to dynamically retrieve both LST and spectral emissivity simultaneously from MODIS thermal infrared bands 29, 31, and 32. The TES algorithm is combined with an improved Water Vapor Scaling (WVS) atmospheric correction scheme to stabilize the retrieval during very warm and humid conditions. The MOD21A1N dataset is produced daily from nighttime Level 2 Gridded (L2G) intermediate LST products at a spatial resolution of 1,000 meters. The L2G process maps the daily MOD21 swath granules onto a sinusoidal MODIS grid and stores all observations falling over a grid cell for a given day. The MOD21A1 algorithm sorts through these observations for each cell and estimates the final LST value as an average of all observations that are cloud free and have good LST&E accuracies. The nighttime average is weighted by the observation coverage for that cell, and only observations with an observation coverage greater than 15% are considered. The MOD21A1N product contains seven Science Datasets (SDS): the calculated LST, quality control, the three emissivity bands, view zenith angle, and time of observation. Additional details on the methodology used to create this Level 3 (L3) product are available in the Algorithm Theoretical Basis Document (ATBD).

Known Issues

Users of MODIS LST products may notice more frequent extreme high-temperature outliers in the unfiltered MxD21 Version 6 and 6.1 products than in the heritage MxD11 LST products. This can occur especially over desert regions such as the Sahara, where undetected cloud and dust can degrade both the MxD21 and MxD11 retrieval algorithms.

* In the MxD11 LST products, these contaminated pixels are flagged by the algorithm and set to fill values in the output products, based on differences between the band 32 and band 31 radiances used in the generalized split-window algorithm. In the MxD21 LST products, values for contaminated pixels are retained in the output products (and may result in overestimated temperatures), so users need to apply Quality Control (QC) filtering and other error analyses to filter out bad values. High-temperature outlier thresholds are not employed in MxD21 because they could remove naturally occurring hot surface targets such as fires and lava flows. High atmospheric aerosol optical depth (AOD) caused by large dust outbreaks in the Sahara and other deserts, highlighted in the example documentation, is the primary cause of high outlier surface temperatures (and correspondingly low emissivity values) in the MxD21 LST products. Future versions of the MxD21 product will include a dust flag from the MODIS aerosol product and/or brightness temperature look-up tables to filter out dust-contaminated pixels. Note that the MxD11B day/night algorithm products employ more advanced cloud filtering in the multi-day products, based on a temporal analysis of historical LST over cloudy areas. This may result in more stringent filtering of dust-contaminated pixels in those products.
* To mitigate the impact of dust in the MxD21 V6 and 6.1 products, the science team recommends using a combination of the existing QC bits, emissivity values, and estimated product errors to confidently remove bad pixels from analysis. For more details, refer to the dust and cloud contamination example documentation.

For complete information about known issues, please refer to the MODIS/VIIRS Land Quality Assessment website.

Improvements/Changes from Previous Versions

The Version 6.1 Level-1B (L1B) products have been improved through various calibration changes, including: changes to the response-versus-scan angle (RVS) approach that affects reflectance bands for Aqua and Terra MODIS; corrections to adjust for optical crosstalk in the Terra MODIS infrared (IR) bands; and corrections to the Terra MODIS forward look-up table (LUT) update for the period 2012-2017. A polarization correction has been applied to the L1B Reflective Solar Bands (RSB). The product uses GEOS data in place of MERRA2.
* Three new CMG products are available in the MxD21 suite (MxD21C1/C2/C3).
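The per-cell compositing rule described above (average of cloud-free, good-quality observations, weighted by observation coverage, keeping only observations with coverage above 15%) can be illustrated for a single grid cell as follows. This is a minimal sketch, not the MOD21A1 production code: the array names are hypothetical, and the boolean `good` mask is a stand-in for whatever cloud and LST&E accuracy tests the real algorithm applies.

```python
import numpy as np

COVERAGE_THRESHOLD = 0.15  # observations at or below 15% coverage are ignored

def composite_lst(lst, coverage, good, fill_value=np.nan):
    """Coverage-weighted mean LST for one grid cell on one day.

    lst, coverage, good: 1-D arrays over the observations stored for the cell.
    coverage is a fraction in [0, 1]; good marks observations that are cloud
    free and pass the accuracy checks (a hypothetical stand-in for real QC).
    """
    lst = np.asarray(lst, dtype=float)
    coverage = np.asarray(coverage, dtype=float)
    keep = np.asarray(good, dtype=bool) & (coverage > COVERAGE_THRESHOLD)
    if not keep.any():
        return fill_value  # no usable observation for this cell
    return float(np.average(lst[keep], weights=coverage[keep]))
```

For example, observations of 280 K and 290 K with 50% coverage each, plus a 300 K observation with 10% coverage, composite to 285 K, because the third observation falls below the threshold.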
Membership and rotational data for clusters - Dataset - B2FIND
b2find.eudat.eu
Updated May 6, 2023
(2023). Membership and rotational data for clusters - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/695d1255-0811-5c1f-bd62-9f0b58b61342
Dataset updated
May 6, 2023
Description
The period versus mass diagrams (i.e., rotational sequences) of open clusters provide crucial constraints for angular momentum evolution studies. However, their memberships are often heavily contaminated by field stars, which could potentially bias the interpretations. In this paper, we use data from Gaia DR2 to reassess the memberships of seven open clusters with ground- and space-based rotational data, and present an updated view of stellar rotation as a function of mass and age. We use the Gaia astrometry to identify the cluster members in phase space, and the photometry to derive revised ages and place the stars on a consistent mass scale. Applying our membership analysis to the rotational sequences reveals that: (1) the contamination in clusters observed from the ground can reach up to ~35%; (2) the overall fraction of rotational outliers decreases substantially when the field contaminants are removed, but some outliers persist; (3) there is a sharp upper edge in the rotation periods at young ages; (4) at young ages, stars in the 1.0-0.6 M☉ range inhabit a global maximum of rotation periods, potentially providing an optimal window for habitable planets. Additionally, we see clear evidence for a strongly mass-dependent spin-down process. In the regime where rapid rotators are leaving the saturated domain, the rotational distributions broaden (in contradiction with popular models), which we interpret as evidence that the torque must be lower for rapid rotators than for intermediate ones. The cleaned rotational sequences from ground-based observations can be as constraining as those obtained from space.
SMARTS observations of eps Eridani - Dataset - B2FIND
b2find.eudat.eu
Updated Oct 31, 2023
(2023). SMARTS observations of eps Eridani - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/36fd84c3-0439-577e-8e5b-4ca75706bf1c
Dataset updated
Oct 31, 2023
Description
The active K2 dwarf ε Eri has been extensively characterized both as a young solar analog and more recently as an exoplanet host star. As one of the nearest and brightest stars in the sky, it provides an unparalleled opportunity to constrain stellar dynamo theory beyond the Sun. We confirm and document the 3-year magnetic activity cycle in ε Eri originally reported by Hatzes and coworkers (2000ApJ...544L.145H), and we examine the archival data from previous observations spanning 45 years. The data show coexisting 3-year and 13-year periods leading into a broad activity minimum that resembles a Maunder minimum-like state, followed by the resurgence of a coherent 3-year cycle. The nearly continuous activity record suggests the simultaneous operation of two stellar dynamos with cycle periods of 2.95 ± 0.03 yr and 12.7 ± 0.3 yr, which, by analogy with the solar case, suggests a revised identification of the dynamo mechanisms that are responsible for the so-called "active" and "inactive" sequences as proposed by Böhm-Vitense (2007ApJ...657..486B). Finally, based on the observed properties of ε Eri, we argue that the rotational history of the Sun is what makes it an outlier in the context of magnetic cycles observed in other stars (as also suggested by its Li depletion), and that a Jovian-mass companion cannot be the universal explanation for the solar peculiarities.
High frequency dataset for event-scale concentration-discharge analysis in a...
hydroshare.org
search.dataone.org
zip
Updated Sep 19, 2024
Andreas Musolff (2024). High frequency dataset for event-scale concentration-discharge analysis in a forested headwater 01/2018-08/2023 [Dataset]. http://doi.org/10.4211/hs.9be43573ba754ec1b3650ce233fc99de
Available download formats: zip (17.1 MB)
Unique identifier
https://doi.org/10.4211/hs.9be43573ba754ec1b3650ce233fc99de
Dataset updated
Sep 19, 2024
Dataset provided by
HydroShare
Authors
Andreas Musolff
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 2018 - Aug 23, 2023
Description
This composite repository contains high-frequency data of discharge, electrical conductivity, nitrate-N, DOC and water temperature obtained from the Rappbode headwater catchment in the Harz mountains, Germany. This catchment was affected by a bark-beetle infestation and forest dieback from 2018 onwards. The data extend previous observations from the same catchment (RB) published as part of Musolff (2020). Details on the catchment can be found in Werner et al. (2019, 2021) and Musolff et al. (2021). The file RB_HF_data_2018_2023.txt contains measurements for each time step in the following columns: "index" (observation number), "Date.Time" (timestamp in YYYY-MM-DD HH:MM:SS), "WT" (water temperature in °C), "Q.smooth" (discharge in mm/d, smoothed using a moving average), "NO3.smooth" (nitrate concentration in mg N/L, smoothed using a moving average), "DOC.smooth" (dissolved organic carbon concentration in mg/L, smoothed using a moving average), "EC.smooth" (electrical conductivity in µS/cm, smoothed using a moving average). Missing values are coded as NA.

Water quality and discharge were measured at a high-frequency interval of 15 min between January 2018 and August 2023. Both NO3-N and DOC were measured using an in-situ UV-VIS probe (s::can spectrolyser, scan Austria). EC was measured using an in-situ probe (CTD Diver, Van Essen Canada). Discharge measurements relied on an established stage-discharge relationship based on water level observations (CTD Diver, Van Essen Canada; see Werner et al. [2019]). Data loggers were maintained every two weeks, including manual cleaning of the UV-VIS probes and grab sampling for subsequent lab analysis, calibration and validation.

Data preparation included five steps: drift correction, outlier detection, gap filling, calibration and moving averaging:
- Drift was corrected by distributing the offset between the mean values one hour before and one hour after cleaning across the two-week maintenance interval, following an exponential growth curve.
- Outliers were detected with a two-step procedure. First, values outside a physically plausible range were removed. Second, the Grubbs test was applied to a moving window of 100 values to detect and remove outliers.
- Data gaps shorter than two hours were filled using cubic spline interpolation.
- The resulting time series were globally calibrated against the lab-measured concentrations of NO3-N and DOC. EC was calibrated against field values obtained with a handheld WTW probe (WTW Multi 430, Xylem Analytics Germany).
- Noise in both the discharge and water quality signals was reduced by a moving average with a window length of 2.5 hours.
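The outlier-detection, gap-filling and smoothing steps listed above can be sketched for a single 15-minute series as follows (drift correction and calibration are omitted). This is an illustrative reconstruction, not the processing code used for the dataset, and several details are assumptions: every sliding window of 100 values is tested with an iterated two-sided Grubbs test at α = 0.05; the physical limits `lo` and `hi` are placeholders; a gap is counted as a run of consecutive missing values, with `max_gap=7` (7 × 15 min is under two hours); and the 2.5-hour moving average is taken as a centred window of 10 samples. It uses NumPy and SciPy.

```python
import numpy as np
from functools import lru_cache
from scipy import stats
from scipy.interpolate import CubicSpline

@lru_cache(maxsize=None)
def grubbs_critical(n, alpha=0.05):
    """Two-sided Grubbs critical value for a sample of size n (n >= 3)."""
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    return (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))

def grubbs_flags(window, alpha=0.05):
    """Positions in `window` removed by an iterated two-sided Grubbs test."""
    idx = np.flatnonzero(~np.isnan(window))
    flagged = []
    while len(idx) > 2:
        vals = window[idx]
        sd = vals.std(ddof=1)
        if sd == 0:
            break
        dev = np.abs(vals - vals.mean())
        j = int(np.argmax(dev))
        if dev[j] / sd <= grubbs_critical(len(idx), alpha):
            break  # the most extreme remaining value is not an outlier
        flagged.append(int(idx[j]))
        idx = np.delete(idx, j)
    return flagged

def clean_series(y, lo, hi, window=100, max_gap=7, smooth=10, alpha=0.05):
    """Range check, sliding Grubbs test, short-gap spline fill, moving average."""
    y = np.asarray(y, dtype=float).copy()
    # 1) Values outside the physically plausible range [lo, hi] become NaN.
    y[(y < lo) | (y > hi)] = np.nan
    # 2) Grubbs test on every sliding window of `window` values.
    bad = np.zeros(len(y), dtype=bool)
    for start in range(max(len(y) - window, 0) + 1):
        for k in grubbs_flags(y[start:start + window], alpha):
            bad[start + k] = True
    y[bad] = np.nan
    # 3) Fill interior runs of at most `max_gap` missing values by cubic spline.
    valid = ~np.isnan(y)
    t = np.arange(len(y))
    if valid.sum() >= 2:
        spline = CubicSpline(t[valid], y[valid])
        missing = np.flatnonzero(~valid)
        runs = np.split(missing, np.flatnonzero(np.diff(missing) > 1) + 1)
        first, last = t[valid][0], t[valid][-1]
        for run in runs:
            if 0 < len(run) <= max_gap and run[0] > first and run[-1] < last:
                y[run] = spline(run)
    # 4) Centred moving average over `smooth` samples (10 x 15 min = 2.5 h);
    #    windows that touch a remaining gap, and the series ends, stay NaN.
    out = np.full(len(y), np.nan)
    if len(y) >= smooth:
        means = np.convolve(y, np.ones(smooth) / smooth, mode="valid")
        offset = smooth // 2
        out[offset:offset + len(means)] = means
    return out
```

Caching the critical value keeps the sliding-window loop cheap, since the same few window sizes recur. Long gaps are deliberately left as NaN rather than bridged by the spline.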

References:
Musolff, A. (2020). High frequency dataset for event-scale concentration-discharge analysis. http://www.hydroshare.org/resource/27c93a3f4ee2467691a1671442e047b8
Musolff, A., Zhan, Q., Dupas, R., Minaudo, C., Fleckenstein, J. H., Rode, M., Dehaspe, J., & Rinke, K. (2021). Spatial and Temporal Variability in Concentration-Discharge Relationships at the Event Scale. Water Resources Research, 57(10).
Werner, B. J., Musolff, A., Lechtenfeld, O. J., de Rooij, G. H., Oosterwoud, M. R., & Fleckenstein, J. H. (2019). High-frequency measurements explain quantity and quality of dissolved organic carbon mobilization in a headwater catchment. Biogeosciences, 16(22), 4497-4516.
Werner, B. J., Lechtenfeld, O. J., Musolff, A., de Rooij, G. H., Yang, J., Grundling, R., Werban, U., & Fleckenstein, J. H. (2021). Small-scale topography explains patterns and dynamics of dissolved organic carbon exports from the riparian zone of a temperate, forested catchment. Hydrology and Earth System Sciences, 25(12), 6067-6086.
Data from: Objective Bayesian Survival Analysis Using Shape Mixtures of...
tandf.figshare.com
figshare.com
pdf
Updated Jun 1, 2023
Catalina A. Vallejos; Mark F. J. Steel (2023). Objective Bayesian Survival Analysis Using Shape Mixtures of Log-Normal Distributions [Dataset]. http://doi.org/10.6084/m9.figshare.1473746.v3
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.1473746.v3
Dataset updated
Jun 1, 2023
Dataset provided by
Taylor & Francis
Authors
Catalina A. Vallejos; Mark F. J. Steel
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Survival models such as the Weibull or log-normal lead to inference that is not robust to the presence of outliers. They also assume that all heterogeneity between individuals can be modeled through covariates. This article considers the use of infinite mixtures of lifetime distributions as a solution for these two issues. This can be interpreted as the introduction of a random effect in the survival distribution. We introduce the family of shape mixtures of log-normal distributions, which covers a wide range of density and hazard functions. Bayesian inference under nonsubjective priors based on the Jeffreys’ rule is examined and conditions for posterior propriety are established. The existence of the posterior distribution on the basis of a sample of point observations is not always guaranteed and a solution through set observations is implemented. In addition, we propose a method for outlier detection based on the mixture structure. A simulation study illustrates the performance of our methods under different scenarios and an application to a real dataset is provided. Supplementary materials for the article, which include R code, are available online.
Goodness-of-fit filtering in classical metric multidimensional scaling with...
tandf.figshare.com
pdf
Updated Jun 1, 2023
Jan Graffelman (2023). Goodness-of-fit filtering in classical metric multidimensional scaling with large datasets [Dataset]. http://doi.org/10.6084/m9.figshare.11389830.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.11389830.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Taylor & Francis
Authors
Jan Graffelman
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Metric multidimensional scaling (MDS) is a widely used multivariate method with applications in almost all scientific disciplines. The eigenvalues obtained in the analysis are usually reported and used to calculate the overall goodness-of-fit of the MDS representation of the distance matrix. In this paper, we refine MDS goodness-of-fit calculations by proposing additional point and pairwise goodness-of-fit statistics that can be used to filter poorly represented observations in MDS maps. The proposed statistics are especially relevant for large data sets that contain outliers, which typically have many poorly fitted observations, and they help improve MDS output and emphasize the most important features of the dataset. Several goodness-of-fit statistics are examined, for both Euclidean and non-Euclidean distance matrices. Some examples with data from demographic, genetic and geographic studies are shown.
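To show how eigenvalues feed the overall goodness-of-fit, a minimal classical (Torgerson) MDS is sketched below: double-centre the squared distances, take the leading eigenvectors, and report the share of the positive eigenvalues captured by the first `k` dimensions. The per-point ratio it also returns (squared norm of a point's `k`-dimensional coordinates over its squared norm using all positive dimensions) is a simple illustrative measure, not the point and pairwise statistics proposed in the paper. Negative eigenvalues, which arise for non-Euclidean distance matrices, are simply dropped here, which is one of several possible conventions.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical metric MDS of an n x n distance matrix D.

    Returns (coordinates in k dims, overall goodness-of-fit, per-point fit).
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.full((n, n), 1.0 / n)   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred inner products
    evals, evecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    pos = evals > 1e-9 * abs(evals[0])         # keep numerically positive ones
    lam = evals[pos]
    full = evecs[:, pos] * np.sqrt(lam)        # coordinates in all positive dims
    X = full[:, :k]
    overall = lam[:k].sum() / lam.sum()        # share of positive eigenvalues
    norms_full = (full ** 2).sum(axis=1)
    point_fit = np.divide((X ** 2).sum(axis=1), norms_full,
                          out=np.ones(n), where=norms_full > 0)
    return X, overall, point_fit
```

For points that truly lie in a plane, `classical_mds(D, k=2)` reproduces the input distances and gives an overall fit of 1. Points whose `point_fit` is well below 1 are those poorly represented in the two-dimensional map, which is the kind of observation the filtering statistics described above target.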
Predicted class labels for the outlier countries removed from the dataset.
plos.figshare.com
figshare.com
xls
Updated Jun 7, 2023
Jordan J. Bird; Chloe M. Barnes; Cristiano Premebida; Anikó Ekárt; Diego R. Faria (2023). Predicted class labels for the outlier countries removed from the dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0241332.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0241332.t001
Dataset updated
Jun 7, 2023
Dataset provided by
PLOS ONE
Authors
Jordan J. Bird; Chloe M. Barnes; Cristiano Premebida; Anikó Ekárt; Diego R. Faria
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Observations show that a high inability to test should be considered first, since it is often coupled with a supposed ‘low risk’ in the other two classes with respect to tests and deaths reported per million population.
Data from: Nonlinear regression models for heterogeneous data with massive...
tandf.figshare.com
txt
Updated May 31, 2023
Yoonsuh Jung (2023). Nonlinear regression models for heterogeneous data with massive outliers [Dataset]. http://doi.org/10.6084/m9.figshare.7398524.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.7398524.v1
Dataset updated
May 31, 2023
Dataset provided by
Taylor & Francis
Authors
Yoonsuh Jung
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Income- or expenditure-related data sets are often nonlinear, heteroscedastic, and skewed even after transformation, and they contain numerous outliers. We propose a class of robust nonlinear models that treat outlying observations effectively without removing them. For this purpose, case-specific parameters and a related penalty are employed to detect and modify the outliers systematically. We show how existing nonlinear models such as smoothing splines and generalized additive models can be robustified by the case-specific parameters. Next, we extend the proposed methods to heterogeneous models by incorporating unequal weights, and we provide details on estimating the weights. Two real data sets and simulated data sets show the potential of the proposed methods when the data are nonlinear with outlying observations.
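The case-specific parameter idea can be illustrated on a plain linear model in a basis matrix `X`, rather than on the smoothing splines and GAMs used in the paper. Each observation gets its own shift γ_i, and an L1 penalty λ Σ|γ_i| keeps most shifts exactly at zero. Block coordinate descent then alternates two steps: a least-squares fit of β on y - γ, and soft-thresholding of the residuals to update γ. Observations with nonzero γ_i are treated as outlying and adjusted rather than removed. The L1 penalty, the linear basis and the choice of λ are assumptions of this sketch; the paper's penalty and weighting scheme may differ.

```python
import numpy as np

def soft_threshold(r, lam):
    """Closed-form minimiser of 0.5*(r - g)^2 + lam*|g| over g."""
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

def fit_case_specific(X, y, lam, n_iter=500, tol=1e-10):
    """Minimise 0.5*||y - X b - g||^2 + lam*||g||_1 by block coordinate descent.

    Returns (b, g): regression coefficients and case-specific shifts; nonzero
    entries of g mark observations that were treated as outlying.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    g = np.zeros_like(y)
    for _ in range(n_iter):
        # beta-step: ordinary least squares on the shift-adjusted response.
        b = np.linalg.lstsq(X, y - g, rcond=None)[0]
        # gamma-step: soft-threshold the residuals (exact L1 update).
        g_new = soft_threshold(y - X @ b, lam)
        converged = np.max(np.abs(g_new - g)) < tol
        g = g_new
        if converged:
            break
    b = np.linalg.lstsq(X, y - g, rcond=None)[0]  # refit with the final shifts
    return b, g
```

With a straight line `y = 1 + 2x` on 50 points and two observations shifted by +5 and -6, `lam=1.0` gives nonzero shifts only at those two indices, and the fitted coefficients stay close to (1, 2). Larger `lam` flags fewer cases; a smaller `lam` flags more.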
MODIS/Terra Land Surface Temperature/3-Band Emissivity Daily L3 Global 1km...
data.nasa.gov
Updated Jun 12, 2025
nasa.gov (2025). MODIS/Terra Land Surface Temperature/3-Band Emissivity Daily L3 Global 1km SIN Grid Day V006 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/modis-terra-land-surface-temperature-3-band-emissivity-daily-l3-global-1km-sin-grid-day-v0
Dataset updated
Jun 12, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
The MOD21A1D Version 6 data product was decommissioned on July 31, 2023. Users are encouraged to use the MOD21A1D Version 6.1 data product. A new suite of Moderate Resolution Imaging Spectroradiometer (MODIS) Land Surface Temperature and Emissivity (LST&E) products are available in Collection 6. The MOD21 Land Surface Temperature (LST) algorithm differs from the algorithm of the MOD11 LST products, in that the MOD21 algorithm is based on the ASTER Temperature/Emissivity Separation (TES) technique, whereas the MOD11 uses the split-window technique. The MOD21 TES algorithm uses a physics-based algorithm to dynamically retrieve both the LST and spectral emissivity simultaneously from the MODIS thermal infrared bands 29, 31, and 32. The TES algorithm is combined with an improved Water Vapor Scaling (WVS) atmospheric correction scheme to stabilize the retrieval during very warm and humid conditions. The MOD21A1D dataset is produced daily from daytime Level 2 Gridded (L2G) intermediate LST products. The L2G process maps the daily MOD21 swath granules onto a sinusoidal MODIS grid and stores all observations falling over a gridded cell for a given day. The MOD21A1 algorithm sorts through these observations for each cell and estimates the final LST value as an average from all observations that are cloud free and have good LST&E accuracies. The daytime average is weighted by the observation coverage for that cell. Only observations having an observation coverage greater than a 15% threshold are considered. The MOD21A1D product contains seven Science Datasets (SDS), which include the calculated LST as well as quality control, the three emissivity bands, view zenith angle, and time of observation. MOD21A1D products are available two months after acquisition due to latency of data inputs.
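The per-cell compositing rule described above (keep cloud-free, good-accuracy observations whose coverage exceeds 15%, then average them weighted by coverage) can be sketched as follows. The `Observation` record and its field names are hypothetical stand-ins for whatever the L2G intermediate product actually stores; only the 15% threshold and the coverage weighting come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Observation:
    # Hypothetical per-observation record for one grid cell
    lst_kelvin: float     # retrieved LST for this swath observation
    coverage: float       # fraction of the cell covered, 0 to 1
    cloud_free: bool
    good_accuracy: bool   # stand-in for "good LST&E accuracy"

COVERAGE_THRESHOLD = 0.15  # observations must cover MORE than 15% of the cell

def daily_cell_lst(observations: List[Observation]) -> Optional[float]:
    """Coverage-weighted mean LST for one cell, or None if nothing qualifies."""
    usable = [
        o for o in observations
        if o.cloud_free and o.good_accuracy and o.coverage > COVERAGE_THRESHOLD
    ]
    total_weight = sum(o.coverage for o in usable)
    if total_weight == 0:
        return None  # no valid retrieval for this cell on this day
    return sum(o.lst_kelvin * o.coverage for o in usable) / total_weight
```

Returning `None` rather than a number keeps cells with no qualifying observation distinguishable from real retrievals, much as a fill value would in a gridded product.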
Additional details regarding the methodology used to create this Level 3 (L3) product are available in the Algorithm Theoretical Basis Document (ATBD).
Known Issues
Forward processing of Terra MODIS LST&E Version 6 data products was discontinued on December 31, 2005. Users are encouraged to use the MOD21A1D Version 6.1 data product.
* Users of MODIS LST products may notice an increase in occurrences of extreme high-temperature outliers in the unfiltered MxD21 Version 6 and 6.1 products compared to the heritage MxD11 LST products. This can occur especially over desert regions like the Sahara, where undetected cloud and dust can negatively impact both the MxD21 and MxD11 retrieval algorithms.
* In the MxD11 LST products, these contaminated pixels are flagged in the algorithm and set to fill values in the output products based on differences in the band 32 and band 31 radiances used in the generalized split-window algorithm. In the MxD21 LST products, values for the contaminated pixels are retained in the output products (and may result in overestimated temperatures), and users need to apply Quality Control (QC) filtering and other error analyses to filter out bad values. High-temperature outlier thresholds are not employed in MxD21 because they would potentially remove naturally occurring hot surface targets such as fires and lava flows.
* High atmospheric aerosol optical depth (AOD) caused by vast dust outbreaks in the Sahara and other deserts, as highlighted in the example documentation, is the primary reason for high outlier surface temperature values (and correspondingly low emissivity values) in the MxD21 LST products. Future versions of the MxD21 product will include a dust flag from the MODIS aerosol product and/or brightness temperature look-up tables to filter out contaminated dust pixels. It should be noted that in the MxD11B day/night algorithm products, more advanced cloud filtering is employed in the multi-day products based on a temporal analysis of historical LST over cloudy areas. This may result in more stringent filtering of dust-contaminated pixels in these products.
* To mitigate the impact of dust in the MxD21 V6 and 6.1 products, the science team recommends using a combination of the existing QC bits, emissivity values, and estimated product errors to confidently remove bad pixels from analysis. For more details, refer to this dust and cloud contamination example documentation.
* For complete information about known issues, please refer to the MODIS/VIIRS Land Quality Assessment website.
Improvements/Changes from Previous Versions
* New product for MODIS Version 6.
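The science team's advice to combine QC bits, emissivity values, and estimated errors, rather than a fixed high-temperature cut-off, could look roughly like the mask below. Everything specific here is an assumption: the two-bit "mandatory QA" field in the lowest QC bits is a hypothetical layout, and the `min_emissivity` and `max_lst_error` defaults are illustrative placeholders, not values from the product documentation, which should be consulted for the real bit definitions and any recommended limits.

```python
import numpy as np

# HYPOTHETICAL QC layout: assume the two lowest bits hold a "mandatory QA"
# code where 0 means good quality. Check the product user guide for the
# actual bit definitions before relying on anything like this.
MANDATORY_QA_MASK = 0b11

def dust_screen(lst, qc, emissivity, lst_error,
                min_emissivity=0.9, max_lst_error=2.0):
    """Return LST with suspected bad pixels set to NaN.

    min_emissivity and max_lst_error are illustrative thresholds only.
    No upper temperature limit is applied, so genuine hot targets survive.
    """
    lst = np.asarray(lst, dtype=float)
    qc = np.asarray(qc, dtype=np.uint16)
    good_qa = (qc & MANDATORY_QA_MASK) == 0                  # QC bits say "good"
    plausible_emis = np.asarray(emissivity) >= min_emissivity  # dust lowers emissivity
    small_error = np.asarray(lst_error) <= max_lst_error      # estimated LST error
    keep = good_qa & plausible_emis & small_error
    return np.where(keep, lst, np.nan)
```

Leaving out a temperature ceiling follows the note that such thresholds would discard naturally hot surfaces such as fires and lava flows; the emissivity test targets the low-emissivity signature that dust contamination produces. A fixed emissivity cut-off can also remove naturally low-emissivity surfaces, which is one more reason to treat these defaults as placeholders.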
Summary of findings.
plos.figshare.com
datasetcatalog.nlm.nih.gov
xls
Updated May 22, 2024
Ghayath Janoudi; Mara Uzun (Rada); Deshayne B. Fell; Joel G. Ray; Angel M. Foster; Randy Giffen; Tammy Clifford; Mark C. Walker (2024). Summary of findings. [Dataset]. http://doi.org/10.1371/journal.pdig.0000515.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pdig.0000515.t003
Dataset updated
May 22, 2024
Dataset provided by
PLOS Digital Health
Authors
Ghayath Janoudi; Mara Uzun (Rada); Deshayne B. Fell; Joel G. Ray; Angel M. Foster; Randy Giffen; Tammy Clifford; Mark C. Walker
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Clinical discoveries largely depend on dedicated clinicians and scientists to identify and pursue unique and unusual clinical encounters with patients and communicate these through case reports and case series. This process has remained essentially unchanged throughout the history of modern medicine. However, these traditional methods are inefficient, especially considering the modern-day availability of health-related data and the sophistication of computer processing. Outlier analysis has been used in various fields to uncover unique observations, including fraud detection in finance and quality control in manufacturing. We propose that clinical discovery can be formulated as an outlier problem within an augmented intelligence framework to be implemented on any health-related data. Such an augmented intelligence approach would accelerate the identification and pursuit of clinical discoveries, advancing our medical knowledge and uncovering new therapies and management approaches. We define clinical discoveries as contextual outliers measured through an information-based approach and with a novelty-based root cause. Our augmented intelligence framework has five steps: define a patient population with a desired clinical outcome, build a predictive model, identify outliers through appropriate measures, investigate outliers through domain content experts, and generate scientific hypotheses. Recognizing that the field of obstetrics can particularly benefit from this approach, as it is traditionally neglected in commercial research, we conducted a systematic review to explore how outlier analysis is implemented in obstetric research. We identified two obstetrics-related studies that assessed outliers at an aggregate level for purposes outside of clinical discovery. Our findings indicate that the use of outlier analysis in obstetric research, and in clinical research in general, requires further development.
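Steps two and three of the framework (build a predictive model, then identify outliers through an information-based measure) might be sketched as below for a binary outcome. Scoring each patient by the surprisal, -log2 p, of the outcome that was actually observed is an assumption about one reasonable information-based measure, not the authors' definition; the 4-bit cut-off and the function names are hypothetical, and the predicted probabilities would come from whatever model step two produces.

```python
import math
from typing import List, Sequence

def surprisal_bits(p_observed: float) -> float:
    """Information content, in bits, of the outcome that actually occurred."""
    return -math.log2(p_observed)

def flag_contextual_outliers(predicted_prob: Sequence[float],
                             observed: Sequence[int],
                             threshold_bits: float = 4.0) -> List[int]:
    """Indices of patients whose observed binary outcome was very unlikely.

    predicted_prob[i] is the model's P(outcome = 1) given patient i's context;
    observed[i] is 0 or 1. threshold_bits is an illustrative cut-off.
    """
    flagged = []
    for i, (p, y) in enumerate(zip(predicted_prob, observed)):
        p_obs = p if y == 1 else 1.0 - p       # probability of what happened
        p_obs = min(max(p_obs, 1e-12), 1.0)    # guard against log2(0)
        if surprisal_bits(p_obs) > threshold_bits:
            flagged.append(i)                  # candidate for expert review
    return flagged
```

A 4-bit threshold corresponds to outcomes the model judged less likely than 1 in 16. The flagged indices would then feed step four, investigation by domain experts, rather than being treated as discoveries in themselves.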
NASA Earthdata
earthdata.nasa.gov
Updated Jun 29, 2023
LPCLOUD (2023). NASA Earthdata [Dataset]. http://doi.org/10.5067/VIIRS/VNP21A1N.002
Unique identifier
https://doi.org/10.5067/VIIRS/VNP21A1N.002
Dataset updated
Jun 29, 2023
Dataset authored and provided by
LPCLOUD
Description
The NASA/NOAA Suomi National Polar-orbiting Partnership (Suomi NPP) Visible Infrared Imaging Radiometer Suite (VIIRS) Land Surface Temperature and Emissivity (LST&E) Night Version 2 product (VNP21A1N) is compiled daily from nighttime Level 2 Gridded (L2G) intermediate products.

The L2G process maps the daily VNP21 swath granules onto a sinusoidal MODIS grid and stores all observations overlapping a gridded cell for a given night. The VNP21A1 algorithm sorts through all these observations for each cell and estimates the final LST value as an average from all cloud-free observations that have good LST accuracies. The nighttime average is weighted by the observation coverage for that cell. Only observations having observation coverage more than a certain threshold (15%) are considered for this averaging. The 1 kilometer dataset is derived through resampling the native 750 meter VIIRS resolution in the input product.

The VNP21A1N product is developed synergistically with the Moderate Resolution Imaging Spectroradiometer (MODIS) LST&E Version 6.1 product (MOD21A1N) using the same input atmospheric products and algorithmic approach. The overall objective for NASA VIIRS products is to ensure the algorithms and products are compatible with the MODIS Terra and Aqua algorithms to promote the continuity of the Earth Observation System (EOS) mission. Additional details regarding the method used to create this Level 3 (L3) product are available in the Algorithm Theoretical Basis Document (ATBD).

The VNP21A1N product contains seven Science Datasets (SDS): LST, quality control, emissivity for bands M14, M15, and M16, view zenith angle, and time of observation. A low-resolution browse image for LST is also available for each VNP21A1N granule.

Known Issues
* Users of VIIRS and MODIS LST products may notice an increase in occurrences of extreme high-temperature outliers in the unfiltered VNP21 and MxD21 products compared to the heritage MxD11 LST products. This can occur especially over desert regions like the Sahara, where undetected cloud and dust can negatively impact the MxD11, MxD21, and VNP21 retrieval algorithms.
* In the MxD11 LST products, these contaminated pixels are flagged in the algorithm and set to fill values in the output products based on differences in the band 32 and band 31 radiances used in the generalized split-window algorithm. In the VNP21 and MxD21 LST products, values for the contaminated pixels are retained in the output products (and may result in overestimated temperatures), and users need to apply Quality Control (QC) filtering and other error analyses to filter out bad values. High-temperature outlier thresholds are not employed in VNP21 and MxD21 because they would potentially remove naturally occurring hot surface targets such as fires and lava flows.
* High atmospheric aerosol optical depth (AOD) caused by vast dust outbreaks in the Sahara and other deserts, as highlighted in the example documentation, is the primary reason for high outlier surface temperature values (and correspondingly low emissivity values) in the VNP21 and MxD21 LST products. Future versions of the VNP21 and MxD21 products will include a dust flag from the MODIS aerosol product and/or brightness temperature look-up tables to filter out contaminated dust pixels. It should be noted that in the MxD11B day/night algorithm products, more advanced cloud filtering is employed in the multi-day products based on a temporal analysis of historical LST over cloudy areas. This may result in more stringent filtering of dust-contaminated pixels in these products.
* To mitigate the impact of dust in the VNP21 and MxD21 products, the science team recommends using a combination of the existing QC bits, emissivity values, and estimated product errors to confidently remove bad pixels from analysis.
* For complete information about known issues, please refer to the MODIS/VIIRS Land Quality Assessment website.

Improvements/Changes from Previous Versions
* Improved calibration algorithm and coefficients for the entire Suomi NPP mission.
* Improved geolocation accuracy and applied updates to fix outliers around maneuver periods.
* Corrected the aerosol quantity flag (low, average, high) mainly over brighter surfaces in the mid- to high-latitudes such as desert and tropical vegetation areas. This has an impact on the retrieval of other downstream data products such as VNP13 Vegetation Indices and VNP43 Bidirectional Reflectance Distribution Function (BRDF)/Albedo.
* Improved cloud mask input product for corrections along coastlines and artifacts from use of coarse resolution climatology data.
* Replaced the land/water mask input product with the eight-class land/water mask from the VNP03 geolocation product that better aligns with MODIS.
* Replaced MERRA2 inputs with GEOS5.
* Included inland water body pixels to allow for LST retrieval over these areas.
* Introduced daily, 8-day, and monthly LST CMG products.
* More details can be found in this VIIRS Land V2 Changes document.