14 datasets found

Supporting data for "A Standard Operating Procedure for Outlier Removal in...
search.dataone.org
dataverse.no
Updated Jul 29, 2024
Cite
Holsbø, Einar (2024). Supporting data for "A Standard Operating Procedure for Outlier Removal in Large-Sample Epidemiological Transcriptomics Datasets" [Dataset]. http://doi.org/10.18710/FGVLKS
Unique identifier
https://doi.org/10.18710/FGVLKS
Dataset updated
Jul 29, 2024
Dataset provided by
DataverseNO
Authors
Holsbø, Einar
Description
This dataset is example data from the Norwegian Women and Cancer study. It is supporting information to our article "A Standard Operating Procedure for Outlier Removal in Large-Sample Epidemiological Transcriptomics Datasets." (In submission) The bulk of the data comes from measuring gene expression in blood samples from the Norwegian Women and Cancer study (NOWAC) on Illumina Whole-Genome Gene Expression Bead Chips, HumanHT-12 v4. Please see README.txt for details
Dataset on the Human Body as a Signal Propagation Medium
data.niaid.nih.gov
zenodo.org
Updated Jul 11, 2024
Cite
Dataset on the Human Body as a Signal Propagation Medium [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8214496
Dataset updated
Jul 11, 2024
Dataset provided by
V. Abolins
A. Elsts
A. Sevcenko
V. Aristovs
V. Medvedevs
J. Ormanis
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview: This is a large-scale dataset with impedance and signal loss data recorded on volunteer test subjects using low-voltage, sine-shaped alternating current signals. The signal frequencies range from 50 kHz to 20 MHz.

Applications: This dataset is intended to support investigation of the human body as a signal propagation medium, and to capture how the properties of the human body (age, sex, composition, etc.), the measurement locations, and the signal frequencies affect signal loss over the human body.

Overview statistics:

Number of subjects: 30

Number of transmitter locations: 6

Number of receiver locations: 6

Number of measurement frequencies: 19

Input voltage: 1 V

Load resistance: 50 ohm and 1 megaohm

Measurement group statistics:

Height: 174.10 (7.15)

Weight: 72.85 (16.26)

BMI: 23.94 (4.70)

Body fat %: 21.53 (7.55)

Age group: 29.00 (11.25)

Male/female ratio: 50%

Included files:

experiment_protocol_description.docx - protocol used in the experiments

electrode_placement_schematic.png - schematic of placement locations

electrode_placement_photo.jpg - visualization of the experiment, on a volunteer subject

RawData - the full measurement results and experiment info sheets

all_measurements.csv - the most important results extracted to .csv

all_measurements_filtered.csv - same, but after z-score filtering

all_measurements_by_freq.csv - the most important results extracted to .csv, single frequency per row

all_measurements_by_freq_filtered.csv - same, but after z-score filtering

summary_of_subjects.csv - key statistics on the subjects from the experiment info sheets

process_json_files.py - script that creates .csv from the raw data

filter_results.py - outlier removal based on z-score

plot_sample_curves.py - visualization of a randomly selected measurement result subset

plot_measurement_group.py - visualization of the measurement group
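
The z-score filtering mentioned for filter_results.py can be sketched roughly as follows. This is a minimal illustration, not the dataset's actual script: the function name zscore_mask, the threshold of 3.0, and the sample gain values are all assumptions made for the example.

```python
import numpy as np

def zscore_mask(values, threshold=3.0):
    """Return True for values whose absolute z-score is within the threshold.

    The threshold of 3.0 is an assumption; the dataset description does not
    state which value filter_results.py uses.
    """
    values = np.asarray(values, dtype=float)
    mean = np.nanmean(values)
    std = np.nanstd(values)
    if std == 0:
        # No spread at all: keep every non-missing value.
        return ~np.isnan(values)
    z = (values - mean) / std
    # NaN entries compare as False and are therefore dropped.
    return np.abs(z) <= threshold

# Hypothetical receiver gains in dB: 19 plausible readings plus one spike.
gains_db = np.append(-40.0 + 0.1 * np.sin(np.arange(19)), -5.0)
keep = zscore_mask(gains_db)
filtered = gains_db[keep]
print(len(gains_db), "->", len(filtered))
```

With very small samples a single extreme value inflates the standard deviation enough to hide itself, so a filter like this is only meaningful when each group contains a reasonable number of measurements.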

CSV file columns:

subject_id - participant's random unique ID

experiment_id - measurement session's number for the participant

height - participant's height, cm

weight - participant's weight, kg

BMI - body mass index, computed from the values above

body_fat_% - body fat composition, as measured by bioimpedance scales

age_group - age rounded to 10 years, e.g. 20, 30, 40 etc.

male - 1 if male, 0 if female

tx_point - transmitter point number

rx_point - receiver point number

distance - distance, in relative units, between the tx and rx points. Not scaled in terms of participant's height and limb lengths!

tx_point_fat_level - transmitter point location's average fat content metric. Not scaled for each participant individually.

rx_point_fat_level - receiver point location's average fat content metric. Not scaled for each participant individually.

total_fat_level - sum of rx and tx fat levels

bias - constant term to simplify data analytics, always equal to 1.0

CSV file columns, frequency-specific:

tx_abs_Z_... - transmitter-side impedance, as computed by the process_json_files.py script from the voltage drop

rx_gain_50_f_... - experimentally measured gain on the receiver, in dB, using 50 ohm load impedance

rx_gain_1M_f_... - experimentally measured gain on the receiver, in dB, using 1 megaohm load impedance

Acknowledgments: The dataset collection was funded by the Latvian Council of Science, project “Body-Coupled Communication for Body Area Networks”, project No. lzp-2020/1-0358.

References: For more detailed information, see this article: J. Ormanis, V. Medvedevs, A. Sevcenko, V. Aristovs, V. Abolins, and A. Elsts. Dataset on the Human Body as a Signal Propagation Medium for Body Coupled Communication. Submitted to Elsevier Data in Brief, 2023.

Contact information: info@edi.lv
Stream water-quality summary statistics and outliers, streamwater load...
data.usgs.gov
search.dataone.org
Cite
Brent Aulenbach; John Joiner, Stream water-quality summary statistics and outliers, streamwater load models and yield estimates, and peak flow modeling parameters for 13 watersheds in Gwinnett County, Georgia [Dataset]. http://doi.org/10.5066/F7639MXG
Unique identifier
https://doi.org/10.5066/F7639MXG
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
Brent Aulenbach; John Joiner
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Time period covered
Mar 12, 2001 - Sep 30, 2015
Area covered
Gwinnett County, Georgia
Description
Data release includes the following five data tables: (1) water-quality constituent outliers that were removed from the calibration of regression models used to estimate streamwater solute loads, (2) parameters used to model peak streamflow recurrence intervals, (3) models used to estimate streamwater constituent loads, (4) statistical summaries of water-quality observations, and (5) estimated annual streamwater constituent yields. An associated metadata file is included for each of the five data tables.
Data from: Outlier classification using autoencoders: application for...
osti.gov
Updated Jun 2, 2021
Cite
Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center (2021). Outlier classification using autoencoders: application for fluctuation driven flows in fusion plasmas [Dataset]. http://doi.org/10.7910/DVN/SKEHRJ
Unique identifier
https://doi.org/10.7910/DVN/SKEHRJ
Dataset updated
Jun 2, 2021
Dataset provided by
Office of Science (http://www.er.doe.gov/)
Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center
Description
Understanding the statistics of fluctuation driven flows in the boundary layer of magnetically confined plasmas is desired to accurately model the lifetime of the vacuum vessel components. Mirror Langmuir probes (MLPs) are a novel diagnostic that uniquely allows us to sample the plasma parameters on a time scale shorter than the characteristic time scale of their fluctuations. Sudden large-amplitude fluctuations in the plasma degrade the precision and accuracy of the plasma parameters reported by MLPs for cases in which the probe bias range is of insufficient amplitude. While some data samples can readily be classified as valid and invalid, we find that such a classification may be ambiguous for up to 40% of data sampled for the plasma parameters and bias voltages considered in this study. In this contribution, we employ an autoencoder (AE) to learn a low-dimensional representation of valid data samples. By definition, the coordinates in this space are the features that mostly characterize valid data. Ambiguous data samples are classified in this space using standard classifiers for vectorial data. In this way, we avoid defining complicated threshold rules to identify outliers, which require strong assumptions and introduce biases in the analysis. By removing the outliers that are identified in the latent low-dimensional space of the AE, we find that the average conductive and convective radial heat fluxes are between approximately 5% and 15% lower than when removing outliers identified by threshold values. For contributions to the radial heat flux due to triple correlations, the difference is up to 40%.
Identification of Performance Changes at Code Level (Measurement...
data.niaid.nih.gov
explore.openaire.eu
Updated Aug 8, 2022
Cite
Anonymous for Reviewing (2022). Identification of Performance Changes at Code Level (Measurement Configuration Dataset) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6300863
Dataset updated
Aug 8, 2022
Dataset authored and provided by
Anonymous for Reviewing
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Measurement Configuration Dataset

This is the anonymous reviewing version; the source code repository will be added after the review.

This dataset provides reproduction data for performance measurement configuration at source code level in Java. You can obtain the measurement data yourself using the precision-experiments repository https://anonymous.4open.science/r/precision-experiments-C613/ (Examining Different Repetition Counts). The data contained here are the data we obtained from execution on an i7-4770 CPU @ 3.40GHz.

The analysis was tested on Ubuntu 20.04 and gnuplot 5.2.8. It will not work with older gnuplot versions.

To execute the analysis, extract the data by

tar -xvf basic-parameter-comparison.tar
tar -xvf parallel-sequential-comparison.tar

and afterwards build the precision-experiments repo and execute the analysis by

cd precision-experiments/precision-analysis/
../gradlew fatJar
cd scripts/configuration-analysis/
./executeCompleteAnalysis.sh ../../../../basic-parameter-comparison ../../../../parallel-sequential-comparison

Afterwards, the following files will be present:

precision-experiments/precision-analysis/scripts/configuration-analysis/repetitionHeatmaps/heatmap_all_en.pdf (Heatmaps for different repetition counts)

precision-experiments/precision-analysis/scripts/configuration-analysis/repetitionHeatmaps/heatmap_outlierRemoval_en.pdf (Heatmap with and without outlier removal for 1000 repetitions)

precision-experiments/precision-analysis/scripts/configuration-analysis/histogram_outliers_en.pdf (Histogram of the outliers)

precision-experiments/precision-analysis/scripts/configuration-analysis/heatmap_parallel_en.pdf (Heatmap with sequential and parallel execution)
11: Streamwater sample constituent concentration outliers from 15 watersheds...
data.usgs.gov
catalog.data.gov
Cite
Brent Aulenbach; Joshua Henley; Kristina Hopkins, 11: Streamwater sample constituent concentration outliers from 15 watersheds in Gwinnett County, Georgia for water years 2003-2020 [Dataset]. http://doi.org/10.5066/P9G8HZTY
Unique identifier
https://doi.org/10.5066/P9G8HZTY
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
Brent Aulenbach; Joshua Henley; Kristina Hopkins
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Time period covered
Oct 10, 2002 - Sep 29, 2020
Area covered
Gwinnett County, Georgia
Description
This dataset contains a list of outlier sample concentrations identified for 17 water quality constituents from streamwater samples collected at 15 study watersheds in Gwinnett County, Georgia for water years 2003 to 2020. The 17 water quality constituents are: biochemical oxygen demand (BOD), chemical oxygen demand (COD), total suspended solids (TSS), suspended sediment concentration (SSC), total nitrogen (TN), total nitrate plus nitrite (NO3NO2), total ammonia plus organic nitrogen (TKN), dissolved ammonia (NH3), total phosphorus (TP), dissolved phosphorus (DP), total organic carbon (TOC), total calcium (Ca), total magnesium (Mg), total copper (TCu), total lead (TPb), total zinc (TZn), and total dissolved solids (TDS). 885 outlier concentrations were identified. Outliers were excluded from model calibration datasets used to estimate streamwater constituent loads for 12 of these constituents. Outlier concentrations were removed because they had a high influence on the model fits o ...
Data from: Fast robust SUR with economical and actuarial applications
search.datacite.org
wiley.figshare.com
Updated Jul 14, 2016
Cite
Mia Hubert; Tim Verdonck (2016). Data from: Fast robust SUR with economical and actuarial applications [Dataset]. http://doi.org/10.6084/m9.figshare.3408073
Unique identifier
https://doi.org/10.6084/m9.figshare.3408073
Dataset updated
Jul 14, 2016
Dataset provided by
DataCite (https://www.datacite.org/)
Wiley
Authors
Mia Hubert; Tim Verdonck
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The seemingly unrelated regression (SUR) model is a generalization of a linear regression model consisting of more than one equation, where the error terms of these equations are contemporaneously correlated. The standard Feasible Generalized Least Squares (FGLS) estimator is efficient as it takes into account the covariance structure of the errors, but it is also very sensitive to outliers. The robust SUR estimator of Bilodeau and Duchesne (Canadian Journal of Statistics, 28:277-288, 2000) can accommodate outliers, but it is hard to compute. First we propose a fast algorithm, FastSUR, for its computation and show its good performance in a simulation study. We then provide diagnostics for outlier detection and illustrate them on a real data set from economics. Next we apply our FastSUR algorithm in the framework of stochastic loss reserving for general insurance. We focus on the General Multivariate Chain Ladder (GMCL) model that employs SUR to estimate its parameters. Consequently, this multivariate stochastic reserving method takes into account the contemporaneous correlations among run-off triangles and allows structural connections between these triangles. We plug our FastSUR algorithm into the GMCL model to obtain a robust version.
Input data for chloride-specific conductance regression models
data.usgs.gov
catalog.data.gov
Cite
Rosemary Fanelli; Andrew Sekellick; Joel Moore, Input data for chloride-specific conductance regression models [Dataset]. http://doi.org/10.5066/P9YN2QST
Unique identifier
https://doi.org/10.5066/P9YN2QST
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
Rosemary Fanelli; Andrew Sekellick; Joel Moore
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Time period covered
Sep 17, 1953 - Sep 28, 2018
Description
This data set includes input data for the development of regression models to predict chloride from specific conductance (SC) data at 56 U. S. Geological Survey water quality monitoring stations in the eastern United States. Each site has 20 or more simultaneous observations of SC and chloride. Data were downloaded from the National Water Information System (NWIS) using the R package dataRetrieval. Datasets for each site were evaluated and outliers were removed prior to the development of the regression model. This file contains only the final input dataset for the regression models. Please refer to Moore and others (in review) for more details. Moore, J., R. Fanelli, and A. Sekellick. In review. High-frequency data reveal deicing salts drive elevated conductivity and chloride along with pervasive and frequent exceedances of the EPA aquatic life criteria for chloride in urban streams. Submitted to Environmental Science and Technology.
CBP Water Quality Monitoring Subset (1984-2018), CB8 1E
data.ioos.us
erddap.maracoos.org
Updated May 19, 2025
Cite
MARACOOS (2025). CBP Water Quality Monitoring Subset (1984-2018), CB8 1E [Dataset]. https://data.ioos.us/dataset/cbp-water-quality-monitoring-subset-1984-2018-cb8-1e
Available download formats: erddap, erddap-tabledap, opendap
Dataset updated
May 19, 2025
Dataset authored and provided by
MARACOOS
Description
This product was developed as part of a project supported by a grant from the National Oceanic and Atmospheric Administration’s Ocean Acidification Program under award NA18OAR0170430 to the Virginia Institute of Marine Science. The data product consists of water quality data for 98 tidal stations for 1984–2018. The source data used to generate this product were downloaded from the Chesapeake Bay Program’s (CBP) data hub. Out of the total of 255 monitoring stations in the Tidal Monitoring Program, we selected the 98 with a long monitoring record (30 years or longer). The following variables were downloaded from the data hub at the native temporal and vertical resolution (between one and four cruises per month and approximately 10 depth levels sampled between 0 and 37 m) for 1984–2018: water temperature (T), salinity (S), pH, total alkalinity (TA), dissolved oxygen (DO), and chlorophyll (Chl). All pH data prior to 1998 were removed because of data quality concerns (Herrmann et al., 2020). Briefly, we found a dramatic difference in long-term trends between stations measured by institutions in the state of Virginia and stations measured by the state of Maryland, particularly from late spring to early fall. The boundary between the station groups runs east–west within the mesohaline portion of the bay, where the Potomac River estuary intersects the mainstem bay. The boundary separates strong negative linear trends to the south (Virginia stations) from neutral and weakly positive linear trends to the north (Maryland stations). For all variables, data entries marked with CBP’s “Problem” and “Qualifier” flags were removed.
Additionally, all variables were scanned for extreme outliers: for each variable, data from all stations, depths, and times were combined into a single composite sample for which the 75th and 25th percentiles (i.e., the upper and lower quartiles) and the interquartile range (the difference between the upper and lower quartiles) were calculated. Extreme outliers were defined as the values falling outside of a certain number (censoring criterion) of interquartile ranges from the upper and lower quartiles.
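
The interquartile-range screening described above can be sketched as follows. This is an illustration under stated assumptions rather than the processing code used for this product: the function name iqr_outlier_mask, the censoring criterion k = 3.0, and the sample pH values are hypothetical, since the description does not give the criterion that was actually applied.

```python
import numpy as np

def iqr_outlier_mask(values, k=3.0):
    """Flag values lying more than k interquartile ranges outside the quartiles.

    k plays the role of the "censoring criterion"; 3.0 is an assumed value.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.nanpercentile(values, [25, 75])
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return (values < lower) | (values > upper)

# Hypothetical composite sample of pH values pooled over stations and depths.
ph = np.array([7.9, 8.0, 8.1, 8.0, 7.8, 8.2, 8.1, 7.9, 8.0, 14.0])
is_outlier = iqr_outlier_mask(ph)
ph_clean = ph[~is_outlier]
print(ph[is_outlier])
```

Because the quartiles are computed from the pooled sample, the fences are the same for every station and depth, which matches the "single composite sample" wording above.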

A-TWAIN Physical Oceanography Mooring Data 2021-2022

data.npolar.no

bin, nc, pdf

Updated Oct 29, 2024

Cite

Renner, Angelika H. H. (angelika.renner@hi.no); Sundfjord, Arild (arild.sundfjord@npolar.no); Foss, Øyvind (oyvind.foss@npolar.no) (2024). A-TWAIN Physical Oceanography Mooring Data 2021-2022 [Dataset]. http://doi.org/10.21334/npolar.2024.86ec6869

Available download formats: bin, nc, pdf

Unique identifier

https://doi.org/10.21334/npolar.2024.86ec6869

Dataset updated

Oct 29, 2024

Dataset provided by

Norwegian Polar Data Centre

License

http://spdx.org/licenses/CC0-1.0

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Time period covered

Nov 9, 2021 - Oct 6, 2022

Description

A-TWAIN Physical Oceanography mooring data 2021-2022

A-TWAIN (Long-term variability and trends in the Atlantic Water inflow region) was established to gain understanding on how the inflowing current system is distributed at different depths along the continental slope, how it responds to local, short-lived atmospheric changes, and how it varies on seasonal and interannual timescales.

Overview

As part of A-TWAIN, three moorings were redeployed near the continental slope of the Nansen Basin in the Arctic Ocean, near 31°E north of the Barents Sea. The moorings were operational between November 2021 and October 2022. All moorings have previously been deployed in the same respective locations; these data constitute the 2021-2022 continuation of the A-TWAIN mooring time series.

AT800-7* and AT200-6 moorings were instrumented mooring lines extending from the bottom anchor to a sub-surface buoy, while AT500-2 was a bottom lander. CTD and ADCP data from the moorings will be made available here; other datasets from these moorings will be published elsewhere. Processed data will be added here as they become available.

* "AT800-7" denotes the 7th deployment of the AT800 mooring.

Table: Details of the mooring deployments

Mooring	Type	Bottom depth	Latitude	Longitude	Deployment date	Recovery date	Data status
AT200-6	Instrumented line	205 m	81.4105	31.2433	09.11.21	06.10.22	CTD data published
AT500-2	Bottom lander	488 m	81.4577	31.0753	09.11.21	04.10.22
AT800-7	Instrumented line	889 m	81.5501	30.8777	09.11.21	04.10.22

[Figure: ATWAIN map] https://gitlab.com/NPIOcean/npi-figure-store/-/raw/main/figures/atwain_moorings_21_22/at200/atwain_map.png

A-TWAIN mooring locations showing IBCAO v4 bathymetry.

All three moorings were deployed in November 2021 during the joint Nansen Legacy and A-TWAIN/SIOS-InfraNor mooring service cruise (KPH20217123), and recovered in October 2022 during the Nansen Legacy Mooring Service Cruise (KPH2022712).

Data details

Processed data are made available as one NetCDF file per instrument. Raw instrument data are also available. Details of the processing of the respective datasets are shown below.

AT200-6 CTD data

Instrument	Median depth	Serial number	Sampling frequency	File name
SBE16plus	46 m	50241	45 min	`AT200_2021_2022_SBE16plus_50241_pres_temp_sal_46m.nc`
SBE37SMP	59 m	20773	15 min	`AT200_2021_2022_SBE37SMP_20773_pres_temp_sal_59m.nc`
SBE37SM	113 m	15252	15 min	`AT200_2021_2022_SBE37SM_15252_pres_temp_sal_113m.nc`
SBE37SM	191 m	9293	15 min	`AT200_2021_2022_SBE37SM_9293_pres_temp_sal_191m.nc`

AT200-6 data processing

Data processing

AT200 CTD data were processed to .cnv using SBEDataProcessing software. Additional processing was done in Python using the kval library (v0.0.2-beta, this commit).

Processing steps as well as a python script for reproducing the post-processing from .cnv can be found in the PROCESSING variable of each file.

All records were chopped to the time range 2021-11-09 20:00 - 2022-10-06 07:30 in order to remove data from recovery/deployment and deck times.

Salinity outlier editing

After visual inspection, no editing was applied to temperature and pressure.

Salinity has been lightly edited in order to remove noise and outliers (see PSAL variable attributes for details). The identification of outliers is complicated by the large hydrographic variability in this location, reflecting sharp lateral gradients near the continental slope in combination with an energetic background environment and relatively strong tides. The processing has therefore been done using a relatively light approach, described below. This editing may or may not be appropriate or sufficient for specific research purposes. Users who want to apply their own editing are encouraged to work with unedited salinity, which can easily be obtained by reprocessing salinity from TEMP and CNDC (both of which have been left unedited).

For SBE37 instruments:

PSAL was recomputed from modified conductivity CNDC_mod and temperature TEMP_mod in order to reduce (presumably artificial) high-frequency noise:
- CNDC_mod was despiked using a 31-pt rolling window (rejecting outliers >3 SD from the median).
- A rolling 3-point median was applied to CNDC_mod and TEMP_mod.
- PSAL was recomputed from temperature, conductivity and pressure using the GSW-Python library.
  - (No filtering or editing have been applied to the fields TEMP and CNDC stored in the netCDF files.)
- PSAL was despiked using a 15-pt rolling window (rejecting outliers >3 SD from the median).
- Finally, a rolling 3-point median was applied to CNDC_mod and TEMP_mod.

For the SBE16plus instrument:

Major outliers in PSAL were removed using a threshold value of 25.
Additional outliers were removed using a 31-pt rolling window (rejecting outliers >3 SD from the median).
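
The two-stage editing described for the SBE16plus record can be sketched as follows. This is not the kval processing script stored in the PROCESSING variable; it is a plain NumPy illustration in which the function name rolling_despike, the centered window, the reading of the threshold of 25 as a lower bound, and the synthetic salinity values are all assumptions.

```python
import numpy as np

def rolling_despike(values, window=31, n_sd=3.0):
    """Replace points farther than n_sd standard deviations from the
    median of a centered rolling window with NaN."""
    values = np.asarray(values, dtype=float)
    half = window // 2
    out = values.copy()
    for i in range(values.size):
        # The window is truncated at the start and end of the record.
        segment = values[max(0, i - half): i + half + 1]
        if np.all(np.isnan(segment)):
            continue
        center = np.nanmedian(segment)
        spread = np.nanstd(segment)
        if spread > 0 and abs(values[i] - center) > n_sd * spread:
            out[i] = np.nan
    return out

# Synthetic salinity record (hypothetical values) with two bad points.
psal = 34.8 + 0.01 * np.sin(np.arange(200) / 5.0)
psal[50] = 20.0   # gross outlier, below the threshold of 25
psal[120] = 35.6  # smaller spike, left for the rolling-window step

# Step 1: threshold. Treating 25 as a lower bound is an assumption.
psal_clean = np.where(psal < 25.0, np.nan, psal)
# Step 2: 31-point rolling-window despiking at 3 SD from the median.
psal_clean = rolling_despike(psal_clean, window=31, n_sd=3.0)
print(int(np.isnan(psal_clean).sum()), "points removed")
```

Removing the gross outliers first keeps them from inflating the rolling standard deviation, which would otherwise mask smaller spikes in the same window.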

Validation against shipboard CTDs

Measured variables were found to agree well with post-deployment CTD profiles (from a SBE911+ on the R/V Kronprins Haakon) from the start of the record. At the end of the record, all sensors were found to agree reasonably well with a pre-recovery shipboard CTD profile with the exception of the upper instrument (SBE16plus S/N 50241). We attribute this to the profile being complex around 50 m depth at this time (region of an ~1 °C cold intrusion and salinity inversion on the background of a strong halocline). The temperature-salinity distribution is broadly consistent with the measurement being physically sensible, as is the salinity increase from the sensor near 46 m to the one near 59 m. Users should be aware that the SBE16plus salinity data could not be validated against other measurements.

[Figure: Temperature and Salinity profile comparison] https://gitlab.com/NPIOcean/npi-figure-store/-/raw/main/figures/atwain_moorings_21_22/at200/comparison_at200_pre_deployment_ctd_timeseries_profile.png

[Figure: T-S comparison] https://gitlab.com/NPIOcean/npi-figure-store/-/raw/main/figures/atwain_moorings_21_22/at200/comparison_at200_pre_deployment_ctd_timeseries_T_S.png

Comparison of temperature (left) and practical salinity (middle) profiles and temperature-salinity distributions (right) between moored CTDs (colors) and shipboard CTD SBE911+ profile (black) on Oct 5 2022, the day before mooring recovery. Coloured dots indicate the moored CTD value closest to the profile timestamp, and coloured lines show values collected within ±1h of the profile timestamp.

[Figure] https://gitlab.com/NPIOcean/npi-figure-store/-/raw/main/figures/atwain_moorings_21_22/at200/comparison_at200_pre_deployment_ctd_timeseries.png

Comparison between moored CTDs (black/blue) and shipboard CTD (red) on Oct 5 2022, the day before mooring recovery. Blue lines highlight the moored CTD values within ±one hour of the ship CTD profile.

A post-deployment calibration CTD cast was performed after recovery of the moorings, on 09.10.22. Here, two of the SBE37 instruments (#15252 and #20773) were attached to the ship CTD rosette and submerged with resting stops at 75 m, 30 m, and 20 m. Comparing the values between the two microcats and against ship CTD suggests that these two instruments were internally consistent within approximately 0.005 psu and consistent with the ship CTD within ±0.02 psu.

[Figure: Temperature Comparison] https://gitlab.com/NPIOcean/npi-figure-store/-/raw/main/figures/atwain_moorings_21_22/at200/TEMP_CTD_comparison_microcats_on_rosette.png

[Figure: Salinity Comparison] https://gitlab.com/NPIOcean/npi-figure-store/-/raw/main/figures/atwain_moorings_21_22/at200/PSAL_CTD_comparison_microcats_on_rosette.png

Comparison of temperature (upper) and practical salinity (lower) values from the "calibration CTD cast" on 09.10.22 where two of the AT200-6 SBE37 instruments were mounted on the rosette and resting stops were made near 75, 30, and 20 m. Black: Shipboard CTD, Red: SBE37 #15252, Blue: SBE37 #20773. Small dots show all data points from the depth stops, triangles and

Data from: Outlier classification using autoencoders: application for...
osti.gov
Updated Jun 2, 2021
Cite
Bianchi, F. M.; Brunner, D.; Kube, R.; LaBombard, B. (2021). Outlier classification using autoencoders: application for fluctuation driven flows in fusion plasmas [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/1882649-outlier-classification-using-autoencoders-application-fluctuation-driven-flows-fusion-plasmas
Dataset updated
Jun 2, 2021
Dataset provided by
United States Department of Energy (http://energy.gov/)
Office of Science (http://www.er.doe.gov/)
Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States). Plasma Science and Fusion Center
Authors
Bianchi, F. M.; Brunner, D.; Kube, R.; LaBombard, B.
Description
Understanding the statistics of fluctuation driven flows in the boundary layer of magnetically confined plasmas is desired to accurately model the lifetime of the vacuum vessel components. Mirror Langmuir probes (MLPs) are a novel diagnostic that uniquely allows us to sample the plasma parameters on a time scale shorter than the characteristic time scale of their fluctuations. Sudden large-amplitude fluctuations in the plasma degrade the precision and accuracy of the plasma parameters reported by MLPs for cases in which the probe bias range is of insufficient amplitude. While some data samples can readily be classified as valid and invalid, we find that such a classification may be ambiguous for up to 40% of data sampled for the plasma parameters and bias voltages considered in this study. In this contribution, we employ an autoencoder (AE) to learn a low-dimensional representation of valid data samples. By definition, the coordinates in this space are the features that mostly characterize valid data. Ambiguous data samples are classified in this space using standard classifiers for vectorial data. In this way, we avoid defining complicated threshold rules to identify outliers, which require strong assumptions and introduce biases in the analysis. By removing the outliers that are identified in the latent low-dimensional space of the AE, we find that the average conductive and convective radial heat fluxes are between approximately 5% and 15% lower than when removing outliers identified by threshold values. For contributions to the radial heat flux due to triple correlations, the difference is up to 40%.
Environmental data used for PCA
figshare.com
txt
Updated Jun 17, 2023
Cite
James Whitehead (2023). Environmental data used for PCA [Dataset]. http://doi.org/10.6084/m9.figshare.20088632.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.20088632.v1
Dataset updated
Jun 17, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
James Whitehead
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset has had outliers removed and has been formatted in order to be appropriate for use in Principal Component Analysis. The raw data was provided by Moritz von der Lippe and Anne Hiller from TU Berlin, and field measurements were carried out by Lena Fiechter.
Data from: MQTTEEB-D: A Real-World IoT Cybersecurity Dataset for AI-Powered...
data.mendeley.com
Updated Mar 20, 2025
Cite
ABDERRAHMANE AQACHTOUL (2025). MQTTEEB-D: A Real-World IoT Cybersecurity Dataset for AI-Powered Threat Detection in MQTT Networks [Dataset]. http://doi.org/10.17632/jfttfjn6tr.1
Explore at:
Unique identifier
https://doi.org/10.17632/jfttfjn6tr.1
Dataset updated
Mar 20, 2025
Authors
ABDERRAHMANE AQACHTOUL
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset accompanies the research article on MQTTEEB-D and is intended for public use in cybersecurity research. The MQTTEEB-D dataset is a practical real-world dataset for improving intrusion detection in Message Queuing Telemetry Transport (MQTT)-based Internet of Things (IoT) networks. In contrast to existing datasets that are constructed from simulated network traffic, MQTTEEB-D is obtained from a real-time IoT deployment at the International University of Rabat (UIR), Morocco. Using MySignals IoT health sensors, a Raspberry Pi 4, and an MQTT broker server, this dataset represents the actual complexity of the active IoT communication process, which synthetic data fails to offer. To narrow the gap between simulated and real-world attack scenarios, various cyberattacks including Denial of Service (DoS), Slow DoS against Internet of Things Environments (SlowITe), Malformed Data Injection, Brute Force, and MQTT publish flooding were carried out in real time, permitting close monitoring of network traffic anomalies. The data was captured using a Python wrapper for tshark (PyShark) and organized into multiple Comma-Separated Values (CSV) files. To ensure high data quality, we performed pre-processing steps such as outlier removal, normalization, standardization, and class balancing. Several processed forms of this dataset (raw, cleaned, normalized, standardized, and balanced with the Synthetic Minority Over-sampling Technique (SMOTE)) are provided, along with detailed metadata to facilitate ease of use in cybersecurity research. This dataset provides an opportunity for researchers to develop and validate intrusion detection models in a real-world MQTT environment, a critical ingredient in Artificial Intelligence (AI)-driven cybersecurity solutions for IoT networks. The dataset will support future research in the IoT security and anomaly detection domains.
Global Ocean Data Analysis Project version 2.2019 (GLODAPv2.2019) (NCEI...
cmr.earthdata.nasa.gov
catalog.data.gov
not provided
Updated Sep 26, 2019
Cite
(2019). Global Ocean Data Analysis Project version 2.2019 (GLODAPv2.2019) (NCEI Accession 0186803) [Dataset]. http://doi.org/10.25921/xnme-wr20
Explore at:
Available download formats: not provided (1425.488 KB)
Unique identifier
https://doi.org/10.25921/xnme-wr20
Dataset updated
Sep 26, 2019
Time period covered
Jan 1, 1972 - Mar 5, 2017
Area covered
Earth
Description
This NCEI Accession consists of the GLODAPv2.2019 data product composed of data from 840 scientific cruises covering the global ocean between 1972 and 2017. It includes full depth discrete bottle measurements of salinity, oxygen, nitrate, silicate, phosphate, dissolved inorganic carbon (TCO2), total alkalinity (TAlk), pH, chlorofluorocarbons (CFC-11, CFC-12, CFC-113, and CCl4), various isotopes and organic compounds. It was created by appending data from 116 cruises to GLODAPv2 (Olsen et al., 2016, NCEI Accession 0162565). The data for salinity, oxygen, nitrate, silicate, phosphate, TCO2, TAlk, pH, CFC-11, CFC-12, CFC-113, and CCl4 were subjected to primary and secondary quality control. Severe biases in these data have been corrected for, and outliers removed. However, differences in data related to any known or likely time trends or variations have not been corrected for. These data are believed to be accurate to 0.005 in salinity, 1% in oxygen, 2% in nitrate, 2% in silicate, 2% in phosphate, 4 µmol kg-1 in TCO2, 4 µmol kg-1 in TAlk, and for the halogenated transient tracers: 5%.