21 datasets found
  1. Data_Sheet_1_ExGUtils: A Python Package for Statistical Analysis With the...

    • frontiersin.figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Carmen Moret-Tatay; Daniel Gamermann; Esperanza Navarro-Pardo; Pedro Fernández de Córdoba Castellá (2023). Data_Sheet_1_ExGUtils: A Python Package for Statistical Analysis With the ex-Gaussian Probability Density.zip [Dataset]. http://doi.org/10.3389/fpsyg.2018.00612.s001
    Explore at:
    zip (available download format)
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Carmen Moret-Tatay; Daniel Gamermann; Esperanza Navarro-Pardo; Pedro Fernández de Córdoba Castellá
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The study of reaction times and their underlying cognitive processes is an important field in Psychology. Reaction times are often modeled with the ex-Gaussian distribution because it provides a good fit to many empirical datasets. The complexity of this distribution makes computational tools essential, so there is a strong need for efficient and versatile software in this area of research. In this manuscript we discuss some mathematical details of the ex-Gaussian distribution and apply the ExGUtils package, a set of functions and numerical tools programmed in Python for the numerical analysis of data involving the ex-Gaussian probability density. To validate the package, we present an extensive analysis of fits obtained with it, discuss the advantages of and differences between the least squares and maximum likelihood methods, and quantitatively evaluate the goodness of the obtained fits (a point that is usually overlooked in the literature). The analysis allows one to identify outliers in the empirical datasets and to determine, on clear criteria, whether data trimming is needed and at which points it should be applied.
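
    A minimal sketch (using SciPy rather than the ExGUtils API itself) of a maximum-likelihood ex-Gaussian fit and a simple goodness-of-fit check; scipy.stats.exponnorm parametrises the distribution with shape K = tau / sigma, and the data below are simulated:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Simulated reaction times: Gaussian component (mu, sigma) plus exponential tail (tau)
    mu, sigma, tau = 450.0, 60.0, 150.0
    rt = rng.normal(mu, sigma, 2000) + rng.exponential(tau, 2000)

    # Maximum-likelihood fit; exponnorm.fit returns (K, loc, scale) = (tau/sigma, mu, sigma)
    K, loc, scale = stats.exponnorm.fit(rt)
    print(f"mu ~ {loc:.1f}, sigma ~ {scale:.1f}, tau ~ {K * scale:.1f}")

    # A simple goodness-of-fit check via a Kolmogorov-Smirnov test against the fitted density
    ks_stat, p_value = stats.kstest(rt, "exponnorm", args=(K, loc, scale))
    print(f"KS statistic = {ks_stat:.3f}, p = {p_value:.3f}")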

  2. Engine Ratng Prediction

    • kaggle.com
    zip
    Updated Feb 28, 2023
    Cite
    Ved Prakash (2023). Engine Ratng Prediction [Dataset]. https://www.kaggle.com/datasets/ved1104/engine-ratng-prediction
    Explore at:
    zip (3540393 bytes)
    Dataset updated
    Feb 28, 2023
    Authors
    Ved Prakash
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Your task is to write a small Python or R script that predicts the engine rating based on the inspection parameters using only the provided dataset. You need to find all the cases/outliers where the rating has been given incorrectly as compared to the current condition of the engine.

    This task is designed to test your Python or R ability, your knowledge of data science techniques, your ability to find trends and outliers and to assess the relative importance of variables with respect to deviations in the target variable, and your ability to work effectively, efficiently, and independently within a commercial setting.

    This task is also designed to test your hyperparameter-tuning abilities and lateral thinking. Deliverables:

    • One Python or R script
    • One requirements text file including an exhaustive list of packages and version numbers used in your solution
    • A summary of your insights
    • A list of cases that are outliers/incorrectly rated as high or low, backed with analysis/reasons
    • Model object files for reproducibility

    Your solution should, at a minimum, do the following:

    • Load the data into memory
    • Prepare the data for modeling
    • EDA of the variables
    • Build a model on training data
    • Test the model on testing data
    • Provide some measure of performance
    • Outlier analysis and detection
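
    A minimal sketch of one possible approach (file and column names are assumptions, not part of the provided description): fit a regressor on the inspection parameters and flag engines whose given rating deviates strongly from the model's prediction.

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("engine_data.csv")                         # hypothetical file name
    y = df["engine_rating"]                                     # hypothetical target column
    X = pd.get_dummies(df.drop(columns=["engine_rating"]))      # naive encoding of categoricals

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    model = RandomForestRegressor(n_estimators=200, random_state=42).fit(X_train, y_train)
    print("R^2 on held-out data:", round(model.score(X_test, y_test), 3))

    # Flag potentially mis-rated engines: large gap between given and predicted rating
    residuals = y - model.predict(X)
    outliers = df[residuals.abs() > 3 * residuals.std()]
    print(f"{len(outliers)} suspected incorrectly rated engines")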

  3. Medical Clean Dataset

    • kaggle.com
    zip
    Updated Jul 6, 2025
    Cite
    Aamir Shahzad (2025). Medical Clean Dataset [Dataset]. https://www.kaggle.com/datasets/aamir5659/medical-clean-dataset
    Explore at:
    zip (1262 bytes)
    Dataset updated
    Jul 6, 2025
    Authors
    Aamir Shahzad
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is the cleaned version of a real-world medical dataset that was originally noisy, incomplete, and contained various inconsistencies. The dataset was cleaned through a structured and well-documented data preprocessing pipeline using Python and Pandas. Key steps in the cleaning process included:

    • Handling missing values using statistical techniques such as median imputation and mode replacement
    • Converting categorical values to consistent formats (e.g., gender formatting, yes/no standardization)
    • Removing duplicate entries to ensure data accuracy
    • Parsing and standardizing date fields
    • Creating new derived features such as age groups
    • Detecting and reviewing outliers based on IQR
    • Removing irrelevant or redundant columns
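
    A minimal pandas sketch of the kind of steps listed above; the file and column names are illustrative assumptions, not the actual dataset schema:

    import pandas as pd

    df = pd.read_csv("medical_raw.csv")                         # hypothetical input file
    df["age"] = df["age"].fillna(df["age"].median())            # median imputation
    df["gender"] = df["gender"].str.strip().str.lower()         # standardise categorical formatting
    df = df.drop_duplicates()
    df["admission_date"] = pd.to_datetime(df["admission_date"], errors="coerce")
    df["age_group"] = pd.cut(df["age"], bins=[0, 18, 40, 65, 120],
                             labels=["child", "young adult", "adult", "senior"])

    # IQR-based outlier review for a numeric column
    q1, q3 = df["blood_pressure"].quantile([0.25, 0.75])
    iqr = q3 - q1
    inside = df["blood_pressure"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    print("Rows flagged as outliers:", (~inside).sum())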

    The purpose of cleaning this dataset was to prepare it for further exploratory data analysis (EDA), data visualization, and machine learning modeling.

    This cleaned dataset is now ready for training predictive models, generating visual insights, or conducting healthcare-related research. It provides a high-quality foundation for anyone interested in medical analytics or data science practice.

  4. Bathymetry of the Main Pool of Lake Calumet, Cook County, Illinois, July...

    • gimi9.com
    Updated Jul 18, 2023
    + more versions
    Cite
    (2023). Bathymetry of the Main Pool of Lake Calumet, Cook County, Illinois, July 2023 | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_bathymetry-of-the-main-pool-of-lake-calumet-cook-county-illinois-july-2023/
    Explore at:
    Dataset updated
    Jul 18, 2023
    Area covered
    Lake Calumet, Cook County, Illinois
    Description

    These data are single-beam bathymetry points compiled in comma separated values (CSV) file format, generated from a hydrographic survey of the northern portion of Lake Calumet in Cook County, Illinois. Hydrographic data were collected July 18-19, 2023, using a single-beam echosounder (SBES) integrated with a Global Navigation Satellite System (GNSS) mounted on a marine survey vessel. Surface water elevation data were collected July 18 utilizing a single-base real-time kinematic (RTK)/GNSS unit. Bathymetric data points were collected as the vessel traversed the northern portions of the lake along overlapping survey lines. The SBES internally collected and stored the depth data from the echosounder and the horizontal and vertical position data of the vessel from the GNSS in real time. Data processing required specialized computer software to export bathymetry data from the raw data files. A Python script was written to calculate the lakebed elevations and identify outliers in the dataset. These data are provided in comma separated values (CSV) format as LakeCalumet_SBES_20230718.csv. Data points are stored as a series of x (longitude), y (latitude), and z (elevation or depth) points along with variable length records specific to the data transects.
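
    A minimal sketch of the kind of post-processing described (computing lakebed elevations from depths and screening for statistical outliers); the column names and the water-surface elevation are assumptions, not values from this record:

    import pandas as pd

    points = pd.read_csv("LakeCalumet_SBES_20230718.csv")       # columns assumed: lon, lat, depth
    water_surface_elev = 176.0                                  # placeholder RTK/GNSS surface elevation (m)

    points["lakebed_elev"] = water_surface_elev - points["depth"]

    # Simple z-score screen for suspect soundings
    z = (points["lakebed_elev"] - points["lakebed_elev"].mean()) / points["lakebed_elev"].std()
    print(f"{(z.abs() > 3).sum()} points flagged for review")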

  5. Supplementary Data for Mahalanobis-Based Ratio Analysis and Clustering of...

    • zenodo.org
    zip
    Updated May 7, 2025
    Cite
    o; o (2025). Supplementary Data for Mahalanobis-Based Ratio Analysis and Clustering of U.S. Tech Firms [Dataset]. http://doi.org/10.5281/zenodo.15337959
    Explore at:
    zip (available download format)
    Dataset updated
    May 7, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    o; o
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 4, 2025
    Description

    Note: All supplementary files are provided as a single compressed archive named dataset.zip. Users should extract this file to access the individual Excel and Python files listed below.

    This supplementary dataset supports the manuscript titled “Mahalanobis-Based Multivariate Financial Statement Analysis: Outlier Detection and Typological Clustering in U.S. Tech Firms.” It contains both data files and Python scripts used in the financial ratio analysis, Mahalanobis distance computation, and hierarchical clustering stages of the study. The files are organized as follows:

    • ESM_1.xlsx – Raw financial ratios of 18 U.S. technology firms (2020–2024)

    • ESM_2.py – Python script to calculate Z-scores from raw financial ratios

    • ESM_3.xlsx – Dataset containing Z-scores for the selected financial ratios

    • ESM_4.py – Python script for generating the correlation heatmap of the Z-scores

    • ESM_5.xlsx – Mahalanobis distance values for each firm

    • ESM_6.py – Python script to compute Mahalanobis distances

    • ESM_7.py – Python script to visualize Mahalanobis distances

    • ESM_8.xlsx – Mean Z-scores per firm (used for cluster analysis)

    • ESM_9.py – Python script to compute mean Z-scores

    • ESM_10.xlsx – Re-standardized Z-scores based on firm-level means

    • ESM_11.py – Python script to re-standardize mean Z-scores

    • ESM_12.py – Python script to generate the hierarchical clustering dendrogram

    All files are provided to ensure transparency and reproducibility of the computational procedures in the manuscript. Each script is commented and formatted for clarity. The dataset is intended for educational and academic reuse under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0).
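
    As an illustration of the Mahalanobis step (not necessarily identical to ESM_6.py), a minimal sketch computing distances from a firms x ratios matrix of Z-scores such as ESM_3.xlsx:

    import numpy as np
    import pandas as pd

    z = pd.read_excel("ESM_3.xlsx", index_col=0)                 # firms x financial-ratio Z-scores
    centered = z.values - z.values.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(z.values, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)   # squared Mahalanobis distances
    distances = pd.Series(np.sqrt(d2), index=z.index, name="mahalanobis")
    print(distances.sort_values(ascending=False).head())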

  6. Metabolomics Data Preprocessing PQN PCA

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Cite
    Dr. Nagendra (2025). Metabolomics Data Preprocessing PQN PCA [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/metabolomics-data-preprocessing-pqn-pca
    Explore at:
    zip (22763 bytes)
    Dataset updated
    Nov 29, 2025
    Authors
    Dr. Nagendra
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset provides a step-by-step pipeline for preprocessing metabolomics data.

    The pipeline implements Probabilistic Quotient Normalization (PQN) to correct dilution effects in metabolomics measurements.

    Includes guidance on handling raw metabolomics datasets obtained from LC-MS or NMR experiments.

    Demonstrates Principal Component Analysis (PCA) for dimensionality reduction and exploratory data analysis.

    Includes data visualization techniques to interpret PCA results effectively.

    Suitable for metabolomics researchers and data scientists working on omics data.

    Enables better reproducibility of preprocessing workflows for metabolomics studies.

    Can be used to normalize data, detect outliers, and identify major patterns in metabolomics datasets.

    Provides a Python-based notebook that is easy to adapt to new datasets.

    Includes example datasets and code snippets for immediate application.

    Helps users understand the impact of normalization on downstream statistical analyses.

    Supports integration with other metabolomics pipelines or machine learning workflows.
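
    A minimal sketch of PQN followed by PCA, assuming a samples x metabolites intensity matrix with no missing values (the file name is illustrative, not part of this record):

    import pandas as pd
    from sklearn.decomposition import PCA

    X = pd.read_csv("metabolite_intensities.csv", index_col=0)   # samples x metabolites

    # Probabilistic Quotient Normalization: divide each sample by the median of its
    # feature-wise quotients against a reference (median) spectrum
    reference = X.median(axis=0)
    quotients = X.div(reference, axis=1)
    dilution = quotients.median(axis=1)
    X_pqn = X.div(dilution, axis=0)

    # PCA on autoscaled data for exploratory analysis and outlier screening
    X_scaled = (X_pqn - X_pqn.mean()) / X_pqn.std()
    scores = PCA(n_components=2).fit_transform(X_scaled)
    print(scores[:5])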

  7. Lipidomics LC-MS analysis support tools for outlier detection

    • data.niaid.nih.gov
    Updated Mar 28, 2024
    Cite
    Spick, Matt (2024). Lipidomics LC-MS analysis support tools for outlier detection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10889320
    Explore at:
    Dataset updated
    Mar 28, 2024
    Dataset provided by
    University of Surrey
    Authors
    Spick, Matt
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Identification of features with high levels of confidence in liquid chromatography-mass spectrometry (LC-MS) lipidomics research is an essential part of biomarker discovery, but existing software platforms can give inconsistent results, even from identical spectral data. This poses a clear challenge for reproducibility in bioinformatics work and highlights the importance of data-driven outlier detection in assessing spectral outputs – demonstrated here using a machine learning approach based on support vector machine regression combined with leave-one-out cross-validation – as well as manual curation, in order to identify software-driven errors caused by closely related lipids and by co-elution issues.
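
    A minimal sketch of the general idea (support vector machine regression with leave-one-out cross-validation used to flag observations that the rest of the data cannot predict); the actual notebook in this record may differ in detail, and the data below are synthetic:

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import LeaveOneOut

    rng = np.random.default_rng(0)
    # Synthetic stand-in: predict one lipid's intensities from five related lipids
    X = rng.random((40, 5))
    y = X @ np.array([1.0, 0.5, 0.2, 0.0, 0.0]) + 0.05 * rng.standard_normal(40)

    errors = []
    for train_idx, test_idx in LeaveOneOut().split(X):
        model = SVR(kernel="rbf").fit(X[train_idx], y[train_idx])
        errors.append(abs(y[test_idx][0] - model.predict(X[test_idx])[0]))

    errors = np.array(errors)
    flagged = np.where(errors > errors.mean() + 3 * errors.std())[0]
    print("Observations flagged as potential outliers:", flagged)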

    The lipidomics case study dataset used in this work is a lipid extraction of a human pancreatic adenocarcinoma cell line (PANC-1, Merck, UK, cat. no. 87092802) analysed using an Acquity M-Class UPLC system (Waters, UK) coupled to a ZenoToF 7600 mass spectrometer (Sciex, UK). Raw output files are included alongside data processed with MS-DIAL (v4.9.221218) and Lipostar (v2.1.4), together with a Jupyter notebook containing Python code to analyse the outputs for outlier detection.

  8. Additional file 2 - datasets and scripts for metabolome analysis

    • figshare.com
    xlsx
    Updated Apr 29, 2024
    Cite
    Roberta Ruggeri; Giuseppe Bee; Paolo Trevisi; Catherine Ollagnier; Federico Correa (2024). Additional file 2 - datasets and scripts for metabolome analysis [Dataset]. http://doi.org/10.6084/m9.figshare.25684509.v1
    Explore at:
    xlsx (available download format)
    Dataset updated
    Apr 29, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Roberta Ruggeri; Giuseppe Bee; Paolo Trevisi; Catherine Ollagnier; Federico Correa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    For the metabolome data, all calculations and statistical analyses were performed using Python. The Shapiro-Wilk test was performed to identify the metabolites whose concentrations in the blood showed a normal distribution, and Student's t-test was used to compare their concentrations in blood samples for the IUGR and NORM groups. Metabolites whose concentrations did not show a normal distribution were compared between the two groups using the non-parametric Mann–Whitney test. The Benjamini–Hochberg correction was applied in both cases to account for the type I risk inflation associated with multiple comparisons. Before being subjected to unsupervised and supervised algorithms, the concentration of each metabolite was normalised and centred. Principal component analysis (PCA) and orthogonal projection to latent structures-discriminant analysis (OPLS-DA) were employed as the unsupervised and supervised methods in the multivariate analysis, respectively. PCA was used for the identification of outliers (Mahalanobis distance metric) as well as the spontaneous clustering of similar samples in the scatter plot of the two principal components. In the OPLS-DA analysis, the X matrix consisted of metabolite concentrations, while the Y vector contained information regarding the group (IUGR or NORM). The goodness of fit of the OPLS-DA model (R2Y) was reported, and predictive performance was assessed through cross-validation. Metrics such as the predictive ability of the model (Q2Y) and the predictive ability of permuted models (Q2Y-perm) were calculated for evaluation. OPLS-DA loading plots were used to illustrate the metabolites that contributed the most to the separation between the IUGR and NORM groups. The identification of metabolites of interest was made through the combination of the variable importance in the projection (VIP) and the loading between the metabolite in the X matrix and the predictive latent variable (pLV) of the model. Metabolites with VIP > 1.0 and high absolute loading values were considered important in the metabolomics signature (De la Barca et al., 2022).

    References:
    Chao de la Barca JM, Chabrun F, Lefebvre T, Roche O, Huetz N, Blanchet O, Legendre G, Simard G, Reynier P, Gascoin G: A Metabolomic Profiling of Intra-Uterine Growth Restriction in Placenta and Cord Blood Points to an Impairment of Lipid and Energetic Metabolism. Biomedicines 2022, 10:1411.
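
    A minimal sketch of the univariate part of this workflow (Shapiro-Wilk, then t-test or Mann-Whitney per metabolite, with Benjamini-Hochberg correction); the file and column names are illustrative assumptions:

    import pandas as pd
    from scipy import stats
    from statsmodels.stats.multitest import multipletests

    df = pd.read_csv("metabolome.csv")            # hypothetical: a 'group' column plus metabolite columns
    metabolites = [c for c in df.columns if c != "group"]
    iugr = df[df["group"] == "IUGR"]
    norm = df[df["group"] == "NORM"]

    pvals = []
    for m in metabolites:
        gaussian = stats.shapiro(iugr[m]).pvalue > 0.05 and stats.shapiro(norm[m]).pvalue > 0.05
        if gaussian:
            pvals.append(stats.ttest_ind(iugr[m], norm[m]).pvalue)       # Student's t-test
        else:
            pvals.append(stats.mannwhitneyu(iugr[m], norm[m]).pvalue)    # Mann-Whitney U test

    # Benjamini-Hochberg correction for multiple comparisons
    reject, p_adj, _, _ = multipletests(pvals, method="fdr_bh")
    results = pd.DataFrame({"metabolite": metabolites, "p_adj": p_adj, "significant": reject})
    print(results.sort_values("p_adj").head())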

  9. Classicmodels

    • kaggle.com
    zip
    Updated Dec 15, 2024
    Cite
    Javier Landaeta (2024). Classicmodels [Dataset]. https://www.kaggle.com/datasets/javierlandaeta/classicmodels
    Explore at:
    zip (65751 bytes)
    Dataset updated
    Dec 15, 2024
    Authors
    Javier Landaeta
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Abstract
    This project presents a comprehensive analysis of a company's annual sales, using the classic dataset classicmodels as the database. Python is used as the main programming language, along with the Pandas, NumPy and SQLAlchemy libraries for data manipulation and analysis, and PostgreSQL as the database management system.

    The main objective of the project is to answer key questions related to the company's sales performance, such as: Which were the most profitable products and customers? Were sales goals met? The results obtained serve as input for strategic decision making in future sales campaigns.

    Methodology

    1. Data Extraction:

    • A connection is established with the PostgreSQL database to extract the relevant data from the orders, orderdetails, customers, products and employees tables.
    • A reusable function is created to read each table and load it into a Pandas DataFrame.
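
    A minimal sketch of such a reusable loader (the connection string and credentials are placeholders):

    import pandas as pd
    from sqlalchemy import create_engine

    # Placeholder connection string; replace credentials with your own
    engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/classicmodels")

    def load_table(table_name: str) -> pd.DataFrame:
        """Read an entire PostgreSQL table into a Pandas DataFrame."""
        return pd.read_sql_table(table_name, con=engine)

    orders = load_table("orders")
    order_details = load_table("orderdetails")
    customers = load_table("customers")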

    2. Data Cleansing and Transformation:

    • An exploratory analysis of the data is performed to identify missing values, inconsistencies, and outliers.
    • New variables are calculated, such as the total value of each sale, cost, and profit.
    • Different DataFrames are joined using primary and foreign keys to obtain a complete view of sales.

    3. Exploratory Data Analysis (EDA):

    • Key metrics such as total sales, number of unique customers, and average order value are calculated.
    • Data is grouped by different dimensions (products, customers, dates) to identify patterns and trends.
    • Results are visualized using relevant graphics (histograms, bar charts, etc.).

    4. Modeling and Prediction:

    • Although the main focus of the project is descriptive, predictive modeling techniques (e.g., time series) could be explored to forecast future sales.

    5. Report Generation:

    • Detailed reports are created in Pandas DataFrames format that answer specific business questions.
    • These reports are stored in new PostgreSQL tables for further analysis and visualization.

    Results
    - Identification of top products and customers: The best-selling products and the customers that generate the most revenue are identified.
    - Analysis of sales trends: Sales trends over time are analyzed and possible factors that influence sales behavior are identified.
    - Calculation of key metrics: Metrics such as average profit margin and sales growth rate are calculated.

    Conclusions
    This project demonstrates how Python and PostgreSQL can be effectively used to analyze large data sets and obtain valuable insights for business decision making. The results obtained can serve as a starting point for future research and development in the area of sales analysis.

    Technologies Used
    - Python: Pandas, NumPy, SQLAlchemy, Matplotlib/Seaborn
    - Database: PostgreSQL
    - Tools: Jupyter Notebook
    - Keywords: data analysis, Python, PostgreSQL, Pandas, NumPy, SQLAlchemy, EDA, sales, business intelligence

  10. High-high cluster and high-low outlier road intersections for road traffic...

    • zivahub.uct.ac.za
    docx
    Updated Jun 6, 2024
    + more versions
    Cite
    Simone Vieira; Simon Hull; Roger Behrens (2024). High-high cluster and high-low outlier road intersections for road traffic crashes within the CoCT in 2017, 2018, 2019 and 2021 [Dataset]. http://doi.org/10.25375/uct.25966402.v1
    Explore at:
    docx (available download format)
    Dataset updated
    Jun 6, 2024
    Dataset provided by
    University of Cape Town
    Authors
    Simone Vieira; Simon Hull; Roger Behrens
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    City of Cape Town
    Description

    This dataset offers a detailed inventory of road intersections and their corresponding suburbs within Cape Town, meticulously curated to highlight instances of high crash counts observed in "high-high" cluster and "high-low" outlier fishnet grid cells across the years 2017, 2018, 2019, and 2021. To enhance its utility, the dataset meticulously colour-codes each month associated with elevated crash occurrences, providing a nuanced perspective. Furthermore, the dataset categorises road intersections based on their placement within "high-high" clusters (marked with pink tabs) or "high-low" outlier cells (indicated by red tabs). For ease of navigation, the intersections are further organised alphabetically by suburb name, ensuring accessibility and clarity.

    Data Specifics
    Data Type: Geospatial-temporal categorical data with numeric attributes
    File Format: Word document (.docx)
    Size: 602 KB
    Number of Files: The dataset contains a total of 625 road intersection records (606 "high-high" cluster and 19 "high-low" outliers)
    Date Created: 21st May 2024

    Methodology
    Data Collection Method: The descriptive road traffic crash data per crash victim involved in the crashes was obtained from the City of Cape Town Network Information
    Software: ArcGIS Pro, Open Refine, Python, SQL
    Processing Steps: The raw road traffic crash data underwent a comprehensive refining process using Python software. Following this, duplicate crash records were eliminated to retain only one entry per crash. Subsequently, the data underwent further refinement with Open Refine software, focusing specifically on isolating unique crash descriptions for subsequent geocoding in ArcGIS Pro. Notably, during this process, only the road intersection crashes were retained, as they were the only crashes that were able to be spatially defined. Once geocoded, the road traffic crash data underwent rigorous spatio-temporal analyses, encompassing spatial autocorrelation, hotspot analysis, and cluster and outlier analysis. Leveraging these methods, road intersections identified as either "high-high" clusters or "high-low" outliers were extracted for inclusion in the dataset.

    Geospatial Information
    Spatial Coverage:
    West Bounding Coordinate: 18°20'E
    East Bounding Coordinate: 19°05'E
    North Bounding Coordinate: 33°25'S
    South Bounding Coordinate: 34°25'S
    Coordinate System: South African Reference System (Lo19) using the Universal Transverse Mercator projection

    Temporal Information
    Temporal Coverage:
    Start Date: 01/01/2017
    End Date: 31/12/2021 (2020 data omitted)

  11. High-high cluster and high-low outlier road intersections for road traffic...

    • zivahub.uct.ac.za
    docx
    Updated Jun 6, 2024
    + more versions
    Cite
    Simone Vieira; Simon Hull; Roger Behrens (2024). High-high cluster and high-low outlier road intersections for road traffic crashes involving severely injured pedestrians within the CoCT in 2017, 2018 and 2019 [Dataset]. http://doi.org/10.25375/uct.25974964.v1
    Explore at:
    docx (available download format)
    Dataset updated
    Jun 6, 2024
    Dataset provided by
    University of Cape Town
    Authors
    Simone Vieira; Simon Hull; Roger Behrens
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    City of Cape Town
    Description

    This dataset offers a detailed inventory of road intersections and their corresponding suburbs within Cape Town, meticulously curated to highlight instances of high pedestrian crash counts resulting in serious injuries observed in "high-high" cluster and "high-low" outlier fishnet grid cells across the years 2017, 2018 and 2019. To enhance its utility, the dataset meticulously colour-codes each month associated with elevated crash occurrences, providing a nuanced perspective. Furthermore, the dataset categorises road intersections based on their placement within "high-high" clusters (marked with pink tabs) or "high-low" outlier cells (indicated by red tabs). For ease of navigation, the intersections are further organised alphabetically by suburb name, ensuring accessibility and clarity.

    Data Specifics
    Data Type: Geospatial-temporal categorical data with numeric attributes
    File Format: Word document (.docx)
    Size: 231 KB
    Number of Files: The dataset contains a total of 245 road intersection records (7 "high-high" clusters and 238 "high-low" outliers)
    Date Created: 21st May 2024

    Methodology
    Data Collection Method: The descriptive road traffic crash data per crash victim involved in the crashes was obtained from the City of Cape Town Network Information
    Software: ArcGIS Pro, Open Refine, Python, SQL
    Processing Steps: The raw road traffic crash data underwent a comprehensive refining process using Python software to ensure its accuracy and consistency. Following this, duplicates were eliminated to retain only one entry per crash incident. Subsequently, the data underwent further refinement with Open Refine software, focusing specifically on isolating unique crash descriptions for subsequent geocoding in ArcGIS Pro. Notably, during this process, only the road intersection crashes were retained, as they were the only incidents with spatial definitions. Once geocoded, road intersection crashes that involved a pedestrian with a severe or fatal injury type were extracted so that subsequent spatio-temporal analyses would focus on these crashes only. The spatio-temporal analysis methods by which these pedestrian crashes were analysed included spatial autocorrelation, hotspot analysis, and cluster and outlier analysis. Leveraging these methods, road intersections with pedestrian crashes that resulted in a severe injury identified as either "high-high" clusters or "high-low" outliers were extracted for inclusion in the dataset.

    Geospatial Information
    Spatial Coverage:
    West Bounding Coordinate: 18°20'E
    East Bounding Coordinate: 19°05'E
    North Bounding Coordinate: 33°25'S
    South Bounding Coordinate: 34°25'S
    Coordinate System: South African Reference System (Lo19) using the Universal Transverse Mercator projection

    Temporal Information
    Temporal Coverage:
    Start Date: 01/01/2017
    End Date: 31/12/2019

  12. Bank Loan Case Study Dataset

    • kaggle.com
    zip
    Updated May 4, 2023
    + more versions
    Cite
    Shreshth Vashisht (2023). Bank Loan Case Study Dataset [Dataset]. https://www.kaggle.com/datasets/shreshthvashisht/bank-loan-case-study-dataset/discussion
    Explore at:
    zip (117814223 bytes)
    Dataset updated
    May 4, 2023
    Authors
    Shreshth Vashisht
    Description

    This case study aims to give you an idea of applying EDA in a real business scenario. In this case study, apart from applying the techniques that you have learnt in the EDA module, you will also develop a basic understanding of risk analytics in banking and financial services and understand how data is used to minimize the risk of losing money while lending to customers.

    Business Understanding: Loan-providing companies find it hard to give loans to people with insufficient or non-existent credit history, and some consumers take advantage of this by becoming defaulters. Suppose you work for a consumer finance company which specialises in lending various types of loans to urban customers. You have to use EDA to analyse the patterns present in the data. This will ensure that the applicants capable of repaying the loan are not rejected.

    When the company receives a loan application, the company has to decide on loan approval based on the applicant's profile. Two types of risk are associated with the bank's decision:

    • If the applicant is likely to repay the loan, then not approving the loan results in a loss of business to the company.
    • If the applicant is not likely to repay the loan, i.e. he/she is likely to default, then approving the loan may lead to a financial loss for the company.

    The data given below contains information about the loan application at the time of applying for the loan. It covers two types of scenarios:

    • The client with payment difficulties: he/she had a late payment of more than X days on at least one of the first Y instalments of the loan in our sample.
    • All other cases: all other cases, when the payment is paid on time.

    When a client applies for a loan, there are four types of decisions that could be taken by the client/company:

    • Approved: the company has approved the loan application.
    • Cancelled: the client cancelled the application sometime during approval, either because the client changed her/his mind about the loan or, in some cases, because a higher risk profile led to worse pricing which the client did not want.
    • Refused: the company rejected the loan (because the client does not meet their requirements, etc.).
    • Unused Offer: the loan was cancelled by the client, but at a different stage of the process.

    In this case study, you will use EDA to understand how consumer attributes and loan attributes influence the tendency of default.

    Business Objectives: It aims to identify patterns which indicate if a client has difficulty paying their installments which may be used for taking actions such as denying the loan, reducing the amount of loan, lending (to risky applicants) at a higher interest rate, etc. This will ensure that the consumers capable of repaying the loan are not rejected. Identification of such applicants using EDA is the aim of this case study.

    In other words, the company wants to understand the driving factors (or driver variables) behind loan default, i.e. the variables which are strong indicators of default. The company can utilize this knowledge for its portfolio and risk assessment.

    To develop your understanding of the domain, you are advised to independently research a little about risk analytics (understanding the types of variables and their significance should be enough).

    Data Understanding: Download the Dataset using the link given under dataset section on the right.

    • application_data.csv contains all the information about the client at the time of application, including whether the client has payment difficulties.
    • previous_application.csv contains information about the client's previous loan applications, including whether each previous application was Approved, Cancelled, Refused or an Unused offer.
    • columns_descrption.csv is the data dictionary which describes the meaning of the variables.

    You are required to provide a detailed report for the data above, answering the questions that follow:

    • Present the overall approach of the analysis. Mention the problem statement and the analysis approach briefly.
    • Identify the missing data and use an appropriate method to deal with it (remove columns or replace values with an appropriate value). Hint: in EDA it is not necessary to replace missing values, but if you do replace them, clearly state your approach.
    • Identify if there are outliers in the dataset, and explain why you think each is an outlier. Again, remember that for this exercise it is not necessary to remove any data points.
    • Identify if there is data imbalance in the data, and find the ratio of the imbalance. Hint: since there are a lot of columns, you can run your analysis in loops over the appropriate columns and extract the insights.
    • Explain the results of univariate, segmented univariate, bivariate analysis, etc. in business terms.
    • Find the top 10 c...
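
    A minimal sketch of two of the checks asked for above (missing-data share per column and the class-imbalance ratio); the TARGET column name is an assumption based on the typical schema used in this case study:

    import pandas as pd

    app = pd.read_csv("application_data.csv")

    # Missing data: percentage of nulls per column, worst first
    missing_pct = (app.isna().mean() * 100).sort_values(ascending=False)
    print(missing_pct.head(10))

    # Data imbalance: ratio of on-time clients to clients with payment difficulties
    counts = app["TARGET"].value_counts()
    print("Imbalance ratio (0:1) =", round(counts[0] / counts[1], 2))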

  13. Socio-demographic and economic characteristics of respondents.

    • plos.figshare.com
    • figshare.com
    xls
    Updated Oct 17, 2023
    Cite
    Shimels Derso Kebede; Daniel Niguse Mamo; Jibril Bashir Adem; Birhan Ewunu Semagn; Agmasie Damtew Walle (2023). Socio-demographic and economic characteristics of respondents. [Dataset]. http://doi.org/10.1371/journal.pdig.0000345.t001
    Explore at:
    xls (available download format)
    Dataset updated
    Oct 17, 2023
    Dataset provided by
    PLOS Digital Health
    Authors
    Shimels Derso Kebede; Daniel Niguse Mamo; Jibril Bashir Adem; Birhan Ewunu Semagn; Agmasie Damtew Walle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Socio-demographic and economic characteristics of respondents.

  14. Insurance_claims

    • kaggle.com
    • data.mendeley.com
    zip
    Updated Oct 19, 2025
    Cite
    Miannotti (2025). Insurance_claims [Dataset]. https://www.kaggle.com/datasets/mian91218/insurance-claims
    Explore at:
    zip (68984 bytes)
    Dataset updated
    Oct 19, 2025
    Authors
    Miannotti
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    AQQAD, ABDELRAHIM (2023), “insurance_claims ”, Mendeley Data, V2, doi: 10.17632/992mh7dk9y.2

    https://data.mendeley.com/datasets/992mh7dk9y/2

    Latest version Version 2 Published: 22 Aug 2023 DOI: 10.17632/992mh7dk9y.2

    Data Acquisition: - Obtain the dataset titled "Insurance_claims" from the following Mendeley repository: https://data.mendeley.com/drafts/992mh7dk9y - Download and store the dataset locally for easy access during subsequent steps.

    Data Loading & Initial Exploration: - Use Python's Pandas library to load the dataset into a DataFrame. Python code used:

    import pandas as pd

    # Load the dataset file
    insurance_df = pd.read_csv('insurance_claims.csv')

    • Inspect the initial rows, data types, and summary statistics to get an understanding of the dataset's structure.

    Data Cleaning & Pre-processing: - Handle missing values, if any. Strategies may include imputation or deletion based on the nature of the missing data. - Identify and handle outliers. In this research, particularly, outliers in the 'umbrella_limit' column were addressed. - Normalize or standardize features if necessary.

    Exploratory Data Analysis (EDA): - Utilize visualization libraries such as Matplotlib and Seaborn in Python for graphical exploration. - Examine distributions, correlations, and patterns in the data, especially between features and the target variable 'fraud_reported'. - Identify features that exhibit distinct patterns for fraudulent and non-fraudulent claims.

    Feature Engineering & Selection: - Create or transform existing features to improve model performance. - Use techniques like Recursive Feature Elimination (RFECV) to identify and retain only the most informative features.

    Modeling: - Split the dataset into training and test sets to ensure the model's generalizability. - Implement machine learning algorithms such as Support Vector Machine, RandomForest, and Voting Classifier using libraries like Scikit-learn. - Handle class imbalance issues using methods like Synthetic Minority Over-sampling Technique (SMOTE).
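
    A minimal sketch of the modeling step described above (SMOTE applied to the training split only, followed by a RandomForest); the Y/N encoding of fraud_reported and the naive dummy encoding are assumptions:

    import pandas as pd
    from imblearn.over_sampling import SMOTE
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    insurance_df = pd.read_csv("insurance_claims.csv")
    y = (insurance_df["fraud_reported"] == "Y").astype(int)         # assumed Y/N labels
    X = pd.get_dummies(insurance_df.drop(columns=["fraud_reported"]))

    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
    X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)    # oversample training split only

    clf = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_res, y_res)
    print(classification_report(y_test, clf.predict(X_test)))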

    Model Evaluation: - Evaluate the performance of each model using metrics like precision, recall, F1-score, ROC-AUC score, and confusion matrix. - Fine-tune the models based on the results. Hyperparameter tuning can be performed using techniques like Grid Search or Random Search.

    Model Interpretation: - Use methods like SHAP (SHapley Additive exPlanations) or LIME (Local Interpretable Model-agnostic Explanations) to interpret and understand the predictions made by the model.

    Deployment & Prediction: - Utilize the best-performing model to make predictions on unseen data. - If the intention is to deploy the model in a real-world scenario, convert the trained model into a format suitable for deployment (e.g., using libraries like joblib or pickle).

    Software & Tools: - Programming Language: Python (version: GoogleColab) - Libraries: Pandas, Numpy, Matplotlib, Seaborn, Scikit-learn, Imbalanced-learn, LIME, and SHAP. - Environment: Jupyter Notebook or any Python IDE.

  15. Bay Area All Commute Points (2018 Data)

    • kaggle.com
    zip
    Updated Nov 30, 2022
    Cite
    Thomas Nguyen (2022). Bay Area All Commute Points (2018 Data) [Dataset]. https://www.kaggle.com/datasets/thomasnguyen01/bay-area-all-commute-points-2018-data
    Explore at:
    zip (1694976 bytes)
    Dataset updated
    Nov 30, 2022
    Authors
    Thomas Nguyen
    Area covered
    San Francisco Bay Area
    Description

    Context

    The San Francisco Bay Area (nine-county) is one of the largest urban areas in the US by population and GDP. It is home to over 7.5 million people and has a GDP of $995 billion (third highest by GDP output and first highest by GDP per capita). Home to Silicon Valley (a global center for high technology and innovation) and San Francisco (the second largest financial center in the US after New York), the Bay Area contains some of the most profitable industries and sophisticated workforces in the world. This dataset describes where these workers live and commute to work in 2018.

    Content

    This data file includes all needed information as a means to find out more about the different commute patterns, geographical locations, and necessary metrics to make predictions and draw conclusions.

    Inspiration

    • What can we learn about the different residence and workplace locations? What is the average distance between these locations?
    • Which counties contain a high/low concentration of residence and workplace locations? Why?
    • What counties do most of the commuters usually commute to work? What counties do most of the commuters call home?
    • Are most of these commute patterns county-by-county or within a single county?
    • Are there any noticeable outliers (e.g., long commute patterns) in this dataset? What counties contain a high concentration of these outliers?
  16. Feature Engineering Dataset

    • kaggle.com
    zip
    Updated Apr 18, 2023
    Cite
    Harikant Shukla (2023). Feature Engineering Dataset [Dataset]. https://www.kaggle.com/datasets/harikantshukla/feature-engineering-dataset/discussion
    Explore at:
    zip (95245 bytes)
    Dataset updated
    Apr 18, 2023
    Authors
    Harikant Shukla
    Description

    While searching for the dream house, the buyer looks at various factors, not just at the height of the basement ceiling or the proximity to an east-west railroad.

    Using the dataset, find the factors that influence price negotiations while buying a house.

    There are 79 explanatory variables describing every aspect of residential homes in Ames, Iowa.

    Task to be Performed:

    1) Download the "PEP1.csv" file using the link given in the Feature Engineering project problem statement.
    2) For a detailed description of the dataset, you can download and refer to data_description.txt using the link given in the Feature Engineering project problem statement.

    Tasks to Perform

    1) Import the necessary libraries
       1.1 Pandas is a Python library for data manipulation and analysis.
       1.2 NumPy is a package that contains a multidimensional array object and several derivative ones.
       1.3 Matplotlib is a Python visualization package for 2D array plots.
       1.4 Seaborn is built on top of Matplotlib. It is used for exploratory data analysis and data visualization.
    2) Read the dataset
       2.1 Understand the dataset
       2.2 Print the names of the columns
       2.3 Print the shape of the dataframe
       2.4 Check for null values
       2.5 Print the unique values
       2.6 Select the numerical and categorical variables
    3) Descriptive stats and EDA
       3.1 EDA of numerical variables
       3.2 Missing value treatment
       3.3 Identify the skewness and distribution
       3.4 Identify significant variables using a correlation matrix
       3.5 Pair plot for distribution and density

    Project Outcome
    • The aim of the project is to help understand working with the dataset and performing analysis.
    • This project will assess the data and prepare a fresh dataset for training and prediction.
    • Create a box plot to identify the variables with outliers.
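
    A minimal sketch of task 3.4 and the box-plot outcome above (a correlation heatmap of the numeric variables and box plots to spot outliers):

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    df = pd.read_csv("PEP1.csv")
    numeric = df.select_dtypes(include="number")

    # 3.4 Identify significant variables using a correlation matrix
    sns.heatmap(numeric.corr(), cmap="coolwarm", center=0)
    plt.show()

    # Box plots to identify the variables with outliers
    numeric.boxplot(rot=90, figsize=(14, 6))
    plt.tight_layout()
    plt.show()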

  17. Long-term Thermal Drift Dataset

    • kaggle.com
    zip
    Updated Aug 2, 2021
    Cite
    Ivan Nikolov (2021). Long-term Thermal Drift Dataset [Dataset]. https://www.kaggle.com/ivannikolov/longterm-thermal-drift-dataset
    Explore at:
    zip (18976473381 bytes)
    Dataset updated
    Aug 2, 2021
    Authors
    Ivan Nikolov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Long-term datasets for detecting concept drift

    Once a model goes from laboratory conditions to being deployed in the real world, the problem of changing environmental conditions causes performance degradation. This phenomenon can be explained by the presence of concept drift. This drift can be gradual, recurrent, or rapid. To train more robust methods this drift of the data needs to be taken into consideration. To be able to do this, large-scale single location datasets need to be used.

    The dataset is used to test the influence of concept drift on six different deep learning models

    • Two autoencoders - a shallow convolutional autoencoder (CAE) and an implementation of the VQVAE2 autoencoder
    • Two anomaly detectors - two versions of the MNAD model, one using reconstructions and one using predictions
    • Two object detectors - YOLOv5 and Faster R-CNN

    No changes have been made to the architecture of the models, except changing their input channels from 3 to 1, corresponding to the change from RGB to grayscale thermal images. The implementations of the training code, dataloaders and config files are given in the Methods folder of the GitHub repository.

    This is the largest publicly available single location thermal dataset. It consists of 8 months of video data between January and August. The video clips are in a 2-minute format every 30 minutes throughout the day. The dataset is captured at the harbor front of Aalborg, Denmark at coordinates (9.9217, 57.0488). The dataset contains many changing weather conditions like rain, snow, fog, as well as people-related changes like large groups of people, parked and moving cars, trucks, bicycles, etc.

    The captured videos are in .mp4 format, with a resolution of 288 x 384 and were captured using a Hikvision DS-2TD2235D-25/50 thermal camera. The data is 8-bit grayscale clips.

    The folder structure of dataset is as follows:

    Dataset
    │ 
    └───Day Folder with naming convention - {YYYYMMDD}
    │  │  2-minute clip with a naming convention - clip_{number}_{HHMM}
    │ 
    │  Metadata.csv containing the weather and timestamp data
    │  Code for extracting image frames from the video clips
    │  Code for a video clip dataloader using the metadata to select clips depending on their timestamps, weather condition, etc.
    

    Content

    The dataset contains 298 hours of video data, captured in 2020 and 2021.

    A subset of the dataset frames used for pedestrian detection is given in the folder Data_Annotated_Subset_Object_Detectors. Frames from January, February, March, April and August are annotated using the labelImg software. Amount of files:
    - Testing data: January - 100 images, April - 100 images, August - 100 images
    - Training data: February - 400 images, March - 100 images

    The annotation is presented as bounding boxes. The annotation files contain the class of the annotated object - in this case, only pedestrians (with class 0), followed by the X,Y coordinates of the corner of the bounding box, the width and height of the box. The coordinates, width and height are normalized based on the resolution of the images.
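
    A minimal sketch of reading one annotation file in the format described above (class, corner x, corner y, width, height, all normalized) and converting it back to pixel coordinates for the 288 x 384 frames; the file name is hypothetical:

    IMG_W, IMG_H = 384, 288          # thermal frame resolution stated above

    def read_boxes(path):
        """Parse one annotation file: class, corner x, corner y, width, height (normalized)."""
        boxes = []
        with open(path) as f:
            for line in f:
                cls, x, y, w, h = line.split()
                boxes.append({
                    "class": int(cls),               # 0 = pedestrian
                    "x_px": float(x) * IMG_W,
                    "y_px": float(y) * IMG_H,
                    "w_px": float(w) * IMG_W,
                    "h_px": float(h) * IMG_H,
                })
        return boxes

    print(read_boxes("example_annotation.txt"))      # hypothetical file name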

    A second subset of frames is given in the folder Data_Subset_Autoencoders_Anomaly_Detectors. No annotations are given for this subset - it is only used as a division of training and testing data. To select data from this subset, please use the provided dataset.py script. Amount of files:
    - Testing data: January - 100 images, April - 100 images, August - 100 images
    - Training data: February - 20000 images, March - 5000 images

    Together with the video data, the dataset contains CSV metadata with the following column structure:
    - Folder name: the name of the folder the video clip is in
    - Clip name: the name of the 2-minute clip
    - DataTime: the timestamp of the start of the clip
    - Temperature: the temperature in Celsius
    - Humidity: relative humidity percentage measured 2 m over terrain in %
    - Precipitation: accumulated precipitation in [kg/m^2]
    - Dew Point: dew point temperature in Celsius measured 2 m over terrain
    - Wind Direction: wind direction in degrees orientation measured 10 m over terrain
    - Wind Speed: wind speed in [m/s] measured 10 m over terrain
    - Sun Radiation: mean sun radiation in [W/m^2]
    - Min Sunshine: minutes of sunshine in the measured interval

    The weather data is captured in 10 min intervals using the open-source Danish Meteorological Institute (DMI) weather API - https://confluence.govcloud.dk/display/FDAPI

    The dataset contains two Python scripts - the first is a simple dataloader for gathe...

  18. INDIA ELECTRICITY & ENERGY ANALYSIS PROJECT

    • kaggle.com
    zip
    Updated Nov 23, 2025
    Cite
    Bimal Kumar Saini (2025). INDIA ELECTRICITY & ENERGY ANALYSIS PROJECT [Dataset]. https://www.kaggle.com/datasets/bimalkumarsaini/india-electricity-and-energy-analysis-project
    Explore at:
    zip (4986654 bytes)
    Dataset updated
    Nov 23, 2025
    Authors
    Bimal Kumar Saini
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    India
    Description

    ⚡ INDIA ELECTRICITY & ENERGY ANALYSIS PROJECT

    This repository presents an extensive data engineering, cleaning, and analytical study on India’s electricity ecosystem using Python. The project covers coal stock status, thermal power generation, renewable energy trends, energy requirements & availability, and installed capacity across states.

    The goal is to identify operational bottlenecks, resource deficits, energy trends, and support data-driven decisions in the power sector.

    📊 Electricity Data Insights & System Analysis

    The project leverages five government datasets:

    🔹 Daily Coal Stock Data

    🔹 Daily Power Generation

    🔹 Renewable Energy Production

    🔹 State-wise Energy Requirement vs Availability

    🔹 Installed Capacity Across Fuel Types

    The final analysis includes EDA, heatmaps, trend analysis, outlier detection, data-cleaning automation, and visual summaries.

    🔹 Key Features

    ✅ 1. Comprehensive Data Cleaning Pipeline

    Null value treatment using median/mode strategies

    Standardizing categorical inconsistencies

    Filling missing regions, states, and production values

    Date format standardization

    Removing duplicates across all datasets

    Large-scale outlier detection using custom 5×IQR logic (to preserve real-world operational variance)
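
    A minimal sketch of a 5×IQR fence of the kind described (wider than the usual 1.5×IQR so genuine operational swings are not discarded); the column name is an assumption:

    import pandas as pd

    def flag_outliers_5iqr(series: pd.Series) -> pd.Series:
        """Return a boolean mask of values outside the 5x IQR fences."""
        q1, q3 = series.quantile([0.25, 0.75])
        iqr = q3 - q1
        return (series < q1 - 5 * iqr) | (series > q3 + 5 * iqr)

    coal = pd.read_csv("coal_stock.csv")                       # file listed in this repository
    print(flag_outliers_5iqr(coal["Stock Days"]).sum())        # column name is an assumption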

    ✅ 2. Exploratory Data Analysis (EDA)

    Includes:

    Coal stock trends over years

    Daily power generation patterns

    Solar, wind, and renewable growth

    State-wise energy shortage & surplus

    Installed capacity distribution across India

    Correlation maps for all major datasets

    ✅ 3. Trend Visualizations

    📈 Coal Stock Time-Series

    🔥 Thermal Power Daily Output

    🌞 Solar & Wind Contribution Over Time

    🇮🇳 State-wise Energy Deficit Bar Chart

    🗺️ MOM Energy Requirement Heatmap

    ⚙️ Installed Capacity Share of Each State

    📌 Dashboard & Analysis Components

    Section - Description
    🔹 Coal Stock Dashboard - Daily stock, consumption, transport mode, critical plants
    🔹 Power Generation - Capacity, planned vs actual generation
    🔹 Renewable Mix - Solar, wind, hydro & total RE contributions
    🔹 Energy Shortfall - Requirement vs availability across states
    🔹 Installed Capacity - Coal, Gas, Hydro, Nuclear & RES capacity stacks

    🧠 Insights & Findings

    🔥 Coal Stock

    Critical coal stock days observed for multiple stations

    Seasonal dips in stock days & indigenous supply shocks

    Import dependency minimal but volatile

    ⚡ Power Generation

    Thermal stations show fluctuating PLF (Plant Load Factor)

    Many states underperform planned generation

    🌞 Renewable Energy

    Solar shows continuous year-over-year growth

    Wind output peaks around monsoon months

    🔌 Energy Requirement vs Availability

    States like Delhi, Bihar, Jharkhand show intermittent deficits

    MOM heatmap highlights major seasonal spikes

    ⚙️ Installed Capacity

    Southern & Western regions dominate national capacity

    Coal remains the largest but renewable share rising rapidly

    📁 Files in This Repository

    File - Description
    coal_stock.csv - Cleaned coal stock dataset
    power_gen.csv - Daily power generation data
    renewable_engy.csv - State-wise renewable energy dataset
    engy_reqmt.csv - Monthly requirement & availability dataset
    install_cpty.csv - Installed capacity across fuel types
    electricity.ipynb - Full Python EDA notebook
    electricity.pdf - Export of full Colab notebook (code + visuals)
    README.md - GitHub project summary

    🛠️ Technologies Used 📊 Data Analysis

    Python (Pandas, NumPy, Matplotlib, Seaborn)

    🧹 Data Cleaning

    Null Imputation

    Outlier Detection (5×IQR)

    Standardization & Encoding

    Handling Large Multi-year Datasets

    🔧 System Concepts

    Modular Python Code

    Data Pipelines & Feature Engineering

    Version Control (Git/GitHub)

    Cloud Concepts (Google Colab + Drive Integration)

    📈 Core Metrics & KPIs

    Total Stock Days

    PLF% (Plant Load Factor)

    Renewable Energy Contribution

    Energy Deficit (%)

    National Installed Capacity Share

    📚 Future Enhancements

    Build a Power BI dashboard for visual storytelling

    Integrate forecasting models (ARIMA / Prophet)

    Automate coal shortage alerts

    Add state-level energy prediction for seasonality

    Deploy the analysis as a web dashboard (Streamlit)

  19. Red and White Wine Quality Analysis

    • kaggle.com
    zip
    Updated Dec 6, 2021
    Cite
    Sai Geetha Chandrashekar (2021). Red and White Wine Quality Analysis [Dataset]. https://www.kaggle.com/datasets/saigeethac/red-and-white-wine-quality-datasets
    Explore at:
    zip (97750 bytes)
    Dataset updated
    Dec 6, 2021
    Authors
    Sai Geetha Chandrashekar
    Description

    Wine Quality Data Set

    This data set is available in UCI at https://archive.ics.uci.edu/ml/datasets/Wine+Quality.

    Abstract: Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests.

    Data Set Information:

    The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

    These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.

    Attribute Information:

    Input variables (based on physicochemical tests):

    1. fixed acidity
    2. volatile acidity
    3. citric acid
    4. residual sugar
    5. chlorides
    6. free sulfur dioxide
    7. total sulfur dioxide
    8. density
    9. pH
    10. sulphates
    11. alcohol

    Output variable (based on sensory data):

    1. quality (score between 0 and 10)

    These columns have been described in the Kaggle Data Explorer.

    Context

    The authors state "we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods." We have briefly explored this aspect and see that Red wine quality prediction on the test and training datasets is almost the same (~88%) with just three features. Likewise White wine quality prediction appears to depend on just one feature. This may be due to the privacy and logistics issues mentioned by the dataset authors.

    Content

    Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. Both these datasets are analyzed and linear regression models are developed in Python 3. The github link provided for the source code also includes a Flask web application for deployment on the local machine or on Heroku.
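
    A minimal sketch of the kind of model described above (a linear regression on the red-wine physicochemical features), assuming the standard UCI file winequality-red.csv, which uses ';' as the separator:

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    red = pd.read_csv("winequality-red.csv", sep=";")          # UCI file uses ';' as separator
    X, y = red.drop(columns=["quality"]), red["quality"]

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    model = LinearRegression().fit(X_train, y_train)
    print("R^2 on held-out data:", round(model.score(X_test, y_test), 3))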

    Acknowledgements

    Datasets: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

    Banner Image: Photo by Roberta Sorge on Unsplash

    Github Link

    Complete code has been uploaded onto github at https://github.com/saigeethachandrashekar/wine_quality.

    Please clone the repo - this contains both the datasets, the code required for building and saving the model on to your local system. Code for a Flask app is provided for deploying the models on your local machine. The app can also be deployed on Heroku - the requirements.txt and Procfile are also provided for this.

    Next Steps

    1. White wine quality prediction appears to depend on just one feature. This may be due to the privacy and logistics issues mentioned by the dataset authors (e.g. there is no data about grape types, wine brand, wine selling price, etc.) or it may be due to other factors that are not clear. This is an area that might be worth exploring further.

    2. Other ML techniques may be applied to improve the accuracy.

  20. zomato order data

    • kaggle.com
    Updated Jul 14, 2025
    Cite
    NayakGanesh007 (2025). zomato order data [Dataset]. https://www.kaggle.com/datasets/nayakganesh007/zomato-order-data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 14, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    NayakGanesh007
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Zomato Food Orders – Data Analysis Project 📌 Description: This dataset contains food order data from Zomato, one of India’s leading food delivery platforms. It includes information on customer orders, order status, restaurants, delivery times, and more. The goal of this project is to explore and analyze key insights around customer behavior, delivery patterns, restaurant performance, and order trends.

    🔍 Project Objectives: 📊 Perform Exploratory Data Analysis (EDA)

    📦 Analyze most frequently ordered cuisines and items

    ⏱️ Understand average delivery times and delays

    🧾 Identify top restaurants and order volumes

    📈 Uncover order trends by time (hour/day/week)

    💬 Visualize data using Matplotlib & Seaborn

    🧹 Clean and preprocess data (missing values, outliers, etc.)

    📁 Dataset Features (Example Columns):

    Order ID - Unique ID for each order
    Customer ID - Unique customer identifier
    Restaurant - Name of the restaurant
    Cuisine - Type of cuisine ordered
    Order Time - Timestamp when the order was placed
    Delivery Time - Timestamp when the order was delivered
    Order Status - Status of the order (Delivered, Cancelled)
    Payment Method - Mode of payment (Cash, Card, UPI, etc.)
    Order Amount - Total price of the order
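
    A minimal sketch of the delivery-time analysis mentioned above, assuming the example columns listed and a hypothetical file name:

    import pandas as pd

    orders = pd.read_csv("zomato_orders.csv")                  # hypothetical file name
    orders["Order Time"] = pd.to_datetime(orders["Order Time"])
    orders["Delivery Time"] = pd.to_datetime(orders["Delivery Time"])
    orders["delivery_minutes"] = (orders["Delivery Time"] - orders["Order Time"]).dt.total_seconds() / 60

    print("Average delivery time (min):", round(orders["delivery_minutes"].mean(), 1))
    print(orders.groupby("Restaurant")["delivery_minutes"].mean().sort_values().head())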

    🛠 Tools & Libraries Used: Python

    Pandas, NumPy for data manipulation

    Matplotlib, Seaborn for visualization

    Excel (for raw dataset preview and checks)

    ✅ Outcomes: Customer ordering trends by cuisine and location

    Time-of-day and day-of-week analysis for peak delivery times

    Delivery efficiency evaluation

    Business recommendations for improving customer experience
