37 datasets found

Data from: Methodology to filter out outliers in high spatial density data...
scielo.figshare.com
jpeg
Updated Jun 4, 2023
Cite
Leonardo Felipe Maldaner; José Paulo Molin; Mark Spekken (2023). Methodology to filter out outliers in high spatial density data to improve maps reliability [Dataset]. http://doi.org/10.6084/m9.figshare.14305658.v1
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.14305658.v1
Dataset updated
Jun 4, 2023
Dataset provided by
SciELO journals
Authors
Leonardo Felipe Maldaner; José Paulo Molin; Mark Spekken
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
ABSTRACT The considerable volume of data generated by sensors in the field presents systematic errors; thus, it is extremely important to exclude these errors to ensure mapping quality. The objective of this research was to develop and test a methodology to identify and exclude outliers in high-density spatial data sets, determine whether the developed filter process could help decrease the nugget effect and improve the spatial variability characterization of high sampling data. We created a filter composed of a global, anisotropic, and an anisotropic local analysis of data, which considered the respective neighborhood values. For that purpose, we used the median to classify a given spatial point into the data set as the main statistical parameter and took into account its neighbors within a radius. The filter was tested using raw data sets of corn yield, soil electrical conductivity (ECa), and the sensor vegetation index (SVI) in sugarcane. The results showed an improvement in accuracy of spatial variability within the data sets. The methodology reduced RMSE by 85 %, 97 %, and 79 % in corn yield, soil ECa, and SVI respectively, compared to interpolation errors of raw data sets. The filter excluded the local outliers, which considerably reduced the nugget effects, reducing estimation error of the interpolated data. The methodology proposed in this work had a better performance in removing outlier data when compared to two other methodologies from the literature.
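The local step of the abstract (classify each point against the median of its neighbors within a radius) can be illustrated with a minimal sketch. This is not the authors' filter: it uses a plain circular (isotropic) neighborhood, omits the global and anisotropic stages, and the function name, the `max_rel_dev` threshold, and the sample points are all hypothetical.

```python
import math
from statistics import median

def local_median_filter(points, radius, max_rel_dev):
    """Split (x, y, value) points into kept and removed lists.

    A point is removed when its value deviates from the median of its
    neighbors (all other points within `radius`) by more than
    `max_rel_dev` times that median. Threshold rule is an assumption.
    """
    kept, removed = [], []
    for i, (x, y, v) in enumerate(points):
        neighbors = [
            pv
            for j, (px, py, pv) in enumerate(points)
            if j != i and math.hypot(px - x, py - y) <= radius
        ]
        if not neighbors:
            # No neighbors to compare against: keep the point.
            kept.append((x, y, v))
            continue
        local_median = median(neighbors)
        if abs(v - local_median) > max_rel_dev * abs(local_median):
            removed.append((x, y, v))
        else:
            kept.append((x, y, v))
    return kept, removed

# Invented sample: five readings near 10 and one spike of 50.
points = [
    (0, 0, 10.0), (1, 0, 10.5), (0, 1, 9.8),
    (1, 1, 50.0), (2, 1, 10.2), (1, 2, 9.9),
]
kept, removed = local_median_filter(points, radius=1.5, max_rel_dev=0.25)
print(removed)
```

Using the median rather than the mean keeps the spike from distorting the reference value for its own neighbors, which is why the five ordinary readings survive even though each has the spike inside its radius.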
table with complete output from outlier detection analysis
datasetcatalog.nlm.nih.gov
figshare.com
Updated Mar 25, 2024
Cite
Dennison, Philip; Limousin, Jean-Marc; Delgado-Dávila, Ruth; Danson, Mark; Mouillot, Florent; Tavşanoğlu, Çağatay; Féret, Jean-Baptiste; Roberts, Dar; Gabriel, Eva; Kütküt, Pınar; Castro, Francesc Xavier; Cardenas, Nicolas Younes; Scortechini, Gianluca; Granda, Elena; Chuvieco, Emilio; Bar-Massada, Avi; Nolan, Rachael; Kristina, Agnes; Msweli, Samukelisiwe; Kotzur, Ivan; Taylor, Jackson; Di Bella, Carlos; Aktepe, Nursema; Beget, María Eugenia; He, Binbin; Chen, Rui; Brown, Tegan; Forsyth, Greg; Almoustafa, Turkia; Kraaij, Tineke; Moreira, Bruno; Ventura, Andrea; Griebel, Anne; de Dios, Víctor Resco; Adeline, Karine; Domenech, Oriol; Gharbi, Fatma; Morais, Marco; Martin, Maria; Boer, Matthias; Jolly, Matt; Qi, Yi; Gagkas, Zisis; Değirmenci, Cihan Ünal; Bradstock, Ross; Pellizzaro, Grazia; Monteiro, Antonio T.; Taylor, Andy; Quan, Xingwen; Tüfekcioğlu, İrem; Yebra, Marta (2024). table with complete output from outlier detection analysis [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001390504
Dataset updated
Mar 25, 2024
Authors
Dennison, Philip; Limousin, Jean-Marc; Delgado-Dávila, Ruth; Danson, Mark; Mouillot, Florent; Tavşanoğlu, Çağatay; Féret, Jean-Baptiste; Roberts, Dar; Gabriel, Eva; Kütküt, Pınar; Castro, Francesc Xavier; Cardenas, Nicolas Younes; Scortechini, Gianluca; Granda, Elena; Chuvieco, Emilio; Bar-Massada, Avi; Nolan, Rachael; Kristina, Agnes; Msweli, Samukelisiwe; Kotzur, Ivan; Taylor, Jackson; Di Bella, Carlos; Aktepe, Nursema; Beget, María Eugenia; He, Binbin; Chen, Rui; Brown, Tegan; Forsyth, Greg; Almoustafa, Turkia; Kraaij, Tineke; Moreira, Bruno; Ventura, Andrea; Griebel, Anne; de Dios, Víctor Resco; Adeline, Karine; Domenech, Oriol; Gharbi, Fatma; Morais, Marco; Martin, Maria; Boer, Matthias; Jolly, Matt; Qi, Yi; Gagkas, Zisis; Değirmenci, Cihan Ünal; Bradstock, Ross; Pellizzaro, Grazia; Monteiro, Antonio T.; Taylor, Andy; Quan, Xingwen; Tüfekcioğlu, İrem; Yebra, Marta
Description
table with complete output from outlier detection analysis
Comparison experiments by using IF.
figshare.com
xls
Updated Jun 2, 2023
Cite
Gen Li; Jason J. Jung (2023). Comparison experiments by using IF. [Dataset]. http://doi.org/10.1371/journal.pone.0247119.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0247119.t001
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS ONE
Authors
Gen Li; Jason J. Jung
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison experiments by using IF.
Data from: Matching Map Recovery with an Unknown Number of Outliers
service.tib.eu
Updated Dec 16, 2024
Cite
(2024). Matching Map Recovery with an Unknown Number of Outliers [Dataset]. https://service.tib.eu/ldmservice/dataset/matching-map-recovery-with-an-unknown-number-of-outliers
Dataset updated
Dec 16, 2024
Description
The dataset used in the paper is a set of feature-vectors from two sets of d-dimensional noisy feature-vectors.

ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of...

zenodo.org
explore.openaire.eu
+2more

application/gzip

Updated May 2, 2024

Cite

Erich Schubert; Arthur Zimek (2024). ELKI Multi-View Clustering Data Sets Based on the Amsterdam Library of Object Images (ALOI) [Dataset]. http://doi.org/10.5281/zenodo.6355684

Available download formats: application/gzip

Unique identifier

https://doi.org/10.5281/zenodo.6355684

Dataset updated

May 2, 2024

Dataset provided by

Zenodo http://zenodo.org/

Authors

Erich Schubert; Arthur Zimek

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Time period covered

2022

Description

These data sets were originally created for the following publications:

M. E. Houle, H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek
Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?
In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, 2010.

H.-P. Kriegel, E. Schubert, A. Zimek
Evaluation of Multiple Clustering Solutions
In 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings Held in Conjunction with ECML PKDD 2011, Athens, Greece, 2011.

The outlier data set versions were introduced in:

E. Schubert, R. Wojdanowski, A. Zimek, H.-P. Kriegel
On Evaluation of Outlier Rankings and Outlier Scores
In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, 2012.

They are derived from the original image data available at https://aloi.science.uva.nl/

The image acquisition process is documented in the original ALOI work: J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders, The Amsterdam library of object images, Int. J. Comput. Vision, 61(1), 103-112, January, 2005

Additional information is available at: https://elki-project.github.io/datasets/multi_view

The following views are currently available:

Feature type	Description	Files
Object number	Sparse 1000 dimensional vectors that give the true object assignment	objs.arff.gz
RGB color histograms	Standard RGB color histograms (uniform binning)	aloi-8d.csv.gz aloi-27d.csv.gz aloi-64d.csv.gz aloi-125d.csv.gz aloi-216d.csv.gz aloi-343d.csv.gz aloi-512d.csv.gz aloi-729d.csv.gz aloi-1000d.csv.gz
HSV color histograms	Standard HSV/HSB color histograms in various binnings	aloi-hsb-2x2x2.csv.gz aloi-hsb-3x3x3.csv.gz aloi-hsb-4x4x4.csv.gz aloi-hsb-5x5x5.csv.gz aloi-hsb-6x6x6.csv.gz aloi-hsb-7x7x7.csv.gz aloi-hsb-7x2x2.csv.gz aloi-hsb-7x3x3.csv.gz aloi-hsb-14x3x3.csv.gz aloi-hsb-8x4x4.csv.gz aloi-hsb-9x5x5.csv.gz aloi-hsb-13x4x4.csv.gz aloi-hsb-14x5x5.csv.gz aloi-hsb-10x6x6.csv.gz aloi-hsb-14x6x6.csv.gz
Color similarity	Average similarity to 77 reference colors (not histograms): 18 colors x 2 sat x 2 bri + 5 grey values (incl. white, black)	aloi-colorsim77.arff.gz (feature subsets are meaningful here, as these features are computed independently of each other)
Haralick features	First 13 Haralick features (radius 1 pixel)	aloi-haralick-1.csv.gz
Front to back	Vectors representing front face vs. back faces of individual objects	front.arff.gz
Basic light	Vectors indicating basic light situations	light.arff.gz
Manual annotations	Manually annotated object groups of semantically related objects such as cups	manual1.arff.gz

Outlier Detection Versions

Additionally, we generated a number of subsets for outlier detection:

Feature type	Description	Files
RGB Histograms	Downsampled to 100000 objects (553 outliers)	aloi-27d-100000-max10-tot553.csv.gz aloi-64d-100000-max10-tot553.csv.gz
	Downsampled to 75000 objects (717 outliers)	aloi-27d-75000-max4-tot717.csv.gz aloi-64d-75000-max4-tot717.csv.gz
	Downsampled to 50000 objects (1508 outliers)	aloi-27d-50000-max5-tot1508.csv.gz aloi-64d-50000-max5-tot1508.csv.gz
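
The file names in the table above encode the dimensionality, the number of objects, and the total number of outliers, so the outlier rate of each subset can be worked out directly. The sketch below only parses the names as listed; the regular expression and the `describe` helper are hypothetical, and the meaning of the `max` field is not stated in this listing, so it is carried through uninterpreted.

```python
import re

# Pattern inferred from the file names in the listing, e.g.
# aloi-27d-100000-max10-tot553.csv.gz
NAME_PATTERN = re.compile(r"aloi-(\d+)d-(\d+)-max(\d+)-tot(\d+)\.csv\.gz")

def describe(filename):
    """Extract the numeric fields from a file name and derive the outlier rate."""
    match = NAME_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    dims, objects, max_field, outliers = map(int, match.groups())
    return {
        "dims": dims,
        "objects": objects,
        "max": max_field,  # meaning not given in the listing
        "outliers": outliers,
        "outlier_rate": outliers / objects,
    }

for name in (
    "aloi-27d-100000-max10-tot553.csv.gz",
    "aloi-27d-75000-max4-tot717.csv.gz",
    "aloi-27d-50000-max5-tot1508.csv.gz",
):
    info = describe(name)
    print(name, f"{info['outlier_rate']:.2%}")
```

The three subsets therefore range from roughly 0.6% to 3% outliers, which matters when comparing detection scores across them.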

Data table presenting the outlier analysis of species comparisons: S. bovis...
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Jan 31, 2022
Cite
Landeryou, Toby; Emery, Aidan M.; Tchuem-Tchuenté, Louis-Albert; Rollinson, David; Maddren, Rosie; Webster, Bonnie L.; Allan, Fiona; Anderson, Roy M.; Rabone, Muriel (2022). Data table presenting the outlier analysis of species comparisons: S. bovis x S. haematobium, S. haematobium x S. guineensis, and S. guineensis x S. bovis. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000444128
Dataset updated
Jan 31, 2022
Authors
Landeryou, Toby; Emery, Aidan M.; Tchuem-Tchuenté, Louis-Albert; Rollinson, David; Maddren, Rosie; Webster, Bonnie L.; Allan, Fiona; Anderson, Roy M.; Rabone, Muriel
Description
Table presenting the locus number, genomic sequence, translated sequence, and biological and molecular function of each locus highlighted in the BAYESCAN outlier analysis. (XLSX)
GOPI Resource - Stacked Column Chart - Change in Jobs in Maryland by Month...
data.wu.ac.at
csv, json, xml
Updated Apr 27, 2017
+ more versions
Cite
U.S. Bureau of Labor Statistics (2017). GOPI Resource - Stacked Column Chart - Change in Jobs in Maryland by Month (with Feb and March 2010 outliers filtered out) [Dataset]. https://data.wu.ac.at/schema/data_maryland_gov/NWk4aS1ieDU2
Available download formats: xml, csv, json
Dataset updated
Apr 27, 2017
Dataset provided by
Bureau of Labor Statistics http://www.bls.gov/
Area covered
Maryland
Description
This dataset represents the CHANGE in the number of jobs per industry category and sub-category from the previous month, not the raw counts of actual jobs. The data behind these monthly change values is from the Bureau of Labor Statistics (BLS) Current Employment Statistics (CES) program. CES data represents businesses and government agencies, providing detailed industry data on employment on nonfarm payrolls.
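To make the distinction between raw counts and month-over-month change concrete, here is a small sketch. The job counts are invented for illustration and are not BLS figures; the function names are hypothetical, and the month filter only mirrors the "Feb and March 2010 outliers filtered out" note in the title.

```python
def monthly_change(counts):
    """Turn chronological (month, jobs) pairs into (month, change) pairs."""
    return [
        (month, jobs - prev_jobs)
        for (_, prev_jobs), (month, jobs) in zip(counts, counts[1:])
    ]

def drop_months(changes, excluded):
    """Remove the months treated as outliers."""
    return [(month, delta) for month, delta in changes if month not in excluded]

# Invented counts, for illustration only.
counts = [
    ("2010-01", 1000),
    ("2010-02", 950),
    ("2010-03", 1100),
    ("2010-04", 1110),
]
changes = monthly_change(counts)
filtered = drop_months(changes, excluded={"2010-02", "2010-03"})
print(changes)
print(filtered)
```

The first month has no predecessor, so a series of n counts yields n - 1 change values.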
Two Variable Artificial Dataset.
plos.figshare.com
xls
Updated May 31, 2023
Cite
Hong Choon Ong; Ekele Alih (2023). Two Variable Artificial Dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0125835.t007
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0125835.t007
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Hong Choon Ong; Ekele Alih
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Two Variable Artificial Dataset.
Addition-point OLS matrix, B.
plos.figshare.com
xls
Updated Jun 5, 2023
Cite
Hong Choon Ong; Ekele Alih (2023). Addition-point OLS matrix, B. [Dataset]. http://doi.org/10.1371/journal.pone.0125835.t009
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0125835.t009
Dataset updated
Jun 5, 2023
Dataset provided by
PLOS ONE
Authors
Hong Choon Ong; Ekele Alih
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Addition-point OLS matrix, B.
Supplementary Table 1 from Outlier Kinase Expression by RNA Sequencing as...
aacr.figshare.com
datasetcatalog.nlm.nih.gov
xlsx
Updated Jun 16, 2023
Cite
Vishal Kothari; Iris Wei; Sunita Shankar; Shanker Kalyana-Sundaram; Lidong Wang; Linda W. Ma; Pankaj Vats; Catherine S. Grasso; Dan R. Robinson; Yi-Mi Wu; Xuhong Cao; Diane M. Simeone; Arul M. Chinnaiyan; Chandan Kumar-Sinha (2023). Supplementary Table 1 from Outlier Kinase Expression by RNA Sequencing as Targets for Precision Therapy [Dataset]. http://doi.org/10.1158/2159-8290.22529142.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.1158/2159-8290.22529142.v1
Dataset updated
Jun 16, 2023
Dataset provided by
American Association for Cancer Research http://www.aacr.org/
Authors
Vishal Kothari; Iris Wei; Sunita Shankar; Shanker Kalyana-Sundaram; Lidong Wang; Linda W. Ma; Pankaj Vats; Catherine S. Grasso; Dan R. Robinson; Yi-Mi Wu; Xuhong Cao; Diane M. Simeone; Arul M. Chinnaiyan; Chandan Kumar-Sinha
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
XLSX file - 2238K, Complete kinome compendium of all tissue types by RNA-Seq.
Supplementary Table 2 from Outlier Kinase Expression by RNA Sequencing as...
aacr.figshare.com
datasetcatalog.nlm.nih.gov
xlsx
Updated Jun 20, 2023
Cite
Vishal Kothari; Iris Wei; Sunita Shankar; Shanker Kalyana-Sundaram; Lidong Wang; Linda W. Ma; Pankaj Vats; Catherine S. Grasso; Dan R. Robinson; Yi-Mi Wu; Xuhong Cao; Diane M. Simeone; Arul M. Chinnaiyan; Chandan Kumar-Sinha (2023). Supplementary Table 2 from Outlier Kinase Expression by RNA Sequencing as Targets for Precision Therapy [Dataset]. http://doi.org/10.1158/2159-8290.22529139.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.1158/2159-8290.22529139.v1
Dataset updated
Jun 20, 2023
Dataset provided by
American Association for Cancer Research http://www.aacr.org/
Authors
Vishal Kothari; Iris Wei; Sunita Shankar; Shanker Kalyana-Sundaram; Lidong Wang; Linda W. Ma; Pankaj Vats; Catherine S. Grasso; Dan R. Robinson; Yi-Mi Wu; Xuhong Cao; Diane M. Simeone; Arul M. Chinnaiyan; Chandan Kumar-Sinha
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
XLSX file - 776K, Breast sample kinome compendium by RNA-Seq.
Anolis carolinensis character displacement SNP
data.niaid.nih.gov
datadryad.org
zip
Updated Jan 27, 2023
Cite
Douglas Crawford (2023). Anolis carolinensis character displacement SNP [Dataset]. http://doi.org/10.5061/dryad.qbzkh18ks
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.qbzkh18ks
Dataset updated
Jan 27, 2023
Dataset provided by
University of Miami
Authors
Douglas Crawford
License
https://spdx.org/licenses/CC0-1.0.html
Description
Here are six files that provide details for all 44,120 identified single nucleotide polymorphisms (SNPs) or the 215 outlier SNPs associated with the evolution of rapid character displacement among replicate islands with (2Spp) and without (1Spp) competition between two Anolis species. On 2Spp islands, A. carolinensis occurs higher in trees and has evolved larger toe pads. Among 1Spp and 2Spp island populations, we identify 44,120 SNPs, including 215 outlier SNPs with improbably large FST values, low nucleotide variation, and greater linkage than expected; these SNPs are enriched for animal walking behavior. Thus, we conclude that these 215 outliers are evolving by natural selection in response to the phenotypic convergent evolution of character displacement.

There are two non-mutually exclusive perspectives on these nucleotide variants. First, character displacement is convergent: all 215 outlier SNPs are shared among three of the five 2Spp islands, and 24% of outlier SNPs are shared among all five 2Spp islands. Second, character displacement is genetically redundant because the allele frequencies in one or more 2Spp islands are similar to those of 1Spp islands: in one or more 2Spp islands, 33% of outlier SNPs are within the range of 1Spp MiAF, and 76% of outliers are more similar to 1Spp islands than to the mean MiAF of 2Spp islands. Focusing on convergent SNPs is scientifically more robust, yet it distracts from the perspective of multiple genetic solutions that enhance the rate and stability of adaptive change.

The six files include a description of eight islands, details of 94 individuals, and four files on SNPs. The four SNP files include the VCF files for 94 individuals with 44K SNPs and two files (Excel sheet/tab-delimited file) with FST, p-values, and outlier status for all 44,120 identified SNPs associated with the evolution of rapid character displacement. The sixth file is a detailed file on the 215 outlier SNPs.
Complete sequence data is available at Bioproject PRJNA833453, which includes samples not included in this study. The 94 individuals used in this study are described in “Supplemental_Sample_description.txt”.

Methods

Anoles and genomic DNA: Tissue or DNA for 160 Anolis carolinensis and 20 A. sagrei samples were provided by the Museum of Comparative Zoology at Harvard University (Table S2). Samples were previously used to examine evolution of character displacement in native A. carolinensis following invasion by A. sagrei onto man-made spoil islands in Mosquito Lagoon, Florida (Stuart et al. 2014). One hundred samples were genomic DNAs, and 80 samples were tissues (terminal tail clip, Table S2). Genomic DNA was isolated from 80 of 160 A. carolinensis individuals (MCZ, Table S2) using a custom SPRI magnetic bead protocol (Psifidi et al. 2015). Briefly, after removing ethanol, tissues were placed in 200 ul of GH buffer (25 mM Tris-HCl pH 7.5, 25 mM EDTA, 2 M GuHCl (guanidine hydrochloride, G3272 SIGMA), 5 mM CaCl2, 0.5% v/v Triton X-100, 1% N-Lauroyl-Sarcosine) with 5% per volume of 20 mg/ml proteinase K (10 ul/200 ul GH) and digested at 55 °C for at least 2 hours. After proteinase K digestion, 100 ul of 0.1% carboxyl-modified Sera-Mag Magnetic beads (Fisher Scientific) resuspended in 2.5 M NaCl, 20% PEG were added and allowed to bind the DNA. Beads were subsequently magnetized and washed twice with 200 ul 70% EtOH, and then DNA was eluted in 100 ul 0.1x TE (10 mM Tris, 0.1 mM EDTA). All DNA samples were gel electrophoresed to ensure high molecular mass and quantified by spectrophotometry and fluorescence using Biotium AccuBlue™ High Sensitivity dsDNA Quantitative Solution according to the manufacturer’s instructions. Genotyping-by-sequencing (GBS) libraries were prepared using a modified protocol after Elshire et al. (2011). Briefly, high-molecular-weight genomic DNA was aliquoted and digested using ApeKI restriction enzyme.
Digests from each individual sample were uniquely barcoded, pooled, and size selected to yield insert sizes between 300-700 bp (Borgstrom et al. 2011). Pooled libraries were PCR amplified (15 cycles) using custom primers that extend into the genomic DNA insert by 3 bases (CTG). Adding 3 extra base pairs systematically reduces the number of sequenced GBS tags, ensuring sufficient sequencing depth. The final library had a mean size of 424 bp, ranging from 188 to 700 bp.

Anolis SNPs: Pooled libraries were sequenced on one lane on the Illumina HiSeq 4000 in 2x150 bp paired-end configuration, yielding approximately 459 million paired-end reads (~138 Gb). The medium Q-Score was 42 with the lower 10% Q-Scores exceeding 32 for all 150 bp. The initial library contained 180 individuals with 8,561,493 polymorphic sites. Twenty individuals were Anolis sagrei, and two individuals (Yan 1610 & Yin 1411) clustered with A. sagrei and were not used to define A. carolinensis SNPs. Anolis carolinensis reads were aligned to the Anolis carolinensis genome (NCBI RefSeq accession number: GCF_000090745.1_AnoCar2.0). Single nucleotide polymorphisms (SNPs) for A. carolinensis were called using the GBeaSy analysis pipeline (Wickland et al. 2017) with the following filter settings: minimum read length of 100 bp after barcode and adapter trimming, minimum phred-scaled variant quality of 30, and minimum read depth of 5. SNPs were further filtered by requiring SNPs to occur in > 50% of individuals, and 66 individuals were removed because they had less than 70% of called SNPs. These filtering steps resulted in 51,155 SNPs among 94 individuals. Final filtering among 94 individuals required all sites to be polymorphic (with fewer individuals, some sites were no longer polymorphic) with a maximum of 2 alleles (all are bi-allelic), a minimum allele frequency of 0.05, and He that does not exceed HWE (FDR <0.01). SNPs with large He were removed (2,280 SNPs).
These SNPs with large significant heterozygosity may result from aligning paralogues (different loci), and thus may not represent polymorphisms. No SNPs were removed with low He (due to possible demography or other exceptions to HWE). After filtering, the 94 individuals yielded 44,120 SNPs. Thus, the final filtered SNP data set was 44K SNPs from 94 individuals.

Statistical Analyses: Eight A. carolinensis populations were analyzed: three populations from islands with native species only (1Spp islands) and five populations from islands where A. carolinensis co-exists with A. sagrei (2Spp islands, Table 1, Table S1). Most analyses pooled the three 1Spp islands and contrasted these with the pooled five 2Spp islands. Two approaches were used to define SNPs with unusually large allele frequency differences between 1Spp and 2Spp islands: 1) comparison of FST values to random permutations and 2) a modified FDIST approach to identify outlier SNPs with large and statistically unlikely FST values.

Random Permutations: FST values were calculated in VCFTools (version 4.2; Danecek et al. 2011), where the p-value per SNP was defined by comparing FST values to 1,000 random permutations using a custom script (below). Individuals and all their SNPs were randomly assigned to one of eight islands or to 1Spp versus 2Spp groups. The sample sizes (55 for 2Spp and 39 for 1Spp islands) were maintained. FST values were re-calculated for each of the 1,000 randomizations using VCFTools.

Modified FDIST: To identify outlier SNPs with statistically large FST values, a modified FDIST (Beaumont and Nichols 1996) was implemented in Arlequin (Excoffier et al. 2005). This modified approach applies 50,000 coalescent simulations using hierarchical population structure, in which demes are arranged into k groups of d demes and in which migration rates between demes are different within and between groups.
Unlike the finite island models, which have led to large frequencies of false positives because populations share different histories (Lotterhos and Whitlock 2014), the hierarchical island model avoids these false positives by avoiding the assumption of similar ancestry (Excoffier et al. 2009).

References

Beaumont, M. A. and R. A. Nichols. 1996. Evaluating loci for use in the genetic analysis of population structure. P Roy Soc B-Biol Sci 263:1619-1626.
Borgstrom, E., S. Lundin, and J. Lundeberg. 2011. Large scale library generation for high throughput sequencing. PLoS One 6:e19119.
Bradbury, P. J., Z. Zhang, D. E. Kroon, T. M. Casstevens, Y. Ramdoss, and E. S. Buckler. 2007. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633-2635.
Cingolani, P., A. Platts, L. Wang le, M. Coon, T. Nguyen, L. Wang, S. J. Land, X. Lu, and D. M. Ruden. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6:80-92.
Danecek, P., A. Auton, G. Abecasis, C. A. Albers, E. Banks, M. A. DePristo, R. E. Handsaker, G. Lunter, G. T. Marth, S. T. Sherry, G. McVean, R. Durbin, and 1000 Genomes Project Analysis Group. 2011. The variant call format and VCFtools. Bioinformatics 27:2156-2158.
Earl, D. A. and B. M. vonHoldt. 2011. Structure Harvester: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genet Resour 4:359-361.
Elshire, R. J., J. C. Glaubitz, Q. Sun, J. A. Poland, K. Kawamoto, E. S. Buckler, and S. E. Mitchell. 2011. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6:e19379.
Evanno, G., S. Regnaut, and J. Goudet. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol 14:2611-2620.
Excoffier, L., T. Hofer, and M. Foll. 2009. Detecting loci under selection in a hierarchically structured population. Heredity 103:285-298.
Excoffier, L., G. Laval, and S. Schneider. 2005. Arlequin (version 3.0): An integrated software package for population genetics data analysis.
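
The random-permutation step in the Statistical Analyses can be sketched for a single SNP as follows. This is an illustration under stated assumptions, not the authors' custom script: it uses a simple heterozygosity-based FST, (H_T - H_S) / H_T, which is not necessarily the estimator VCFtools reports; it shuffles one SNP at a time rather than whole individuals with all their SNPs; and the genotype vectors and function names are invented. Only the group sizes (55 and 39) and the 1,000 permutations come from the description.

```python
import random

def allele_freq(genotypes):
    """Alternate-allele frequency from diploid genotypes coded 0, 1, or 2."""
    return sum(genotypes) / (2 * len(genotypes))

def simple_fst(group_a, group_b):
    """Two-group FST = (H_T - H_S) / H_T for one biallelic SNP."""
    n_a, n_b = len(group_a), len(group_b)
    p_a, p_b = allele_freq(group_a), allele_freq(group_b)
    p_total = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    h_total = 2 * p_total * (1 - p_total)
    if h_total == 0:
        return 0.0  # monomorphic site: no differentiation to measure
    h_within = (n_a * 2 * p_a * (1 - p_a) + n_b * 2 * p_b * (1 - p_b)) / (n_a + n_b)
    return (h_total - h_within) / h_total

def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """Share of label shuffles whose FST is at least the observed FST."""
    rng = random.Random(seed)
    observed = simple_fst(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)  # group sizes are kept fixed in every shuffle
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if simple_fst(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Invented genotypes: 55 individuals (2Spp) and 39 individuals (1Spp).
two_spp = [2] * 45 + [1] * 10
one_spp = [0] * 30 + [1] * 9
observed_fst, p_value = permutation_p_value(two_spp, one_spp)
print(round(observed_fst, 3), p_value)
```

With 1,000 permutations the smallest attainable p-value under this correction is 1/1001, so finer resolution requires more shuffles.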
Supplementary Table 5 from Outlier Kinase Expression by RNA Sequencing as...
datasetcatalog.nlm.nih.gov
aacr.figshare.com
Updated Apr 3, 2023
Cite
Wu, Yi-Mi; Wei, Iris; W. Ma, Linda; Vats, Pankaj; Shankar, Sunita; Kumar-Sinha, Chandan; Robinson, Dan R.; Kalyana-Sundaram, Shanker; Kothari, Vishal; Cao, Xuhong; Grasso, Catherine S.; Simeone, Diane M.; Chinnaiyan, Arul M.; Wang, Lidong (2023). Supplementary Table 5 from Outlier Kinase Expression by RNA Sequencing as Targets for Precision Therapy [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000988588
Dataset updated
Apr 3, 2023
Authors
Wu, Yi-Mi; Wei, Iris; W. Ma, Linda; Vats, Pankaj; Shankar, Sunita; Kumar-Sinha, Chandan; Robinson, Dan R.; Kalyana-Sundaram, Shanker; Kothari, Vishal; Cao, Xuhong; Grasso, Catherine S.; Simeone, Diane M.; Chinnaiyan, Arul M.; Wang, Lidong
Description
XLSX file - 13K, shRNA, siRNA, and qPCR primer sequences used.
API security: Access behavior anomaly dataset
kaggle.com
Updated Nov 22, 2021
Cite
Ravi Guntur (2021). API security: Access behavior anomaly dataset [Dataset]. https://www.kaggle.com/datasets/tangodelta/api-access-behaviour-anomaly-dataset/data
Dataset updated
Nov 22, 2021
Dataset provided by
Kaggle http://kaggle.com/
Authors
Ravi Guntur
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Description
Context

Distributed micro-services based applications are typically accessed via APIs. These APIs are used either by apps or accessed directly by programmatic means. API access is often abused by attackers trying to exploit the business logic exposed by these APIs. Normal users access these APIs differently from attackers. Many applications have hundreds of APIs that are called in a specific order, and depending on factors such as browser refreshes, session refreshes, network errors, or programmatic access, these behaviors are not static and can vary for the same user. API calls in long-running sessions form access graphs that need to be analysed in order to discover attack patterns and anomalies. Graphs don't lend themselves to numerical computation. We address this issue and provide a dataset where user access behavior is quantified as numerical features. In addition, we provide a dataset of raw API call graphs. To support the use of these datasets, two notebooks on classification, node embeddings, and clustering are also provided.

About the dataset

There are 4 files provided. Two files are in CSV format and two files are in JSON format. The files in CSV format are user behavior graphs represented as behavior metrics. The JSON files are the actual API call graphs. The two datasets can be joined on a key so that those who want to combine graphs with metrics could do so in novel ways.

What is new in this dataset

This data set captures API access patterns in terms of behavior metrics. Behaviors are captured by tracking users' API call graphs which are then summarized in terms of metrics. In some sense a categorical sequence of entities has been reduced to numerical metrics.

CSV dataset

There are two files provided. One called supervised_dataset.csv has behaviors labeled as normal or outlier. The second file called remaining_behavior_ext.csv has a larger number of samples that are not labeled but has additional insights as well as a classification created by another algorithm.

What is each row

Each row is one instance of an observed behavior that has been manually classified as normal or outlier

JSON dataset

There are two files provided to correspond to the two CSV files

What is each item

Each item has an _id field that can be used to join against the CSV data sets. Then we have the API behavior graph represented as a list of edges.

Inspiration

To model the classification label with a skewed distribution of normal and abnormal cases and with very few labeled samples available. Use supervised_dataset.csv

To verify where the predicted class differs from the class determined by a second algorithm. Use remaining_behavior_ext.csv
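
The join between the behavior metrics (CSV) and the call graphs (JSON) described above can be sketched as follows. Only the `_id` field is named in the description; the assumptions that the CSV key column is also called `_id`, that the JSON file is a list of objects, and that the edge list is stored under `edges` are hypothetical, as are the sample rows.

```python
import csv
import io
import json

def join_on_id(csv_text, json_text, key="_id", edges_field="edges"):
    """Attach each graph's edge list to the CSV row with the same key."""
    graphs = {item[key]: item for item in json.loads(json_text)}
    joined = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        graph = graphs.get(row[key])
        if graph is not None:  # inner join: skip rows without a graph
            joined.append({**row, "edges": graph[edges_field]})
    return joined

# Invented sample content standing in for the real files.
sample_csv = "_id,label\na1,normal\nb2,outlier\n"
sample_json = json.dumps([
    {"_id": "a1", "edges": [["login", "search"]]},
    {"_id": "c3", "edges": []},
])
joined = join_on_id(sample_csv, sample_json)
print(joined)
```

Building the dictionary of graphs first keeps the join a single pass over the CSV rows; check the actual column and field names in the files before relying on these defaults.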
Supplementary Table 4 from Outlier Kinase Expression by RNA Sequencing as...
aacr.figshare.com
datasetcatalog.nlm.nih.gov
xlsx
Updated Jun 18, 2023
Cite
Vishal Kothari; Iris Wei; Sunita Shankar; Shanker Kalyana-Sundaram; Lidong Wang; Linda W. Ma; Pankaj Vats; Catherine S. Grasso; Dan R. Robinson; Yi-Mi Wu; Xuhong Cao; Diane M. Simeone; Arul M. Chinnaiyan; Chandan Kumar-Sinha (2023). Supplementary Table 4 from Outlier Kinase Expression by RNA Sequencing as Targets for Precision Therapy [Dataset]. http://doi.org/10.1158/2159-8290.22529133.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.1158/2159-8290.22529133.v1
Dataset updated
Jun 18, 2023
Dataset provided by
American Association for Cancer Research http://www.aacr.org/
Authors
Vishal Kothari; Iris Wei; Sunita Shankar; Shanker Kalyana-Sundaram; Lidong Wang; Linda W. Ma; Pankaj Vats; Catherine S. Grasso; Dan R. Robinson; Yi-Mi Wu; Xuhong Cao; Diane M. Simeone; Arul M. Chinnaiyan; Chandan Kumar-Sinha
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
XLSX file - 613K, RNA-Seq data for matched primary pancreatic adenocarcinoma xenograft tissue, DS-08-947, and derived xenograft cell line.
Supplementary Table 3 from Outlier Kinase Expression by RNA Sequencing as...
aacr.figshare.com
datasetcatalog.nlm.nih.gov
xlsx
Updated Jun 21, 2023
Cite
Vishal Kothari; Iris Wei; Sunita Shankar; Shanker Kalyana-Sundaram; Lidong Wang; Linda W. Ma; Pankaj Vats; Catherine S. Grasso; Dan R. Robinson; Yi-Mi Wu; Xuhong Cao; Diane M. Simeone; Arul M. Chinnaiyan; Chandan Kumar-Sinha (2023). Supplementary Table 3 from Outlier Kinase Expression by RNA Sequencing as Targets for Precision Therapy [Dataset]. http://doi.org/10.1158/2159-8290.22529136.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.1158/2159-8290.22529136.v1
Dataset updated
Jun 21, 2023
Dataset provided by
American Association for Cancer Research (http://www.aacr.org/)
Authors
Vishal Kothari; Iris Wei; Sunita Shankar; Shanker Kalyana-Sundaram; Lidong Wang; Linda W. Ma; Pankaj Vats; Catherine S. Grasso; Dan R. Robinson; Yi-Mi Wu; Xuhong Cao; Diane M. Simeone; Arul M. Chinnaiyan; Chandan Kumar-Sinha
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
XLSX file - 487K, Pancreatic sample kinome compendium by RNA-Seq.
The crcc T2 Revised statistics.
plos.figshare.com
xls
Updated May 31, 2023
Hong Choon Ong; Ekele Alih (2023). The crcc T2 Revised statistics. [Dataset]. http://doi.org/10.1371/journal.pone.0125835.t015
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0125835.t015
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Hong Choon Ong; Ekele Alih
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The crcc T2 Revised statistics.
Data from: Localizing FST outliers on a QTL map reveals evidence for large...
zenodo.org
search.dataone.org
txt
Updated May 28, 2022
Sara Via; Gina Conte; Casey Mason-Foley; Kelly Mills (2022). Data from: Localizing FST outliers on a QTL map reveals evidence for large genomic regions of reduced gene exchange during speciation-with-gene-flow [Dataset]. http://doi.org/10.5061/dryad.9cf75
Available download formats: txt
Unique identifier
https://doi.org/10.5061/dryad.9cf75
Dataset updated
May 28, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sara Via; Gina Conte; Casey Mason-Foley; Kelly Mills
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Populations that maintain phenotypic divergence in sympatry typically show a mosaic pattern of genomic divergence, requiring a corresponding mosaic of genomic isolation (reduced gene flow). However, mechanisms that could produce the genomic isolation required for divergence-with-gene-flow have barely been explored, apart from the traditional localized effects of selection and reduced recombination near centromeres or inversions. By localizing FST outliers from a genome scan of wild pea aphid host races on a Quantitative Trait Locus (QTL) map of key traits, we test the hypothesis that between-population recombination and gene exchange are reduced over large 'divergence hitchhiking' (DH) regions. As expected under divergence hitchhiking, our map confirms that QTL and divergent markers cluster together in multiple large genomic regions. Under divergence hitchhiking, the nonoutlier markers within these regions should show signs of reduced gene exchange relative to nonoutlier markers in genomic regions where ongoing gene flow is expected. We use this predicted difference among nonoutliers to perform a critical test of divergence hitchhiking. Results show that nonoutlier markers within clusters of FST outliers and QTL resolve the genetic population structure of the two host races nearly as well as the outliers themselves, while nonoutliers outside DH regions reveal no population structure, as expected if they experience more gene flow. These results provide clear evidence for divergence hitchhiking, a mechanism that may dramatically facilitate the process of speciation-with-gene-flow. They also show the power of integrating genome scans with genetic analyses of the phenotypic traits involved in local adaptation and population divergence.
Citation Network Graph
shibatadb.com
Updated Sep 23, 2011
Yubetsu (2011). Citation Network Graph [Dataset]. https://www.shibatadb.com/article/SKbAgbBG
Dataset updated
Sep 23, 2011
Dataset authored and provided by
Yubetsu
License
https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt
Description
Network of 42 papers and 71 citation links related to "Sparse online low-rank projection and outlier rejection (SOLO) for 3-D rigid-body motion registration".
The Pulp-fibre Dataset.
plos.figshare.com
xls
Updated Jun 3, 2023
Hong Choon Ong; Ekele Alih (2023). The Pulp-fibre Dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0125835.t012
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0125835.t012
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Hong Choon Ong; Ekele Alih
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Pulp-fibre Dataset.