7 datasets found
  1. Table_2_Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data.xlsx

    • frontiersin.figshare.com
    • figshare.com
    xlsx
    Updated Jun 2, 2023
    Cite
    Pierre R. Bushel; Stephen S. Ferguson; Sreenivasa C. Ramaiahgari; Richard S. Paules; Scott S. Auerbach (2023). Table_2_Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data.xlsx [Dataset]. http://doi.org/10.3389/fgene.2020.00594.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Pierre R. Bushel; Stephen S. Ferguson; Sreenivasa C. Ramaiahgari; Richard S. Paules; Scott S. Auerbach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool to understand transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how the methods perform on TempO-Seq data. We simulated count data into two groups (treated vs. untreated) at seven fold change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq and normalized the data using seven normalization methods. Upper Quartile (UQ) performed the best with regard to maintaining FC levels as detected by a limma contrast between treated vs. untreated groups. For all FC levels, specificity of the UQ normalization was greater than 0.84 and sensitivity greater than 0.90 except for the no change and +1.5 levels. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed the most with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite having an assumption of the majority of genes being unchanged, the DESeq2 scaling factors normalization method performed reasonably well, as did the simple normalization procedures counts per million (CPM) and total counts (TCs). These results suggest that for two-class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers to normalize TempO-Seq gene expression data for more reliable results.
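The abstract above names several of the compared methods; as a rough sketch (not the authors' code, and using a made-up count matrix), two of the simpler ones, CPM and Upper Quartile scaling, can be computed with NumPy like so:

```python
# Toy illustration of two normalization methods from the comparison:
# counts-per-million (CPM) and Upper Quartile (UQ) scaling.
# The count matrix and rescaling choices are illustrative assumptions.
import numpy as np

counts = np.array([[100.0, 200.0],    # genes x samples
                   [300.0, 600.0],
                   [600.0, 1200.0]])

# CPM: rescale each sample (column) so its counts sum to one million.
cpm = counts / counts.sum(axis=0) * 1e6

# UQ: divide each sample by its 75th-percentile count, then rescale by
# the mean upper quartile so values stay on a count-like scale.
uq75 = np.percentile(counts, 75, axis=0)
uq = counts / uq75 * uq75.mean()
```

Here the second sample is an exact double-depth copy of the first, so both methods map the two columns onto identical normalized profiles.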

  2. Landsat 8-9 Normalized Difference Vegetation Index (NDVI) Colorized

    • hub.arcgis.com
    Updated Aug 11, 2016
    Cite
    Esri (2016). Landsat 8-9 Normalized Difference Vegetation Index (NDVI) Colorized [Dataset]. https://hub.arcgis.com/datasets/f6bb66f1c11e467f9a9a859343e27cf8
    Explore at:
    Dataset updated
    Aug 11, 2016
    Dataset authored and provided by
    Esri (http://esri.com/)
    Description

    This layer includes Landsat 8 and 9 imagery rendered on the fly as NDVI Colorized for use in visualization and analysis. This layer is time enabled and includes a number of band combinations and indices rendered on demand. The imagery includes eight multispectral bands from the Operational Land Imager (OLI) and two bands from the Thermal Infrared Sensor (TIRS). It is updated daily with new imagery directly sourced from the USGS Landsat collection on AWS.

    Geographic Coverage
    Global land surface. Polar regions are available in polar-projected Imagery Layers: Landsat Arctic Views and Landsat Antarctic Views.

    Temporal Coverage
    This layer is updated daily with new imagery. Working in tandem, Landsat 8 and 9 revisit each point on Earth's land surface every 8 days. Most images collected from January 2015 to present are included. Approximately 5 images for each path/row from 2013 and 2014 are also included.

    Product Level
    The Landsat 8 and 9 imagery in this layer is comprised of Collection 2 Level-1 data. The imagery has Top of Atmosphere (TOA) correction applied. TOA is applied using the radiometric rescaling coefficients provided by the USGS. The TOA reflectance values (ranging 0 - 1 by default) are scaled using a range of 0 - 10,000.

    Image Selection/Filtering
    A number of fields are available for filtering, including Acquisition Date, Estimated Cloud Cover, and Product ID. To isolate and work with specific images, either use the 'Image Filter' to create custom layers or add a 'Query Filter' to restrict the default layer display to a specified image or group of images.

    Visual Rendering
    Default rendering is NDVI Colorized, calculated as (b5 - b4) / (b5 + b4) with a colormap applied. Raster Functions enable on-the-fly rendering of band combinations and calculated indices from the source imagery. The DRA version of each layer enables visualization of the full dynamic range of the images. Other pre-defined Raster Functions can be selected via the renderer drop-down, or custom functions can be created. Pre-defined functions: Natural Color with DRA, Agriculture with DRA, Geology with DRA, Color Infrared with DRA, Bathymetric with DRA, Short-wave Infrared with DRA, Normalized Difference Moisture Index Colorized, NDVI Raw, NDVI Colorized, NBR Raw. 15 meter Landsat Imagery Layers are also available: Panchromatic and Pansharpened. This layer is part of a larger collection of Landsat Imagery Layers that you can use to perform a variety of mapping analysis tasks.

    Multispectral Bands
    The table below lists all available multispectral OLI bands. NDVI Colorized consumes bands 4 and 5.

    Band | Description                            | Wavelength (µm) | Spatial Resolution (m)
    1    | Coastal aerosol                        | 0.43 - 0.45     | 30
    2    | Blue                                   | 0.45 - 0.51     | 30
    3    | Green                                  | 0.53 - 0.59     | 30
    4    | Red                                    | 0.64 - 0.67     | 30
    5    | Near Infrared (NIR)                    | 0.85 - 0.88     | 30
    6    | SWIR 1                                 | 1.57 - 1.65     | 30
    7    | SWIR 2                                 | 2.11 - 2.29     | 30
    8    | Cirrus (in OLI this is band 9)         | 1.36 - 1.38     | 30
    9    | QA Band (available with Collection 1)* | NA              | 30
    *More about the Quality Assessment Band

    TIRS Bands
    Band | Description | Wavelength (µm) | Spatial Resolution (m)
    10   | TIRS1       | 10.60 - 11.19   | 100* (30)
    11   | TIRS2       | 11.50 - 12.51   | 100* (30)
    *TIRS bands are acquired at 100 meter resolution but are resampled to 30 meter in the delivered data product.

    Additional Usage Notes
    Image exports are limited to 4,000 columns x 4,000 rows per request. This dynamic imagery layer can be used in Web Maps and ArcGIS Pro as well as web and mobile applications using the ArcGIS REST APIs. WCS and WMS compatibility means this imagery layer can be consumed as WCS or WMS services. The Landsat Explorer App is another way to access and explore the imagery. This layer is part of a larger collection of Landsat Imagery Layers.

    Data Source
    Landsat imagery is sourced from the U.S. Geological Survey (USGS) and the National Aeronautics and Space Administration (NASA). Data is hosted by Amazon Web Services as part of their Public Data Sets program. For information, see Landsat 8 and Landsat 9.
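The NDVI rendering described above is simple band arithmetic; a minimal NumPy sketch of the same calculation on made-up TOA-scaled band values (not the ArcGIS raster function itself):

```python
# NDVI = (NIR - Red) / (NIR + Red), using OLI band 5 (NIR) and band 4 (Red).
# The pixel values below are invented TOA reflectances on the 0 - 10,000 scale.
import numpy as np

red = np.array([[2000.0, 3000.0]])   # band 4
nir = np.array([[6000.0, 3000.0]])   # band 5

ndvi = (nir - red) / (nir + red)     # values fall in [-1, 1]
```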

  3. Data from: Citation data of arXiv eprints and the associated quantitatively-and-temporally normalised impact metrics

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 7, 2024
    Cite
    Keisuke Okamura (2024). Citation data of arXiv eprints and the associated quantitatively-and-temporally normalised impact metrics [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5803961
    Explore at:
    Dataset updated
    Jan 7, 2024
    Dataset provided by
    Hitoshi Koshiba
    Keisuke Okamura
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data collection

    This dataset contains information on the eprints posted on arXiv from its launch in 1991 until the end of 2019 (1,589,006 unique eprints), plus the data on their citations and the associated impact metrics. Here, eprints include preprints, conference proceedings, book chapters, data sets and commentary, i.e. every electronic material that has been posted on arXiv.

    The content and metadata of the arXiv eprints were retrieved from the arXiv API (https://arxiv.org/help/api/) as of 21st January 2020, where the metadata included the eprint's title, author, abstract, subject category and arXiv ID (arXiv's original eprint identifier). In addition, the associated citation data were derived from the Semantic Scholar API (https://api.semanticscholar.org/) from 24th January 2020 to 7th February 2020, containing the citation information in and out of the arXiv eprints and their published versions (if applicable). Here, whether an eprint has been published in a journal or by other means is assumed to be inferable, albeit indirectly, from the status of the digital object identifier (DOI) assignment. It is also assumed that if an arXiv eprint received c_pre and c_pub citations up to the data retrieval date (7th February 2020) before and after it was assigned a DOI, respectively, then the citation count of this eprint is recorded in the Semantic Scholar dataset as c_pre + c_pub. Both the arXiv API and the Semantic Scholar datasets contained the arXiv ID as metadata, which served as the key variable for merging the two datasets.

    The classification of research disciplines is based on that described in the arXiv.org website (https://arxiv.org/help/stats/2020_by_area/). There, the arXiv subject categories are aggregated into several disciplines, of which we restrict our attention to the following six disciplines: Astrophysics (‘astro-ph’), Computer Science (‘comp-sci’), Condensed Matter Physics (‘cond-mat’), High Energy Physics (‘hep’), Mathematics (‘math’) and Other Physics (‘oth-phys’), which collectively accounted for 98% of all the eprints. Those eprints tagged to multiple arXiv disciplines were counted independently for each discipline. Due to this overlapping feature, the current dataset contains a cumulative total of 2,011,216 eprints.

    Some general statistics and visualisations per research discipline are provided in the original article (Okamura, to appear), where the validity and limitations associated with the dataset are also discussed.

    Description of columns (variables)

    arxiv_id : arXiv ID

    category : Research discipline

    pre_year : Year of posting v1 on arXiv

    pub_year : Year of DOI acquisition

    c_tot : No. of citations acquired during 1991–2019

    c_pre : No. of citations acquired before and including the year of DOI acquisition

    c_pub : No. of citations acquired after the year of DOI acquisition

    c_yyyy (yyyy = 1991, …, 2019) : No. of citations acquired in the year yyyy (with ‘yyyy’ running from 1991 to 2019)

    gamma : The quantitatively-and-temporally normalised citation index

    gamma_star : The quantitatively-and-temporally standardised citation index

    Note: The definition of the quantitatively-and-temporally normalised citation index (γ; ‘gamma’) and that of the standardised citation index (γ*; ‘gamma_star’) are provided in the original article (Okamura, to appear). Both indices can be used to compare the citational impact of papers/eprints published in different research disciplines at different times.

    Data files

    A comma-separated values file (‘arXiv_impact.csv’) and a Stata file (‘arXiv_impact.dta’) are provided, both containing the same information.
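As a hypothetical sketch of working with the column layout described above (the row values are invented; only the column names come from the description), the c_pre/c_pub bookkeeping can be checked with the standard library alone:

```python
# Read a tiny in-memory sample laid out like 'arXiv_impact.csv' and check
# that c_tot equals c_pre + c_pub, as the collection notes above assume.
import csv
import io

sample = io.StringIO(
    "arxiv_id,category,pre_year,c_pre,c_pub,c_tot\n"
    "1234.5678,astro-ph,2010,3,7,10\n"
    "2345.6789,math,2015,1,1,2\n"
)
rows = list(csv.DictReader(sample))

for row in rows:
    assert int(row["c_tot"]) == int(row["c_pre"]) + int(row["c_pub"])
```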

  4. Data from: ImageNet-Patch: A Dataset for Benchmarking Machine Learning Robustness against Adversarial Patches

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 30, 2022
    Cite
    Ambra Demontis (2022). ImageNet-Patch: A Dataset for Benchmarking Machine Learning Robustness against Adversarial Patches [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6568777
    Explore at:
    Dataset updated
    Jun 30, 2022
    Dataset provided by
    Fabio Roli
    Battista Biggio
    Angelo Sotgiu
    Daniele Angioni
    Luca Demetrio
    Maura Pintor
    Ambra Demontis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Adversarial patches are optimized contiguous pixel blocks in an input image that cause a machine-learning model to misclassify it. However, their optimization is computationally demanding and requires careful hyperparameter tuning. To overcome these issues, we propose ImageNet-Patch, a dataset to benchmark machine-learning models against adversarial patches. It consists of a set of patches optimized to generalize across different models and applied to ImageNet data after preprocessing them with affine transformations. This process enables an approximate yet faster robustness evaluation, leveraging the transferability of adversarial perturbations.

    We release the dataset as a set of folders, one per patch target label (e.g., banana), each containing 1,000 subfolders corresponding to the ImageNet output classes.

    An example showing how to use the dataset is shown below.

    # Code for testing the robustness of a model against the patches.
    import os
    import os.path

    import torch.utils.data
    from torchvision import datasets, transforms, models


    class ImageFolderWithEmptyDirs(datasets.ImageFolder):
        """Required for handling empty folders with the ImageFolder class."""

        def find_classes(self, directory):
            classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
            if not classes:
                raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")
            class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)
                            if len(os.listdir(os.path.join(directory, cls_name))) > 0}
            return classes, class_to_idx


    # Extract and unzip the dataset, then write the top folder here.
    dataset_folder = 'data/ImageNet-Patch'

    available_labels = {
        487: 'cellular telephone',
        513: 'cornet',
        546: 'electric guitar',
        585: 'hair spray',
        804: 'soap dispenser',
        806: 'sock',
        878: 'typewriter keyboard',
        923: 'plate',
        954: 'banana',
        968: 'cup',
    }

    # Select the folder with a specific patch target.
    target_label = 954
    dataset_folder = os.path.join(dataset_folder, str(target_label))

    normalizer = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                      std=[0.229, 0.224, 0.225])
    preprocess = transforms.Compose([transforms.ToTensor(), normalizer])

    dataset = ImageFolderWithEmptyDirs(dataset_folder, transform=preprocess)
    model = models.resnet50(pretrained=True)
    loader = torch.utils.data.DataLoader(dataset, shuffle=True, batch_size=5)
    model.eval()

    # Evaluate clean accuracy and attack success rate over a few batches.
    batches = 10
    correct, attack_success, total = 0, 0, 0
    for batch_idx, (images, labels) in enumerate(loader):
        if batch_idx == batches:
            break
        pred = model(images).argmax(dim=1)
        correct += (pred == labels).sum()
        attack_success += (pred == target_label).sum()
        total += pred.shape[0]

    accuracy = correct / total
    attack_sr = attack_success / total

    print("Robust Accuracy: ", accuracy)
    print("Attack Success: ", attack_sr)

  5. De-identified data for use in analyses.

    • plos.figshare.com
    • figshare.com
    csv
    Updated Dec 19, 2024
    Cite
    Carly A. Busch; Margaret Barstow; Sara E. Brownell; Katelyn M. Cooper (2024). De-identified data for use in analyses. [Dataset]. http://doi.org/10.1371/journal.pmen.0000086.s008
    Explore at:
    Available download formats: csv
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Carly A. Busch; Margaret Barstow; Sara E. Brownell; Katelyn M. Cooper
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Depression and anxiety are among the most common mental health concerns for science and engineering (S&E) undergraduates in the United States (U.S.), and students perceive they would benefit from knowing a S&E instructor with depression or anxiety. However, it is unknown how prevalent depression and anxiety are among S&E instructors and whether instructors disclose their depression or anxiety to their undergraduates. These identities are unique because they are concealable stigmatized identities (CSIs), meaning they can be kept hidden and carry negative stereotypes. To address these gaps, we surveyed 2,013 S&E faculty instructors across U.S. very high research activity doctoral-granting institutions. The survey assessed the extent to which they had and revealed depression or anxiety to undergraduates, why they chose to reveal or conceal their depression or anxiety, and the benefits of revealing depression or anxiety. These items were developed based on prior studies exploring why individuals conceal or reveal CSIs, including mental health conditions. Of the university S&E instructors surveyed, 23.9% (n = 482) reported having depression and 32.8% (n = 661) reported having anxiety. Instructors who are women, white, Millennials, or LGBTQ+ are more likely to report depression or anxiety than their counterparts. Very few participants revealed their depression (5.4%) or anxiety (8.3%) to undergraduates. Instructors reported concealing their depression and anxiety because they do not typically disclose to others or because it is not relevant to course content. Instructors anticipated that undergraduates would benefit from disclosure because it would normalize struggling with mental health and provide an example of someone with depression and anxiety who is successful in S&E. Despite undergraduates reporting a need for role models in academic S&E who struggle with mental health, and depression/anxiety being relatively common among U.S. S&E instructors, our study found that instructors rarely reveal these identities to their undergraduates.

  6. PlotTwist: A web app for plotting and annotating continuous data

    • figshare.com
    • plos.figshare.com
    docx
    Updated Jun 1, 2023
    Cite
    Joachim Goedhart (2023). PlotTwist: A web app for plotting and annotating continuous data [Dataset]. http://doi.org/10.1371/journal.pbio.3000581
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS Biology
    Authors
    Joachim Goedhart
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Experimental data can broadly be divided into discrete and continuous data. Continuous data are obtained from measurements that are performed as a function of another quantitative variable, e.g., time, length, concentration, or wavelength. The results from these types of experiments are often used to generate plots that visualize the measured variable on a continuous, quantitative scale. To simplify state-of-the-art data visualization and annotation of data from such experiments, an open-source tool was created with R/shiny that does not require coding skills to operate. The freely available web app accepts wide (spreadsheet) and tidy data and offers a range of options to normalize the data. The data from individual objects can be shown in three different ways: (1) lines with unique colors, (2) small multiples, and (3) heatmap-style display. In addition, the mean can be displayed with a 95% confidence interval for the visual comparison of different conditions. Several color-blind-friendly palettes are available to label the data and/or statistics. The plots can be annotated with graphical features and/or text to indicate any perturbations that are relevant. All user-defined settings can be stored for reproducibility of the data visualization. The app is dubbed PlotTwist and runs locally or online: https://huygens.science.uva.nl/PlotTwist
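One common normalization for continuous traces of the kind the app handles is dividing each time series by the mean of its baseline window so all traces start near 1.0. A tiny illustrative sketch (this is not the app's R code; the trace and baseline window are invented):

```python
# Normalize one object's time series to its pre-perturbation baseline.
import numpy as np

trace = np.array([10.0, 10.0, 12.0, 20.0, 30.0])  # measured values over time
baseline = trace[:2].mean()                        # assumed baseline window
normalized = trace / baseline                      # baseline maps to 1.0
```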

  7. Predictive Validity Data Set

    • figshare.com
    txt
    Updated Dec 18, 2022
    Cite
    Antonio Abeyta (2022). Predictive Validity Data Set [Dataset]. http://doi.org/10.6084/m9.figshare.17030021.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 18, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Antonio Abeyta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Verbal and Quantitative Reasoning GRE scores and percentiles were collected by querying the student database for the appropriate information. Any student records that were missing data such as GRE scores or grade point average were removed from the study before the data were analyzed. The GRE scores of entering doctoral students from 2007-2012 were collected and analyzed. A total of 528 student records were reviewed. Ninety-six records were removed from the data because of a lack of GRE scores. Thirty-nine of these records belonged to MD/PhD applicants who were not required to take the GRE to be reviewed for admission. Fifty-seven more records were removed because they did not have an admissions committee score in the database. After 2011, the GRE's scoring system was changed from a scale of 200-800 points per section to 130-170 points per section. As a result, 12 more records were removed because their scores reflected the new scoring system and therefore could not be compared with the older scores on a raw-score basis. After these removals, a total of 420 student records remained, which included students who were currently enrolled, left the doctoral program without a degree, or left the doctoral program with an MS degree. To maintain consistency in the participants, we removed 100 additional records so that our analyses only considered students who had graduated with a doctoral degree. In addition, thirty-nine admissions scores were identified as outliers by statistical analysis software and removed for a final data set of 286 (see Outliers below).

    Outliers. We used the automated ROUT method included in the PRISM software to test the data for the presence of outliers which could skew our data. The false discovery rate for outlier detection (Q) was set to 1%. After removing the 96 students without a GRE score, 432 students were reviewed for the presence of outliers. ROUT detected 39 outliers that were removed before statistical analysis was performed.

    Sample. See the detailed description in the Participants section. Linear regression analysis was used to examine potential trends between GRE scores, GRE percentiles, normalized admissions scores or GPA and outcomes between selected student groups. The D'Agostino & Pearson omnibus and Shapiro-Wilk normality tests were used to test for normality of outcomes in the sample. The Pearson correlation coefficient was calculated to determine the relationship between GRE scores, GRE percentiles, admissions scores or GPA (undergraduate and graduate) and time to degree. Candidacy exam results were divided into students who either passed or failed the exam. A Mann-Whitney test was then used to test for statistically significant differences in mean GRE scores, percentiles, and undergraduate GPA between candidacy exam results. Other variables such as gender, race, ethnicity, and citizenship status were also observed within the samples.

    Predictive Metrics. The input variables used in this study were GPA and the scores and percentiles of applicants on both the Quantitative and Verbal Reasoning GRE sections. GRE scores and percentiles were examined to normalize variances that could occur between tests.

    Performance Metrics. The output variables used in the statistical analyses of each data set were either the amount of time it took for each student to earn their doctoral degree, or the student's candidacy examination result.
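The Mann-Whitney comparison described above can be sketched with SciPy on invented GRE scores (the real data live in the dataset's txt file; these numbers are illustrative only):

```python
# Compare quantitative GRE scores of students who passed vs. failed the
# candidacy exam with a two-sided Mann-Whitney U test (scores invented).
from scipy.stats import mannwhitneyu

passed = [161, 158, 165, 160, 163, 159]
failed = [152, 155, 150, 157, 153]

stat, p_value = mannwhitneyu(passed, failed, alternative="two-sided")
```

With these toy groups every "passed" score exceeds every "failed" score, so the U statistic takes its maximum value and the p-value is small.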

