82 datasets found
  1. Iris Flower Visualization using Python

    • kaggle.com
    zip
    Updated Oct 24, 2023
    Cite
    Harsh Kashyap (2023). Iris Flower Visualization using Python [Dataset]. https://www.kaggle.com/datasets/imharshkashyap/iris-flower-visualization-using-python
    Explore at:
    Available download formats: zip (1307 bytes)
    Dataset updated
    Oct 24, 2023
    Authors
    Harsh Kashyap
    Description

    The "Iris Flower Visualization using Python" project is a data science project that focuses on exploring and visualizing the famous Iris flower dataset. The Iris dataset is a well-known dataset in the field of machine learning and data science, containing measurements of four features (sepal length, sepal width, petal length, and petal width) for three different species of Iris flowers (Setosa, Versicolor, and Virginica).

    In this project, Python is used as the primary programming language along with popular libraries such as pandas, matplotlib, seaborn, and plotly. The project aims to provide a comprehensive visual analysis of the Iris dataset, allowing users to gain insights into the relationships between the different features and the distinct characteristics of each Iris species.

    The project begins by loading the Iris dataset into a pandas DataFrame, followed by data preprocessing and cleaning if necessary. Various visualization techniques are then applied to showcase the dataset's characteristics and patterns. The project includes the following visualizations:

    1. Scatter Plot: Visualizes the relationship between two features, such as sepal length and sepal width, using points on a 2D plane. Different species are represented by different colors or markers, allowing for easy differentiation.

    2. Pair Plot: Displays pairwise relationships between all features in the dataset. This matrix of scatter plots provides a quick overview of the relationships and distributions of the features.

    3. Andrews Curves: Represents each sample as a curve, with the shape of the curve representing the corresponding Iris species. This visualization technique allows for the identification of distinct patterns and separability between species.

    4. Parallel Coordinates: Plots each feature on a separate vertical axis and connects the values for each data sample using lines. This visualization technique helps in understanding the relative importance and range of each feature for different species.

    5. 3D Scatter Plot: Creates a 3D plot with three features represented on the x, y, and z axes. This visualization allows for a more comprehensive understanding of the relationships between multiple features simultaneously.
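
    Of the techniques listed above, Andrews curves are the least self-explanatory, so a small worked example may help. The sketch below is not taken from the project; it assumes the common textbook form of the curve, f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ..., and the function name and the two sample vectors are illustrative only.

```python
import math


def andrews_curve(sample, t):
    """Evaluate the Andrews curve of one sample at angle t (in radians).

    Assumed form: x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ...
    """
    value = sample[0] / math.sqrt(2)
    for i, x in enumerate(sample[1:], start=1):
        harmonic = (i + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            value += x * math.sin(harmonic * t)
        else:
            value += x * math.cos(harmonic * t)
    return value


# Illustrative measurement vectors in cm:
# (sepal length, sepal width, petal length, petal width)
setosa_like = [5.1, 3.5, 1.4, 0.2]
virginica_like = [6.3, 3.3, 6.0, 2.5]

# Sample both curves on a coarse grid over [-pi, pi]
grid = [-math.pi + k * (2 * math.pi / 8) for k in range(9)]
for t in grid:
    print(f"t={t:+.2f}  setosa-like={andrews_curve(setosa_like, t):+.3f}  "
          f"virginica-like={andrews_curve(virginica_like, t):+.3f}")
```

    Each sample becomes one periodic curve, so samples with similar measurements trace similar curves. A plotting library would normally draw one such curve per row and colour it by species.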

    Throughout the project, appropriate labels, titles, and color schemes are used to enhance the visualizations' interpretability. The interactive nature of some visualizations, such as the 3D Scatter Plot, allows users to rotate and zoom in on the plot for a more detailed examination.

    The "Iris Flower Visualization using Python" project serves as an excellent example of how data visualization techniques can be applied to gain insights and understand the characteristics of a dataset. It provides a foundation for further analysis and exploration of the Iris dataset or similar datasets in the field of data science and machine learning.

  2. Data from: Graph Theory for Analyzing Pair-wise Data: Application to...

    • catalog.data.gov
    • gdr.openei.org
    • +3more
    Updated Jan 20, 2025
    + more versions
    Cite
    University of Wisconsin (2025). Graph Theory for Analyzing Pair-wise Data: Application to Interferometric Synthetic Aperture Radar Data [Dataset]. https://catalog.data.gov/dataset/graph-theory-for-analyzing-pair-wise-data-application-to-interferometric-synthetic-apertur-ad16d
    Explore at:
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    University of Wisconsin
    Description

    Graph theory is useful for estimating time-dependent model parameters via weighted least-squares using interferometric synthetic aperture radar (InSAR) data. Plotting acquisition dates (epochs) as vertices and pair-wise interferometric combinations as edges defines an incidence graph. The edge-vertex incidence matrix and the normalized edge Laplacian matrix are factors in the covariance matrix for the pair-wise data. Using empirical measures of residual scatter in the pair-wise observations, we estimate the variance at each epoch by inverting the covariance of the pair-wise data. We evaluate the rank deficiency of the corresponding least-squares problem via the edge-vertex incidence matrix. We implement our method in a MATLAB software package called GraphTreeTA available on GitHub (https://github.com/feigl/gipht). We apply temporal adjustment to the data set described in Lu et al. (2005) at Okmok volcano, Alaska, which erupted most recently in 1997 and 2008. The data set contains 44 differential volumetric changes and uncertainties estimated from interferograms between 1997 and 2004. Estimates show that approximately half of the magma volume lost during the 1997 eruption was recovered by the summer of 2003. Between June 2002 and September 2003, the estimated rate of volumetric increase is (6.2 +/- 0.6) x 10^6 m^3/yr. Our preferred model provides a reasonable fit that is compatible with viscoelastic relaxation in the five years following the 1997 eruption. Although we demonstrate the approach using volumetric rates of change, our formulation in terms of incidence graphs applies to any quantity derived from pair-wise differences, such as wrapped phase or wrapped residuals. Date of final oral examination: 05/19/2016 This thesis is approved by the following members of the Final Oral Committee: Kurt L. Feigl, Professor, Geoscience Michael Cardiff, Assistant Professor, Geoscience Clifford H. Thurber, Vilas Distinguished Professor, Geoscience
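
    The incidence-graph construction described above can be illustrated with a small sketch. This is not code from the MATLAB package mentioned in the description; the epoch labels, the pairs, the function names, and the sign convention (-1 at the first epoch of a pair, +1 at the second) are all assumptions made for illustration. The rank deficiency is obtained from the standard result that an incidence matrix with n vertices and c connected components has rank n - c.

```python
def incidence_matrix(epochs, pairs):
    """Edge-vertex incidence matrix: one row per pair, one column per epoch.

    Assumed sign convention: -1 at the first epoch, +1 at the second.
    """
    column = {epoch: j for j, epoch in enumerate(epochs)}
    rows = []
    for first, second in pairs:
        row = [0] * len(epochs)
        row[column[first]] = -1
        row[column[second]] = 1
        rows.append(row)
    return rows


def rank_deficiency(epochs, pairs):
    """Number of connected components, which equals len(epochs) - rank(Q)."""
    parent = {epoch: epoch for epoch in epochs}

    def find(epoch):
        # Union-find root lookup with path halving
        while parent[epoch] != epoch:
            parent[epoch] = parent[parent[epoch]]
            epoch = parent[epoch]
        return epoch

    for first, second in pairs:
        parent[find(first)] = find(second)
    return len({find(epoch) for epoch in epochs})


# Hypothetical acquisition epochs and interferometric pairs
epochs = ["t0", "t1", "t2", "t3", "t4"]
pairs = [("t0", "t1"), ("t1", "t2"), ("t0", "t2"), ("t3", "t4")]

Q = incidence_matrix(epochs, pairs)
for row in Q:
    print(row)
print("rank deficiency:", rank_deficiency(epochs, pairs))
```

    In this toy network the pairs split the epochs into two disconnected groups, so two constraints (for example, one reference epoch per group) would be needed before a least-squares solution for the epoch-wise values becomes unique.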

  3. The T-plot of all the miRNA-target pairs

    • figshare.com
    application/x-rar
    Updated Jun 28, 2023
    Cite
    Hao Rong (2023). The T-plot of all the miRNA-target pairs [Dataset]. http://doi.org/10.6084/m9.figshare.23592387.v1
    Explore at:
    Available download formats: application/x-rar
    Dataset updated
    Jun 28, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Hao Rong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The T-plot of all the miRNA-target pairs plotted based on degradome density files of Malus ‘Indian summer’.

  4. IRIS FLOWER-plot images dataset

    • kaggle.com
    zip
    Updated Jun 6, 2024
    Cite
    Rijab Butt (2024). IRIS FLOWER-plot images dataset [Dataset]. https://www.kaggle.com/datasets/irijabbutt/iris-flower-plot-image-dataset
    Explore at:
    Available download formats: zip (191988944 bytes)
    Dataset updated
    Jun 6, 2024
    Authors
    Rijab Butt
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    IRIS FLOWER SCATTER PLOT IMAGES DATASET

    Overview

    This dataset is derived from the well-known Iris flower dataset and contains 5000 images in PNG format. These images represent scatter plots that visually capture the relationships between different pairs of features in the Iris dataset. The original Iris dataset consists of 150 samples from three species of Iris flowers (Iris setosa, Iris versicolor, and Iris virginica), with each sample having four features: sepal length, sepal width, petal length, and petal width. The scatter plot images in this dataset provide visual insights into how these features correlate and differentiate the three species.

    Dataset Description

    • Total Images: 5000 PNG images
    • Image Format: PNG
    • Resolution: High-resolution scatter plots (resolution details can be specified)
    • Source: Derived from the Iris dataset available in Scikit-learn
    • Feature Pairs: Scatter plots are generated for all possible pairs of features (sepal length vs. sepal width, petal length vs. petal width, etc.)

    Features of the Dataset

    • Diverse Visual Representations: The dataset includes scatter plots with various feature pairings, providing comprehensive visual analysis of feature relationships.
    • Species Differentiation: Each scatter plot clearly distinguishes between the three species of Iris flowers using different colors or markers.
    • High Quality: The images are generated with high-quality plotting techniques to ensure clarity and precision in the representation of data points.
    • Annotations: Scatter plots are annotated with axes labels and legends to facilitate easy interpretation.
    • Randomized Samples: The dataset contains 5000 images, which implies multiple scatter plots for each pair of features, with randomized sample selections to cover different aspects and variations within the dataset.

    Use Cases

    • Data Visualization: Ideal for educational purposes to demonstrate data visualization techniques and the importance of scatter plots in exploratory data analysis.
    • Machine Learning: Useful for training machine learning models on image recognition tasks, particularly in distinguishing between different species based on visual patterns.
    • Research and Analysis: Can be used in research studies that require a large number of scatter plot images for testing new algorithms in image processing or pattern recognition.

    Conclusion

    The Iris Flower Scatter Plot Images Dataset provides a rich resource for visual data analysis, machine learning training, and educational purposes. By leveraging the classic Iris dataset, it offers a unique way to explore feature relationships through high-quality scatter plot images.

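
    As a rough illustration of the "all possible pairs of features" and "randomized sample selections" mentioned in this entry, the sketch below enumerates the feature pairs and draws a reproducible random subset of row indices. The description does not say how the images were actually generated, so the function names, the subset size, and the seeding scheme are assumptions.

```python
import random
from itertools import combinations

FEATURES = ["sepal length", "sepal width", "petal length", "petal width"]


def feature_pairs(features=FEATURES):
    """Every unordered pair of features, one scatter plot layout per pair."""
    return list(combinations(features, 2))


def random_subsample(n_samples, size, seed):
    """Reproducible random subset of row indices (hypothetical sampling scheme)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_samples), size))


pairs = feature_pairs()
print(len(pairs), "feature pairs")
for x_name, y_name in pairs:
    print(f"  {x_name} vs. {y_name}")

# One hypothetical image: 100 of the 150 Iris rows, chosen with seed 0
rows = random_subsample(n_samples=150, size=100, seed=0)
print(len(rows), "rows selected for this plot")
```

    Four features give six unordered pairs, so a collection of several thousand images necessarily contains many plots per pair; varying the seed is one simple way such variation could be produced.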
  5. The banksia plot: a method for visually comparing point estimates and...

    • researchdata.edu.au
    • datasetcatalog.nlm.nih.gov
    • +1more
    Updated Apr 16, 2024
    Cite
    Simon Turner; Joanne McKenzie; Emily Karahalios; Elizabeth Korevaar (2024). The banksia plot: a method for visually comparing point estimates and confidence intervals across datasets [Dataset]. http://doi.org/10.26180/25286407.V2
    Explore at:
    Dataset updated
    Apr 16, 2024
    Dataset provided by
    Monash University
    Authors
    Simon Turner; Joanne McKenzie; Emily Karahalios; Elizabeth Korevaar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Companion data for the creation of a banksia plot:

    Background:

    In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.

    Methods:

    The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the difference in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) data sets with widely varying characteristics, while the second example assesses data extraction accuracy comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs from the accompanying manuscripts.
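
    The centring and scaling step can be sketched in a few lines. This is one reading of the method as summarised above, not the authors' Stata or R code: it assumes "centred to zero" means subtracting the reference point estimate and "span a range of one" means dividing by the reference confidence interval width, with the same shift and divisor applied to the comparator. The function name and the numbers are illustrative.

```python
def centre_and_scale(reference, comparator):
    """Rescale two (estimate, ci_lower, ci_upper) triples for a banksia-style plot.

    Assumed transformation: subtract the reference estimate, then divide by
    the reference confidence interval width.
    """
    ref_estimate, ref_lower, ref_upper = reference
    width = ref_upper - ref_lower
    if width <= 0:
        raise ValueError("reference confidence interval must have positive width")

    def transform(value):
        return (value - ref_estimate) / width

    return (tuple(transform(v) for v in reference),
            tuple(transform(v) for v in comparator))


# Hypothetical results from two analyses of the same dataset
reference = (2.0, 1.0, 3.0)   # estimate, lower CI limit, upper CI limit
comparator = (2.5, 1.0, 4.0)

scaled_reference, scaled_comparator = centre_and_scale(reference, comparator)
print("reference: ", scaled_reference)
print("comparator:", scaled_comparator)
```

    After the transformation the reference estimate sits at zero with an interval of width one, so the comparator's offset and relative interval width can be read directly, whatever the original outcome scale was.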

    Results:

    In the banksia plot of statistical method comparison, it was clear that there was no difference, on average, in point estimates and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.

    Conclusions:

    The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.

    This collection of files allows the user to create the images used in the companion paper and amend this code to create their own banksia plots using either Stata version 17 or R version 4.3.1

  6. Data from: Evolution of virulence in heterogeneous host communities under...

    • datadryad.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Aug 31, 2011
    Cite
    Erik E. Osnas; Andrew P. Dobson (2011). Evolution of virulence in heterogeneous host communities under multiple trade-offs [Dataset]. http://doi.org/10.5061/dryad.mm46r
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 31, 2011
    Dataset provided by
    Dryad
    Authors
    Erik E. Osnas; Andrew P. Dobson
    Time period covered
    Aug 31, 2011
    Description

    • two.species.ess: R text file for function used to find and characterize evolutionary singular points.
    • AD.sim.function: R function for Adaptive Dynamics simulation in manuscript Figure 4.
    • revised figures.8.19.11: R file to make manuscript figures from data files.
    • zip file of figure data: Data shown in Figures 1 - 4 used by R file "revised figures.8.19.11.r". See ReadME file within this zip directory. (Dryad.zip)

  7. Supplementary File 13: Tables of estimates and Forest plots of pairwise...

    • zenodo.org
    Updated Oct 1, 2024
    Cite
    Amit Mukerji; Amit Mukerji (2024). Supplementary File 13: Tables of estimates and Forest plots of pairwise comparisons from original studies for Subgroups of Primary Outcomes [Dataset]. http://doi.org/10.5281/zenodo.13872571
    Explore at:
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Amit Mukerji; Amit Mukerji
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supplementary File #13 for Cochrane Review entitled: "Non-invasive respiratory support in preterm infants as primary mode: a network meta-analysis"

  8. Figure 1: NJ tree and pairwise distance data

    • figshare.com
    txt
    Updated Oct 28, 2021
    Cite
    Mark Farman (2021). Figure 1: NJ tree and pairwise distance data [Dataset]. http://doi.org/10.6084/m9.figshare.16892542.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Oct 28, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mark Farman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New Jersey
    Description

    Fig1_NJTree_data.mdsx can be opened in MEGA X and used to build a neighbor joining tree. Pairwise_distance_boxplot_MR.R can be used to generate the plot shown in Figure 1D using the box plot-distances_CC2.txt dataset.

  9. Data from: Trade-offs between growth rate, tree size and lifespan of...

    • datadryad.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated May 26, 2016
    Cite
    Christof Bigler (2016). Trade-offs between growth rate, tree size and lifespan of mountain pine (Pinus montana) in the Swiss National Park [Dataset]. http://doi.org/10.5061/dryad.d2680
    Explore at:
    Available download formats: zip
    Dataset updated
    May 26, 2016
    Dataset provided by
    Dryad
    Authors
    Christof Bigler
    Time period covered
    Jul 6, 2015
    Area covered
    Swiss National Park, Switzerland, canton of Grisons
    Description

    A within-species trade-off between growth rates and lifespan has been observed across different taxa of trees; however, there is some uncertainty whether this trade-off also applies to shade-intolerant tree species. The main objective of this study was to investigate the relationships between radial growth, tree size and lifespan of shade-intolerant mountain pines. For 200 dead standing mountain pines (Pinus montana) located along gradients of aspect, slope steepness and elevation in the Swiss National Park, radial annual growth rates and lifespan were reconstructed. While early growth (i.e. mean tree-ring width over the first 50 years) correlated positively with diameter at the time of tree death, a negative correlation resulted with lifespan, i.e. rapidly growing mountain pines face a trade-off between reaching a large diameter at the cost of early tree death. Slowly growing mountain pines may reach a large diameter and a long lifespan, but risk dying young at a small size. Early gro...

  10. Table of comparisons performed, and the important structural features...

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Peter Wills; François G. Meyer (2023). Table of comparisons performed, and the important structural features therein. [Dataset]. http://doi.org/10.1371/journal.pone.0228728.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Peter Wills; François G. Meyer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    G(n, p) indicates the Erdős-Rényi uncorrelated random graph, SBM is the stochastic blockmodel, PA is the preferential attachment model, CM is the degree matched configuration model, and WS is the Watts-Strogatz model.
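
    For readers unfamiliar with the first model named above, a G(n, p) graph can be sampled in a few lines: each of the n(n-1)/2 possible edges is included independently with probability p. The sketch below is a generic illustration and not the code behind the table; the function name and parameters are assumptions.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Sample the edge list of a G(n, p) graph on vertices 0..n-1."""
    rng = random.Random(seed)
    # Each candidate edge is kept independently with probability p
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


edges = erdos_renyi(n=10, p=0.3, seed=42)
print(len(edges), "edges out of", 10 * 9 // 2, "possible")
```

    The other models in the list (stochastic blockmodel, preferential attachment, configuration model, Watts-Strogatz) replace this single independent coin flip with more structured edge-placement rules.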

  11. Marer tR–m/z ion pairs in the S-plot.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated May 22, 2014
    Cite
    Wu, Labin; Zheng, Sihao; Wang, Zenghui; Huang, Linfang; Jiang, Xue (2014). Marer tR–m/z ion pairs in the S-plot. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001262168
    Explore at:
    Dataset updated
    May 22, 2014
    Authors
    Wu, Labin; Zheng, Sihao; Wang, Zenghui; Huang, Linfang; Jiang, Xue
    Description

    Marer tR–m/z ion pairs in the S-plot.

  12. Stordalen Mire mire-wide survey: Pairwise geographic distances between plots...

    • zenodo.org
    bin, csv
    Updated Apr 9, 2025
    Cite
    Suzanne Hodgkins; Suzanne Hodgkins (2025). Stordalen Mire mire-wide survey: Pairwise geographic distances between plots [Dataset]. http://doi.org/10.5281/zenodo.15048219
    Explore at:
    Available download formats: bin, csv
    Dataset updated
    Apr 9, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Suzanne Hodgkins; Suzanne Hodgkins
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pairwise geographic distances (m) between mire-wide plots in Stordalen Mire, northern Sweden.

    Distances are in the file Mirewide_Plots_distances-m.csv.

    This file was generated with the script Mirewide_Plots_Distances.R, using Mirewide_Plots_GPS.csv as input, and geosphere package version 1.5-10.
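
    The dataset's own script is written in R and uses the geosphere package; the description does not say which distance function it calls. As a rough Python equivalent, the sketch below computes a pairwise distance table with the haversine great-circle formula on a spherical Earth. The haversine choice, the Earth radius, the function names, and the plot coordinates are all assumptions for illustration and are not values from Mirewide_Plots_GPS.csv.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pairwise_distances(points):
    """Map every ordered pair of plot names to a distance in metres."""
    return {(a, b): haversine_m(*points[a], *points[b])
            for a in points for b in points}


# Hypothetical plot coordinates (latitude, longitude) in northern Sweden
plots = {
    "plot_1": (68.3530, 19.0460),
    "plot_2": (68.3545, 19.0502),
    "plot_3": (68.3561, 19.0431),
}

distances = pairwise_distances(plots)
for (a, b), d in distances.items():
    if a < b:
        print(f"{a} to {b}: {d:.1f} m")
```

    An ellipsoidal method would give slightly different values than this spherical approximation, which is why the published CSV rather than a sketch like this should be treated as the reference.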

    Details of the plots, including latitude, longitude, and vegetation cover, are in the dataset "Stordalen Mire mire-wide survey: Vegetation cover" (https://doi.org/10.5281/zenodo.15048198). The latitude & longitude provided in that dataset represent more precise versions of the coordinates in Mirewide_Plots_GPS.csv (which also omits plot 8); the coordinates are otherwise identical in both datasets.

    FUNDING:

    • National Aeronautics and Space Administration, Interdisciplinary Science program: From Archaea to the Atmosphere (award # NNX17AK10G).
    • National Science Foundation, Biology Integration Institutes Program: EMERGE Biology Integration Institute (award # 2022070).
    • United States Department of Energy Office of Biological and Environmental Research, Genomic Science Program: The IsoGenie Project (grant #s DE-SC0004632, DE-SC0010580, and DE-SC0016440).
    • We thank the Swedish Polar Research Secretariat and SITES for the support of the work done at the Abisko Scientific Research Station. SITES is supported by the Swedish Research Council's grant 4.3-2021-00164.
  13. Identification of Putative Biomarkers for the Early Stage of Porcine...

    • figshare.com
    docx
    Updated May 31, 2023
    Cite
    Won-Young Lee; Jeong Tae Do; Chankyu Park; Jin Hoi Kim; Hak-Jae Chung; Kyung-Woon Kim; Chang-Hyun Gil; Nam-Hyung Kim; Hyuk Song (2023). Identification of Putative Biomarkers for the Early Stage of Porcine Spermatogonial Stem Cells Using Next-Generation Sequencing [Dataset]. http://doi.org/10.1371/journal.pone.0147298
    Explore at:
    Available download formats: docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Won-Young Lee; Jeong Tae Do; Chankyu Park; Jin Hoi Kim; Hak-Jae Chung; Kyung-Woon Kim; Chang-Hyun Gil; Nam-Hyung Kim; Hyuk Song
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To identify putative biomarkers of porcine spermatogonial stem cells (pSSCs), total RNA sequencing (RNA-seq) analysis was performed on 5- and 180-day-old porcine testes and on pSSC colonies that were established under low temperature culture conditions as reported previously. In total, 10,184 genes were selected using Cufflink software, followed by a logarithm and quantile normalization of the pairwise scatter plot. The correlation rates of pSSCs compared to 5- and 180-day-old testes were 0.869 and 0.529, respectively, and that between 5- and 180-day-old testes was 0.580. Hierarchical clustering data revealed that gene expression patterns of pSSCs were similar to 5-day-old testis. By applying a differential expression filter of fourfold or greater, 607 genes were identified between pSSCs and 5-day-old testis, and 2118 genes were identified between the 5- and 180-day-old testes. Among these differentially expressed genes, 293 genes were upregulated and 314 genes were downregulated in the 5-day-old testis compared to pSSCs, and 1106 genes were upregulated and 1012 genes were downregulated in the 180-day-old testis compared to the 5-day-old testis. The following genes upregulated in pSSCs compared to 5-day-old testes were selected for additional analysis: matrix metallopeptidase 9 (MMP9), matrix metallopeptidase 1 (MMP1), glutathione peroxidase 1 (GPX1), chemokine receptor 1 (CCR1), insulin-like growth factor binding protein 3 (IGFBP3), CD14, CD209, and Kruppel-like factor 9 (KLF9). Expression levels of these genes were evaluated in pSSCs and in 5- and 180-day-old porcine testes. In addition, immunohistochemistry analysis confirmed their germ cell-specific expression in 5- and 180-day-old testes. These findings may not only be useful in facilitating the enrichment and sorting of porcine spermatogonia, but may also be useful in the study of the early stages of spermatogenic meiosis.
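
    The "fourfold or greater" differential expression filter can be illustrated with a small sketch. This is not the study's pipeline: the function name, the use of a log2 ratio, the handling of non-positive values, and the placeholder gene names and expression values are assumptions made for illustration.

```python
import math


def fold_change_filter(expr_a, expr_b, min_fold=4.0):
    """Return {gene: log2(a / b)} for genes differing by at least min_fold.

    Genes missing from either sample, or with non-positive expression,
    are skipped in this simplified version.
    """
    threshold = math.log2(min_fold)
    selected = {}
    for gene in expr_a.keys() & expr_b.keys():
        a, b = expr_a[gene], expr_b[gene]
        if a <= 0 or b <= 0:
            continue
        log_ratio = math.log2(a / b)
        if abs(log_ratio) >= threshold:
            selected[gene] = log_ratio
    return selected


# Placeholder expression values for two samples (not data from the study)
sample_a = {"geneA": 80.0, "geneB": 10.0, "geneC": 12.0}
sample_b = {"geneA": 10.0, "geneB": 50.0, "geneC": 10.0}

result = fold_change_filter(sample_a, sample_b)
for gene, log_ratio in sorted(result.items()):
    direction = "up" if log_ratio > 0 else "down"
    print(f"{gene}: log2 fold change {log_ratio:+.2f} ({direction} in sample A)")
```

    Working on the log2 scale treats up- and downregulation symmetrically: a fourfold increase is +2 and a fourfold decrease is -2, so a single absolute-value threshold covers both directions.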

  14. Florida mangrove saltmarsh reference surface soils

    • portal.edirepository.org
    csv, jpeg
    Updated Jun 24, 2021
    Cite
    David Lewis (2021). Florida mangrove saltmarsh reference surface soils [Dataset]. http://doi.org/10.6073/pasta/0e08cbe07c84488cb7b9dd16669946d0
    Explore at:
    Available download formats: csv (5972 bytes), jpeg (1393711 bytes)
    Dataset updated
    Jun 24, 2021
    Dataset provided by
    EDI
    Authors
    David Lewis
    Time period covered
    2011
    Variables measured
    Site, SOM_ar, SOM_gv, VegZone, Fines_ar, Fines_gv, Plotpair, TotalN_a, Elevation, TotalC_ar, and 20 more
    Description

    Site description.

     This data package consists of data obtained from sampling surface soil (the 0-7.6 cm depth profile) in black mangrove (Avicennia germinans) dominated forest and black needlerush (Juncus roemerianus) saltmarsh along the Gulf of Mexico coastline in peninsular west-central Florida, USA. This location has a subtropical climate with mean daily temperatures ranging from 15.4 °C in January to 27.8 °C in August, and annual precipitation of 1336 mm. Precipitation falls as rain primarily between June and September. Tides are semi-diurnal, with 0.57 m median amplitudes during the year preceding sampling (U.S. NOAA National Ocean Service, Clearwater Beach, Florida, station 8726724). Sea-level rise is 4.0 ± 0.6 mm per year (1973-2020 trend, mean ± 95 % confidence interval, NOAA NOS Clearwater Beach station). The A. germinans mangrove zone is either adjacent to water or fringed on the seaward side by a narrow band of red mangrove (Rhizophora mangle). A near-monoculture of J. roemerianus is often adjacent to and immediately landward of the A. germinans zone. The transition from the mangrove to the J. roemerianus zone is variable in our study area. An abrupt edge between closed-canopy mangrove and J. roemerianus monoculture may extend for up to several hundred meters in some locations, while other stretches of ecotone present a gradual transition where smaller, widely spaced trees are interspersed into the herbaceous marsh. Juncus roemerianus then extends landward to a high marsh patchwork of succulent halophytes (including Salicornia bigellovi, Sesuvium sp., and Batis maritima), scattered dwarf mangrove, and salt pans, followed in turn by upland vegetation that includes Pinus sp. and Serenoa repens.
    
     Field design and sample collection.
    
     We established three study sites spaced at approximately 5 km intervals along the western coastline of the central Florida peninsula. The sites consisted of the Salt Springs (28.3298°, -82.7274°), Energy Marine Center (28.2903°, -82.7278°), and Green Key (28.2530°, -82.7496°) sites on the Gulf of Mexico coastline in Pasco County, Florida, USA. At each site, we established three plot pairs, each consisting of one saltmarsh plot and one mangrove plot. Plots were 50 m^2 in size. Plot pairs within a site were separated by 230-1070 m, and the mangrove and saltmarsh plots composing a pair were 70-170 m apart. All plot pairs consisted of directly adjacent patches of mangrove forest and J. roemerianus saltmarsh, with the mangrove forests exhibiting a closed canopy and a tree architecture (height 4-6 m, crown width 1.5-3 m). Mangrove plots were located at approximately the midpoint between the seaward edge (water-mangrove interface) and landward edge (mangrove-marsh interface) of the mangrove zone. Saltmarsh plots were located 20-25 m away from any mangrove trees and into the J. roemerianus zone (i.e., landward from the mangrove-marsh interface). Plot pairs were coarsely similar in geomorphic setting, as all were located on the Gulf of Mexico coastline, rather than within major sheltering formations like Tampa Bay, and all plot pairs fit the tide-dominated domain of the Woodroffe classification (Woodroffe, 2002, "Coasts: Form, Process and Evolution", Cambridge University Press), given their conspicuous semi-diurnal tides. There was nevertheless some geomorphic variation, as some plot pairs were directly open to the Gulf of Mexico while others sat behind keys and spits or along small tidal creeks. Our use of a plot-pair approach is intended to control for this geomorphic variation.
Plot center elevations (cm above mean sea level, NAVD 88) were estimated by overlaying the plot locations determined with a global positioning system (Garmin GPS 60, Olathe, KS, USA) on a LiDAR-derived bare-earth digital elevation model (Dewberry, Inc., 2019). The digital elevation model had a vertical accuracy of ± 10 cm (95 % CI) and a horizontal accuracy of ± 116 cm (95 % CI).
    
     Soil samples were collected via coring at low tide in June 2011. From each plot, we collected a composite soil sample consisting of three discrete 5.1 cm diameter soil cores taken at equidistant points to 7.6 cm depth. Cores were taken by tapping a sleeve into the soil until its top was flush with the soil surface, sliding a hand under the core, and lifting it up. Cores were then capped and transferred on ice to our laboratory at the University of South Florida (Tampa, Florida, USA), where they were combined in plastic zipper bags, and homogenized by hand into plot-level composite samples on the day they were collected. A damp soil subsample was immediately taken from each composite sample to initiate 1 y incubations for determination of active C and N (see below). The remainder of each composite sample was then placed in a drying oven (60 °C) for 1 week with frequent mixing of the soil to prevent aggregation and liberate water. Organic wetland soils are sometimes dried at 70 °C
    
  15. Plotly Dashboard Healthcare

    • kaggle.com
    zip
    Updated Jan 4, 2022
    Cite
    A SURESH (2022). Plotly Dashboard Healthcare [Dataset]. https://www.kaggle.com/datasets/sureshmecad/plotly-dashboard-healthcare
    Explore at:
    Available download formats: zip (1741234 bytes)
    Dataset updated
    Jan 4, 2022
    Authors
    A SURESH
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Data Visualization

    Content

    a. Scatter plot

      i. The webapp should allow the user to select genes from datasets and plot 2D scatter plots between 2 variables (expression/copy_number/chronos) for
        any pair of genes.
    
      ii. The user should be able to filter and color data points using metadata information available in the file “metadata.csv”.
    
      iii. The visualization could be interactive - It would be great if the user can hover over the data-points on the plot and get the relevant information (hint - 
        visit https://plotly.com/r/, https://plotly.com/python)
    
      iv. Here is a quick reference for you. The scatter plot is between chronos score for TTBK2 gene and expression for MORC2 gene with coloring defined by
        Gender/Sex column from the metadata file.
    

    b. Boxplot/violin plot

      i. User should be able to select a gene and a variable (expression / chronos / copy_number) and generate a boxplot to display its distribution across 
       multiple categories as defined by user selected variable (a column from the metadata file)
    
     ii. Here is an example for your reference where violin plot for CHRONOS score for gene CCL22 is plotted and grouped by ‘Lineage’
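
The data preparation behind both plots can be sketched in plain Python. This is a minimal sketch, not the dataset author's code: the in-memory tables, the sample IDs (`S1`, `S2`, `S3`), all numeric values, and the helper names `scatter_points` and `group_values` are hypothetical. Only the variable names (expression, chronos), the gene names, and the metadata columns (Sex, Lineage) come from the description above.

```python
from collections import defaultdict

# Hypothetical toy tables: variable -> gene -> sample_id -> value.
DATA = {
    "chronos": {"TTBK2": {"S1": -0.10, "S2": 0.05, "S3": -0.30},
                "CCL22": {"S1": 0.02, "S2": -0.01, "S3": 0.04}},
    "expression": {"MORC2": {"S1": 4.2, "S2": 5.1, "S3": 3.9}},
}
# Hypothetical stand-in for rows of metadata.csv, keyed by sample ID.
METADATA = {
    "S1": {"Sex": "Female", "Lineage": "Lung"},
    "S2": {"Sex": "Male", "Lineage": "Skin"},
    "S3": {"Sex": "Female", "Lineage": "Lung"},
}


def scatter_points(x_var, x_gene, y_var, y_gene, color_by):
    """Join two gene-level variables on sample ID and attach a color group."""
    xs, ys = DATA[x_var][x_gene], DATA[y_var][y_gene]
    points = []
    for sample in sorted(xs.keys() & ys.keys()):  # samples present in both
        meta = METADATA.get(sample, {})
        points.append({
            "sample": sample,
            "x": xs[sample],
            "y": ys[sample],
            "color": meta.get(color_by, "unknown"),
            "hover": f"{sample}: {x_gene}={xs[sample]}, {y_gene}={ys[sample]}",
        })
    return points


def group_values(var, gene, group_by):
    """Collect one gene's values per metadata category (box/violin input)."""
    groups = defaultdict(list)
    for sample, value in sorted(DATA[var][gene].items()):
        groups[METADATA.get(sample, {}).get(group_by, "unknown")].append(value)
    return dict(groups)


points = scatter_points("chronos", "TTBK2", "expression", "MORC2", "Sex")
by_lineage = group_values("chronos", "CCL22", "Lineage")
```

Records shaped like `points` carry what a plotting library such as Plotly would need for colored, hoverable markers, and `by_lineage` holds one list of values per category for a box or violin plot.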
    


  16. Data from: Florida mangrove saltmarsh reference surface soils

    • search.dataone.org
    Updated Jun 24, 2021
    Cite
    David Bruce Lewis (2021). Florida mangrove saltmarsh reference surface soils [Dataset]. https://search.dataone.org/view/https%3A%2F%2Fpasta.lternet.edu%2Fpackage%2Fmetadata%2Feml%2Fedi%2F860%2F1
    Explore at:
    Dataset updated
    Jun 24, 2021
    Dataset provided by
    Environmental Data Initiative
    Authors
    David Bruce Lewis
    Time period covered
    Jan 1, 2011
    Variables measured
    Site, SOM_ar, SOM_gv, VegZone, Fines_ar, Fines_gv, Plotpair, TotalN_a, Elevation, TotalC_ar, and 20 more
    Description

    Site description. This data package consists of data obtained from sampling surface soil (the 0-7.6 cm depth profile) in black mangrove (Avicennia germinans) dominated forest and black needlerush (Juncus roemerianus) saltmarsh along the Gulf of Mexico coastline in peninsular west-central Florida, USA. This location has a subtropical climate with mean daily temperatures ranging from 15.4 °C in January to 27.8 °C in August, and annual precipitation of 1336 mm. Precipitation falls as rain primarily between June and September. Tides are semi-diurnal, with 0.57 m median amplitudes during the year preceding sampling (U.S. NOAA National Ocean Service, Clearwater Beach, Florida, station 8726724). Sea-level rise is 4.0 ± 0.6 mm per year (1973-2020 trend, mean ± 95 % confidence interval, NOAA NOS Clearwater Beach station). The A. germinans mangrove zone is either adjacent to water or fringed on the seaward side by a narrow band of red mangrove (Rhizophora mangle). A near-monoculture of J. roemerianus is often adjacent to and immediately landward of the A. germinans zone. The transition from the mangrove to the J. roemerianus zone is variable in our study area. An abrupt edge between closed-canopy mangrove and J. roemerianus monoculture may extend for up to several hundred meters in some locations, while other stretches of ecotone present a gradual transition where smaller, widely spaced trees are interspersed into the herbaceous marsh. Juncus roemerianus then extends landward to a high marsh patchwork of succulent halophytes (including Salicornia bigelovii, Sesuvium sp., and Batis maritima), scattered dwarf mangrove, and salt pans, followed in turn by upland vegetation that includes Pinus sp. and Serenoa repens.

    Field design and sample collection. We established three study sites spaced at approximately 5 km intervals along the western coastline of the central Florida peninsula. The sites consisted of the Salt Springs (28.3298°, -82.7274°), Energy Marine Center (28.2903°, -82.7278°), and Green Key (28.2530°, -82.7496°) sites on the Gulf of Mexico coastline in Pasco County, Florida, USA. At each site, we established three plot pairs, each consisting of one saltmarsh plot and one mangrove plot. Plots were 50 m^2 in size. Plot pairs within a site were separated by 230-1070 m, and the mangrove and saltmarsh plots composing a pair were 70-170 m apart. All plot pairs consisted of directly adjacent patches of mangrove forest and J. roemerianus saltmarsh, with the mangrove forests exhibiting a closed canopy and a tree architecture (height 4-6 m, crown width 1.5-3 m). Mangrove plots were located at approximately the midpoint between the seaward edge (water-mangrove interface) and landward edge (mangrove-marsh interface) of the mangrove zone. Saltmarsh plots were located 20-25 m away from any mangrove trees and into the J. roemerianus zone (i.e., landward from the mangrove-marsh interface). Plot pairs were coarsely similar in geomorphic setting, as all were located on the Gulf of Mexico coastline, rather than within major sheltering formations like Tampa Bay, and all plot pairs fit the tide-dominated domain of the Woodroffe classification (Woodroffe, 2002, "Coasts: Form, Process and Evolution", Cambridge University Press), given their conspicuous semi-diurnal tides. There was nevertheless some geomorphic variation, as some plot pairs were directly open to the Gulf of Mexico while others sat behind keys and spits or along small tidal creeks. Our use of a plot-pair approach is intended to control for this geomorphic variation.

    Plot center elevations (cm above mean sea level, NAVD 88) were estimated by overlaying the plot locations determined with a global positioning system (Garmin GPS 60, Olathe, KS, USA) on a LiDAR-derived bare-earth digital elevation model (Dewberry, Inc., 2019). The digital elevation model had a vertical accuracy of ± 10 cm (95 % CI) and a horizontal accuracy of ± 116 cm (95 % CI).

    Soil samples were collected via coring at low tide in June 2011. From each plot, we collected a composite soil sample consisting of three discrete 5.1 cm diameter soil cores taken at equidistant points to 7.6 cm depth. Cores were taken by tapping a sleeve into the soil until its top was flush with the soil surface, sliding a hand under the core, and lifting it up. Cores were then capped and transferred on ice to our laboratory at the University of South Florida (Tampa, Florida, USA), where they were combined in plastic zipper bags, and homogenized by hand into plot-level composite samples on the day they were collected. A damp soil subsample was immediately taken from each composite sample to initiate 1 y incubations for determination of active C and N (see below). The remainder of each composite sample was then placed in a drying oven (60 °C) for 1 week with frequent m...

    Visit https://dataone.org/datasets/https%3A%2F%2Fpasta.lternet.edu%2Fpackage%2Fmetadata%2Feml%2Fedi%2F860%2F1 for complete metadata about this dataset.
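
As a rough worked example of the sampling geometry described above, the soil volume in one core and in a three-core composite follows from the stated core diameter (5.1 cm) and depth (7.6 cm). The sketch assumes ideal cylinders with no compaction, and the function name `core_volume_cm3` is illustrative, not part of the data package.

```python
import math


def core_volume_cm3(diameter_cm, depth_cm):
    """Volume of an ideal cylindrical soil core in cubic centimetres."""
    radius = diameter_cm / 2
    return math.pi * radius ** 2 * depth_cm


single = core_volume_cm3(5.1, 7.6)   # one core
composite = 3 * single               # three cores are combined per plot
print(f"{single:.1f} cm^3 per core, {composite:.1f} cm^3 per composite")
```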

  17. Runtimes for distance various distance measures, for graphs of size n = 100...

    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Peter Wills; François G. Meyer (2023). Runtimes for distance various distance measures, for graphs of size n = 100 and n = 300. [Dataset]. http://doi.org/10.1371/journal.pone.0228728.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Peter Wills; François G. Meyer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Each distance is calculated N = 500 times. Each sample generates two Erdős-Rényi random graphs with parameter p = 0.15, and times the calculation of the distance between the two graphs. All distances are implemented in the NetComp library, which can be found on GitHub at [75].
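
The timing procedure described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the authors' benchmark: it does not use the NetComp API, and `edit_distance` (the count of edges present in only one graph) is a simple stand-in for the distances in the table. The helper names `erdos_renyi_edges` and `time_distance` are hypothetical.

```python
import random
import time


def erdos_renyi_edges(n, p, rng):
    """Sample an undirected Erdős-Rényi G(n, p) graph as a set of edges."""
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p}


def edit_distance(edges_a, edges_b):
    """Number of edges present in exactly one of the two graphs."""
    return len(edges_a ^ edges_b)


def time_distance(distance, n, p=0.15, repeats=500, seed=0):
    """Time `distance` on `repeats` freshly sampled graph pairs."""
    rng = random.Random(seed)
    timings = []
    for _ in range(repeats):
        g1 = erdos_renyi_edges(n, p, rng)
        g2 = erdos_renyi_edges(n, p, rng)
        start = time.perf_counter()      # time only the distance call
        distance(g1, g2)
        timings.append(time.perf_counter() - start)
    return timings


# A short run for illustration; the description above uses 500 repeats.
timings = time_distance(edit_distance, n=100, repeats=20)
mean_seconds = sum(timings) / len(timings)
```

Graph generation is kept outside the timed region so that only the distance computation is measured, which matches the description of timing "the calculation of the distance between the two graphs".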

  18. Data from: Contrasting ecological mechanisms mediate the impact of land...

    • search.dataone.org
    • datadryad.org
    Updated Dec 19, 2024
    Cite
    Florent Noulekoun; Sylvanus Mensah; HyungSub Kim; Juliano Sènanmi Hermann Houndonougbo; Michael Mensah; Woo Kyun Lee; Yowhan Son; Asia Khamzina (2024). Contrasting ecological mechanisms mediate the impact of land conversion on ecosystem multifunctionality [Dataset]. http://doi.org/10.5061/dryad.7wm37pw3n
    Explore at:
    Dataset updated
    Dec 19, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    Florent Noulekoun; Sylvanus Mensah; HyungSub Kim; Juliano Sènanmi Hermann Houndonougbo; Michael Mensah; Woo Kyun Lee; Yowhan Son; Asia Khamzina
    Description

    Land use/cover (LULC) changes have unequivocally affected biodiversity and ecosystem functioning, with enormous repercussions for human well-being. However, the ecological mechanisms underlying the impact of land conversion on ecosystem multifunctionality (EMF) remain insufficiently examined from the perspective of multiple biodiversity attributes in dryland regions with increasing deforestation rates. We investigated how the conversion of natural forests and savannas to agroforestry parklands alters the relationships between multiple biodiversity attributes (taxonomic, functional, phylogenetic, and structural) and EMF, while accounting for the effects of environmental factors in the dryland landscapes in Benin. We used forest inventory data from 145 plots spanning forests, savannas, and agroforestry parklands and assessed the implications of three land conversion scenarios. We quantified EMF using eight functions that are central to primary productivity and nutrient cycling...

    Data were collected across three dominant land use (LU) types in the Sudanian and Sudano–Guinean zones in Benin, West Africa. The LU types included forests, savannas and agroforestry parklands. Vegetation and soil data were collected from 145 circular plots of 0.1 ha each. Within each plot, the floristic inventory consisted of counting and measuring the diameter at breast height (DBH, cm) and height (H, m) of all living trees with DBH > 5 cm. Leaf samples were collected from 5–16 individual trees of abundant species across the sampling plots to determine their dry matter content (mg g-1) and nitrogen content (%). Soil samples were collected at a 0–20 cm depth from the center of four subplots of 0.01 ha each that were installed within the main plot. The soil samples were analyzed for organic carbon (%), total nitrogen (%), total phosphorus (%), and available phosphorus (%) content. Litter samples were collected from four smaller plots of 1 m radius that were established within the four...

    Contrasting ecological mechanisms mediate the impact of land conversion on ecosystem multifunctionality

    https://doi.org/10.5061/dryad.7wm37pw3n

    Description of the data and file structure

    The zipped file in Dryad contains the data necessary to reproduce the statistical analyses published in the manuscript "Contrasting ecological mechanisms mediate the impact of land conversion on ecosystem multifunctionality" in Functional Ecology by Noulèkoun et al.

    The zipped file includes 3 files, whose contents are described below.

    1. Main database "data_FE_Noulekoun_et_al": This is a .csv document that contains all the variables used in the statistical analysis, along with their values per plot. The names of the variables are abbreviated in this document and their description is provided in the second file entitled "Description_abbreviations_FE_Noulekoun" (see also Table below). The dataset does not contain any missing values.

    2. "Description_a...

  19. CompanyKG Dataset V2.0: A Large-Scale Heterogeneous Graph for Company...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 4, 2024
    Cite
    Lele Cao; Vilhelm von Ehrenheim; Mark Granroth-Wilding; Richard Anselmo Stahl; Drew McCornack; Armin Catovic; Dhiana Deva Cavacanti Rocha (2024). CompanyKG Dataset V2.0: A Large-Scale Heterogeneous Graph for Company Similarity Quantification [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7957401
    Explore at:
    Dataset updated
    Jun 4, 2024
    Dataset provided by
    EQT
    Authors
    Lele Cao; Vilhelm von Ehrenheim; Mark Granroth-Wilding; Richard Anselmo Stahl; Drew McCornack; Armin Catovic; Dhiana Deva Cavacanti Rocha
    Description

    CompanyKG is a heterogeneous graph consisting of 1,169,931 nodes and 50,815,503 undirected edges, with each node representing a real-world company and each edge signifying a relationship between the connected pair of companies.

    Edges: We model 15 different inter-company relations as undirected edges, each of which corresponds to a unique edge type. These edge types capture various forms of similarity between connected company pairs. Associated with each edge of a certain type, we calculate a real-numbered weight as an approximation of the similarity level of that type. It is important to note that the constructed edges do not represent an exhaustive list of all possible edges due to incomplete information. Consequently, this leads to a sparse and occasionally skewed distribution of edges for individual relation/edge types. Such characteristics pose additional challenges for downstream learning tasks. Please refer to our paper for a detailed definition of edge types and weight calculations.

    Nodes: The graph includes all companies connected by edges defined previously. Each node represents a company and is associated with a descriptive text, such as "Klarna is a fintech company that provides support for direct and post-purchase payments ...". To comply with privacy and confidentiality requirements, we encoded the text into numerical embeddings using four different pre-trained text embedding models: mSBERT (multilingual Sentence BERT), ADA2, SimCSE (fine-tuned on the raw company descriptions) and PAUSE.

    Evaluation Tasks. The primary goal of CompanyKG is to develop algorithms and models for quantifying the similarity between pairs of companies. In order to evaluate the effectiveness of these methods, we have carefully curated three evaluation tasks:

    Similarity Prediction (SP). To assess the accuracy of pairwise company similarity, we constructed the SP evaluation set comprising 3,219 pairs of companies that are labeled either as positive (similar, denoted by "1") or negative (dissimilar, denoted by "0"). Of these pairs, 1,522 are positive and 1,697 are negative.

    Competitor Retrieval (CR). Each sample contains one target company and one of its direct competitors. The set contains 76 distinct target companies, each of which has 5.3 annotated competitors on average. For a given target company A with N direct competitors in this CR evaluation set, we expect a competent method to retrieve all N competitors when searching for similar companies to A.

    Similarity Ranking (SR) is designed to assess the ability of any method to rank candidate companies (numbered 0 and 1) based on their similarity to a query company. Paid human annotators, with backgrounds in engineering, science, and investment, were tasked with determining which candidate company is more similar to the query company. It resulted in an evaluation set comprising 1,856 rigorously labeled ranking questions. We retained 20% (368 samples) of this set as a validation set for model development.

    Edge Prediction (EP) evaluates a model's ability to predict future or missing relationships between companies, providing forward-looking insights for investment professionals. The EP dataset, derived (and sampled) from new edges collected between April 6, 2023, and May 25, 2024, includes 40,000 samples, with edges not present in the pre-existing CompanyKG (a snapshot up until April 5, 2023).
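
One common way to score a retrieval task such as CR is recall@K: the share of a target company's annotated competitors that appear among the top K retrieved companies. The sketch below is an assumption about how such a score could be computed, not the metric definition from the CompanyKG paper. The company IDs and the function names `recall_at_k` and `mean_recall_at_k` are hypothetical.

```python
def recall_at_k(ranked_candidates, true_competitors, k):
    """Fraction of annotated competitors found in the top-k retrieved list."""
    if not true_competitors:
        raise ValueError("target company has no annotated competitors")
    top_k = set(ranked_candidates[:k])
    return len(top_k & set(true_competitors)) / len(true_competitors)


def mean_recall_at_k(rankings, annotations, k):
    """Average recall@k over all target companies in the evaluation set."""
    scores = [recall_at_k(rankings[target], annotations[target], k)
              for target in annotations]
    return sum(scores) / len(scores)


# Hypothetical toy data: company IDs are made up for illustration.
annotations = {"A": ["B", "C"], "D": ["E"]}
rankings = {"A": ["B", "X", "C", "Y"], "D": ["Z", "E", "W", "V"]}
score = mean_recall_at_k(rankings, annotations, k=2)
```

A method that retrieves all N competitors of every target within the top K, as the CR description expects of a competent method, would reach a mean recall of 1.0.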

    Background and Motivation

    In the investment industry, it is often essential to identify similar companies for a variety of purposes, such as market/competitor mapping and Mergers & Acquisitions (M&A). Identifying comparable companies is a critical task, as it can inform investment decisions, help identify potential synergies, and reveal areas for growth and improvement. The accurate quantification of inter-company similarity, also referred to as company similarity quantification, is the cornerstone to successfully executing such tasks. However, company similarity quantification is often a challenging and time-consuming process, given the vast amount of data available on each company, and the complex and diversified relationships among them.

    While there is no universally agreed definition of company similarity, researchers and practitioners in the PE industry have adopted various criteria to measure similarity, typically reflecting the companies' operations and relationships. These criteria can embody one or more dimensions such as industry sectors, employee profiles, keywords/tags, customer reviews, financial performance, co-appearance in news, and so on. Investment professionals usually begin with a limited number of companies of interest (a.k.a. seed companies) and require an algorithmic approach to expand their search to a larger list of companies for potential investment.

    In recent years, transformer-based Language Models (LMs) have become the preferred method for encoding textual company descriptions into vector-space embeddings. Then companies that are similar to the seed companies can be searched in the embedding space using distance metrics like cosine similarity. The rapid advancements in Large LMs (LLMs), such as GPT-3/4 and LLaMA, have significantly enhanced the performance of general-purpose conversational models. These models, such as ChatGPT, can be employed to answer questions related to similar company discovery and quantification in a Q&A format.
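
The embedding-space search mentioned above can be sketched as a cosine-similarity ranking. This is a minimal sketch with hypothetical company names and 3-dimensional vectors chosen for readability. It does not load CompanyKG embeddings, and the function names are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(seed, embeddings, top_n=2):
    """Rank all other companies by cosine similarity to the seed company."""
    scores = [(name, cosine_similarity(embeddings[seed], vec))
              for name, vec in embeddings.items() if name != seed]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical 3-dimensional embeddings; real models produce far larger vectors.
embeddings = {
    "seed_co": [1.0, 0.0, 0.0],
    "close_co": [0.9, 0.1, 0.0],
    "far_co": [0.0, 1.0, 0.0],
    "mid_co": [0.5, 0.5, 0.0],
}
ranked = most_similar("seed_co", embeddings)
```

Here `ranked` lists `close_co` before `mid_co`, because the vector for `close_co` points in nearly the same direction as the seed vector.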

    However, a graph is still the most natural choice for representing and learning diverse company relations due to its ability to model complex relationships between a large number of entities. By representing companies as nodes and their relationships as edges, we can form a Knowledge Graph (KG). Utilizing this KG allows us to efficiently capture and analyze the network structure of the business landscape. Moreover, KG-based approaches allow us to leverage powerful tools from network science, graph theory, and graph-based machine learning, such as Graph Neural Networks (GNNs), to extract insights and patterns to facilitate similar company analysis. While there are various company datasets (mostly commercial/proprietary and non-relational) and graph datasets available (mostly for single link/node/graph-level predictions), there is a scarcity of datasets and benchmarks that combine both to create a large-scale KG dataset expressing rich pairwise company relations.

    Source Code and Tutorial: https://github.com/llcresearch/CompanyKG2

    Paper: to be published

  20. Citation Network Graph

    • shibatadb.com
    Cite
    Yubetsu, Citation Network Graph [Dataset]. https://www.shibatadb.com/article/JRRwPFtc
    Explore at:
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Description

    Network of 41 papers and 83 citation links related to "Inferring Efficient Weights from Pairwise Comparison Matrices".

Cite
Harsh Kashyap (2023). Iris Flower Visualization using Python [Dataset]. https://www.kaggle.com/datasets/imharshkashyap/iris-flower-visualization-using-python

Iris Flower Visualization using Python

Data Science Project

Explore at:
Available download formats: zip (1307 bytes)
Dataset updated
Oct 24, 2023
Authors
Harsh Kashyap
Description

The "Iris Flower Visualization using Python" project is a data science project that focuses on exploring and visualizing the famous Iris flower dataset. The Iris dataset is a well-known dataset in the field of machine learning and data science, containing measurements of four features (sepal length, sepal width, petal length, and petal width) for three different species of Iris flowers (Setosa, Versicolor, and Virginica).

In this project, Python is used as the primary programming language along with popular libraries such as pandas, matplotlib, seaborn, and plotly. The project aims to provide a comprehensive visual analysis of the Iris dataset, allowing users to gain insights into the relationships between the different features and the distinct characteristics of each Iris species.

The project begins by loading the Iris dataset into a pandas DataFrame, followed by data preprocessing and cleaning if necessary. Various visualization techniques are then applied to showcase the dataset's characteristics and patterns. The project includes the following visualizations:

1. Scatter Plot: Visualizes the relationship between two features, such as sepal length and sepal width, using points on a 2D plane. Different species are represented by different colors or markers, allowing for easy differentiation.

2. Pair Plot: Displays pairwise relationships between all features in the dataset. This matrix of scatter plots provides a quick overview of the relationships and distributions of the features.

3. Andrews Curves: Represents each sample as a curve whose shape is determined by that sample's feature values, with curves colored by Iris species. This visualization technique allows for the identification of distinct patterns and separability between species.

4. Parallel Coordinates: Plots each feature on a separate vertical axis and connects the values for each data sample using lines. This visualization technique helps in understanding the relative importance and range of each feature for different species.

5. 3D Scatter Plot: Creates a 3D plot with three features represented on the x, y, and z axes. This visualization allows for a more comprehensive understanding of the relationships between multiple features simultaneously.
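
To make the Andrews Curves item above concrete, the sketch below evaluates the standard Andrews formula, f(t) = x1/√2 + x2·sin(t) + x3·cos(t) + x4·sin(2t) + ..., for one sample over t in [-π, π]. It is a minimal illustration and not the project's own code: the function name `andrews_curve` is hypothetical, and the four measurements are example values in the order listed in the description. In practice, `pandas.plotting.andrews_curves` draws these curves directly from a DataFrame.

```python
import math


def andrews_curve(features, t):
    """Evaluate the Andrews curve of one sample at angle t (radians)."""
    value = features[0] / math.sqrt(2)
    for i, x in enumerate(features[1:], start=1):
        harmonic = (i + 1) // 2              # 1, 1, 2, 2, 3, 3, ...
        trig = math.sin if i % 2 == 1 else math.cos
        value += x * trig(harmonic * t)
    return value


# Example sample: sepal length, sepal width, petal length, petal width (cm).
sample = [5.1, 3.5, 1.4, 0.2]
ts = [-math.pi + 2 * math.pi * k / 100 for k in range(101)]
curve = [andrews_curve(sample, t) for t in ts]
```

Plotting `curve` against `ts` for every sample, with one color per species, gives the Andrews Curves view. Samples with similar measurements produce curves that lie close together.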

Throughout the project, appropriate labels, titles, and color schemes are used to enhance the visualizations' interpretability. The interactive nature of some visualizations, such as the 3D Scatter Plot, allows users to rotate and zoom in on the plot for a more detailed examination.

The "Iris Flower Visualization using Python" project serves as an excellent example of how data visualization techniques can be applied to gain insights and understand the characteristics of a dataset. It provides a foundation for further analysis and exploration of the Iris dataset or similar datasets in the field of data science and machine learning.
