22 datasets found
  1. Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm

    • plos.figshare.com
    docx
    Updated May 31, 2023
    Cite
    Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic (2023). Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm [Dataset]. http://doi.org/10.1371/journal.pbio.1002128
    Available download formats: docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
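
    The point that different distributions can share the same bar can be illustrated with a minimal sketch. The two samples below are invented for illustration (they are not from the review), and the text "dot plot" only stands in for a univariate scatterplot; both samples have the same mean, so a bar graph would draw bars of equal height.

```python
from statistics import mean, stdev

# Hypothetical small samples (n = 6 each); values are invented for illustration.
symmetric = [4.0, 4.5, 5.0, 5.0, 5.5, 6.0]
with_outlier = [3.5, 3.6, 3.7, 3.8, 3.9, 11.5]


def bar_graph_summary(values):
    """Return the (mean, sample SD) pair that a bar with an error bar would show."""
    return mean(values), stdev(values)


def dot_plot(values, lo=3.0, hi=12.0, width=36):
    """Render a one-line text scatterplot so individual observations stay visible."""
    cells = ["."] * (width + 1)
    for v in values:
        cells[round((v - lo) / (hi - lo) * width)] = "o"
    return "".join(cells)


for name, sample in [("symmetric", symmetric), ("with_outlier", with_outlier)]:
    m, s = bar_graph_summary(sample)
    print(f"{name:>12}: mean={m:.2f} sd={s:.2f}  {dot_plot(sample)}")
```

    The bar heights agree, while the dot plots make the single extreme value in the second sample obvious.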

  2. Data from: The q–q Boxplot

    • tandf.figshare.com
    txt
    Updated Jun 4, 2023
    Cite
    Jordan Rodu; Karen Kafadar (2023). The q–q Boxplot [Dataset]. http://doi.org/10.6084/m9.figshare.14749330.v2
    Available download formats: txt
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Jordan Rodu; Karen Kafadar
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Boxplots have become an extremely popular display of distribution summaries for collections of data, especially when we need to visualize summaries for several collections simultaneously. The whiskers in the boxplot show only the extent of the tails for most of the data (with outside values denoted separately); more detailed information about the shape of the tails, such as skewness and “weight” relative to a standard reference distribution, is much better displayed via quantile–quantile (q-q) plots. We incorporate the q-q plot’s tail information into the traditional boxplot by replacing the boxplot’s whiskers with the tails from a q-q plot, and display these tails with confidence bands for the tails that would be expected from the tails of the reference distribution. We describe the construction of the “q-q boxplot” and demonstrate its advantages over earlier proposed boxplot modifications on data from economics and neuroscience, which illustrate the q-q boxplots’ effectiveness in showing important tail behavior especially for large datasets. The package qqboxplot (an extension to the ggplot2 package) is available for the R programming language. Supplementary files for this article are available online.
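
    The tail information that a q-q plot contributes can be sketched as follows. This is not the qqboxplot package and omits the confidence bands; the normal reference distribution, the (i + 0.5)/n plotting positions, the quartile cut-offs, and the sample values are all assumptions made for illustration.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Pair each ordered observation with the reference quantile at its plotting position."""
    xs = sorted(sample)
    n = len(xs)
    ref = NormalDist(mean(xs), stdev(xs))  # normal reference fitted to the sample (assumption)
    return [(ref.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]


def tails(points, lower=0.25, upper=0.75):
    """Keep only the q-q points outside the central box (below/above the quartile positions)."""
    n = len(points)
    lo_tail = [p for i, p in enumerate(points) if (i + 0.5) / n < lower]
    hi_tail = [p for i, p in enumerate(points) if (i + 0.5) / n > upper]
    return lo_tail, hi_tail


data = [2.1, 2.4, 2.5, 2.7, 2.9, 3.0, 3.2, 3.3, 3.9, 7.8]  # invented, right-skewed
lo_tail, hi_tail = tails(qq_points(data))
for expected, observed in hi_tail:
    print(f"upper tail: expected {expected:.2f}, observed {observed:.2f}")
```

    An observed value far above its expected quantile in the upper tail is the kind of heavy-tail signal that plain whiskers would hide.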

  3. Plotly Dashboard Healthcare

    • kaggle.com
    zip
    Updated Jan 4, 2022
    Cite
    A SURESH (2022). Plotly Dashboard Healthcare [Dataset]. https://www.kaggle.com/datasets/sureshmecad/plotly-dashboard-healthcare
    Available download formats: zip (1741234 bytes)
    Dataset updated
    Jan 4, 2022
    Authors
    A SURESH
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Data Visualization

    Content

    a. Scatter plot

      i. The webapp should allow the user to select genes from datasets and plot 2D scatter plots between 2 variables (expression/copy_number/chronos) for any pair of genes.

      ii. The user should be able to filter and color data points using metadata information available in the file “metadata.csv”.

      iii. The visualization could be interactive: it would be great if the user could hover over the data points on the plot and get the relevant information (hint: visit https://plotly.com/r/, https://plotly.com/python).

      iv. Here is a quick reference for you. The scatter plot is between the chronos score for the TTBK2 gene and expression for the MORC2 gene, with coloring defined by the Gender/Sex column from the metadata file.
    

    b. Boxplot/violin plot

      i. The user should be able to select a gene and a variable (expression / chronos / copy_number) and generate a boxplot to display its distribution across multiple categories as defined by a user-selected variable (a column from the metadata file).

      ii. Here is an example for your reference, where a violin plot of the CHRONOS score for gene CCL22 is plotted and grouped by ‘Lineage’.
    

    Acknowledgements

    We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  4. Appendix D. A figure showing the distribution of net relatedness index and...

    • datasetcatalog.nlm.nih.gov
    • wiley.figshare.com
    Updated Aug 5, 2016
    Cite
    Enquist, Brian J.; Thompson, Jill; Swenson, Nathan G.; Zimmerman, Jess K. (2016). Appendix D. A figure showing the distribution of net relatedness index and nearest taxon index in 25-m2 quadrats across organismal size scales in each forest dynamics plot. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001519783
    Dataset updated
    Aug 5, 2016
    Authors
    Enquist, Brian J.; Thompson, Jill; Swenson, Nathan G.; Zimmerman, Jess K.
    Description

    A figure showing the distribution of net relatedness index and nearest taxon index in 25-m2 quadrats across organismal size scales in each forest dynamics plot.

  5. Bayesian Estimation of Conditional Independence Graphs Improves Functional...

    • figshare.com
    pdf
    Updated Jun 3, 2023
    Cite
    Max Hinne; Ronald J. Janssen; Tom Heskes; Marcel A.J. van Gerven (2023). Bayesian Estimation of Conditional Independence Graphs Improves Functional Connectivity Estimates [Dataset]. http://doi.org/10.1371/journal.pcbi.1004534
    Available download formats: pdf
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Max Hinne; Ronald J. Janssen; Tom Heskes; Marcel A.J. van Gerven
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Functional connectivity concerns the correlated activity between neuronal populations in spatially segregated regions of the brain, which may be studied using functional magnetic resonance imaging (fMRI). This coupled activity is conveniently expressed using covariance, but this measure fails to distinguish between direct and indirect effects. A popular alternative that addresses this issue is partial correlation, which regresses out the signal of potentially confounding variables, resulting in a measure that reveals only direct connections. Importantly, provided the data are normally distributed, if two variables are conditionally independent given all other variables, their respective partial correlation is zero. In this paper, we propose a probabilistic generative model that allows us to estimate functional connectivity in terms of both partial correlations and a graph representing conditional independencies. Simulation results show that this methodology is able to outperform the graphical LASSO, which is the de facto standard for estimating partial correlations. Furthermore, we apply the model to estimate functional connectivity for twenty subjects using resting-state fMRI data. Results show that our model provides a richer representation of functional connectivity as compared to considering partial correlations alone. Finally, we demonstrate how our approach can be extended in several ways, for instance to achieve data fusion by informing the conditional independence graph with data from probabilistic tractography. As our Bayesian formulation of functional connectivity provides access to the posterior distribution instead of only to point estimates, we are able to quantify the uncertainty associated with our results. This reveals that while we are able to infer a clear backbone of connectivity in our empirical results, the data are not accurately described by simply looking at the mode of the distribution over connectivity. 
    The implication of this is that deterministic alternatives may misjudge connectivity results by drawing conclusions from noisy and limited data.

  6. Contour plot of (A) native cover and (B) species richness shows differences...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Feb 6, 2015
    Cite
    Zarnetske, Phoebe L.; Seabloom, Eric W.; Hacker, Sally D.; David, Aaron S.; Biel, Reuben G.; Ruggiero, Peter (2015). Contour plot of (A) native cover and (B) species richness shows differences in distribution across across chronosequence ages and dune gradient in dunes of different Ammophila dominance. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001923591
    Dataset updated
    Feb 6, 2015
    Authors
    Zarnetske, Phoebe L.; Seabloom, Eric W.; Hacker, Sally D.; David, Aaron S.; Biel, Reuben G.; Ruggiero, Peter
    Description

    Contours show areas of increasing cover or richness across both gradients. Quadrats from sites dominated by A. arenaria are shown in red, those from A. breviligulata sites shown in blue. See main text and Fig. 2 caption for additional details.

  7. Jamboree Education - Linear Regression

    • kaggle.com
    zip
    Updated Oct 13, 2024
    Cite
    Babuli Naik (2024). Jamboree Education - Linear Regression [Dataset]. https://www.kaggle.com/datasets/babulinaik/jamboree-education-linear-regression
    Available download formats: zip (1072508 bytes)
    Dataset updated
    Oct 13, 2024
    Authors
    Babuli Naik
    Description

    Evaluation Criteria (100 Points):

    Define Problem Statement and perform Exploratory Data Analysis (10 Points)
      Definition of problem (as per given problem statement with additional views)
      Observations on shape of data, data types of all the attributes, conversion of categorical attributes to 'category' (if required), missing value detection, statistical summary
      Univariate Analysis (distribution plots of all the continuous variable(s), barplots/countplots of all the categorical variables)
      Bivariate Analysis (relationships between important variables such as workday and count, season and count, weather and count)
      Illustrate the insights based on EDA
      Comments on range of attributes, outliers of various attributes
      Comments on the distribution of the variables and relationship between them
      Comments for each univariate and bivariate plots

    Data Preprocessing (10 Points)
      Duplicate value check
      Missing value treatment
      Outlier treatment
      Feature engineering
      Data preparation for modeling

    Model building (10 Points)
      Build the Linear Regression model and comment on the model statistics
      Display model coefficients with column names
      Try out Ridge and Lasso regression

    Testing the assumptions of the linear regression model (50 Points)
      Multicollinearity check by VIF score (variables are dropped one-by-one till none has VIF>5) (10 Points)
      The mean of residuals is nearly zero (10 Points)
      Linearity of variables (no pattern in the residual plot) (10 Points)
      Test for Homoscedasticity (10 Points)
      Normality of residuals (almost bell-shaped curve in residuals distribution, points in QQ plot are almost all on the line) (10 Points)

    Model performance evaluation (10 Points)
      Metrics checked - MAE, RMSE, R2, Adj R2
      Train and test performances are checked
      Comments on the performance measures and if there is any need to improve the model or not

    Actionable Insights & Recommendations (10 Points)
      Comments on significance of predictor variables
      Comments on additional data sources for model improvement, model implementation in real world, potential business benefits from improving the model (These are key to differentiating a good and an excellent solution)
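
    The metrics named in the criteria (MAE, RMSE, R2, Adj R2, the mean of residuals, and the VIF threshold) can be computed as in the sketch below. It uses only the standard library; the function names and the toy numbers are hypothetical and are not part of the assignment.

```python
from math import sqrt


def regression_metrics(y_true, y_pred, n_features):
    """Compute MAE, RMSE, R2, adjusted R2 and the mean residual for one set of predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    y_bar = sum(y_true) / n
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": sqrt(ss_res / n),
        "R2": r2,
        # Adjusted R2 penalises R2 for the number of predictors used.
        "Adj R2": 1 - (1 - r2) * (n - 1) / (n - n_features - 1),
        "mean residual": sum(residuals) / n,
    }


def vif(r2_j):
    """VIF of predictor j, given the R2 from regressing it on the other predictors."""
    return 1 / (1 - r2_j)


# Toy values, invented for illustration.
metrics = regression_metrics([1, 2, 3, 4, 5], [1.5, 2, 3, 4, 4.5], n_features=1)
print(metrics)
print("drop predictor?", vif(0.85) > 5)
```

    Running the same function on the train and test predictions gives the train/test comparison that the criteria ask for.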

    Submission Process:

    Type your insights and recommendations in the text editor. Convert your jupyter notebook into PDF (Save as PDF using Chrome browser’s Print command), upload it on our platform Optionally, you may add images/graphs in the text editor by taking screenshots or saving matplotlib graphs using plt.savefig(...). After submitting, you will not be allowed to edit your submission.

  8. Barro colorado plot data

    • search.dataone.org
    Updated Nov 14, 2013
    Cite
    NCEAS 9762: Condit: Geographic distribution of neotropical tree species: Pattern and process; National Center for Ecological Analysis and Synthesis; Richard condit (2013). Barro colorado plot data [Dataset]. https://search.dataone.org/view/nceas.995.1
    Dataset updated
    Nov 14, 2013
    Dataset provided by
    Knowledge Network for Biocomplexity
    Authors
    NCEAS 9762: Condit: Geographic distribution of neotropical tree species: Pattern and process; National Center for Ecological Analysis and Synthesis; Richard condit
    Time period covered
    Jan 1, 1980 - Jan 1, 2010
    Description

    The 50-hectare permanent tree plot was established in 1980 in the tropical moist forest of Barro Colorado Island (BCI) in Gatun Lake in central Panama. Censuses have been carried out in 1981-1983, 1985, 1990, 1995, 2000, and 2005. In each census, all free-standing woody stems at least 10 mm diameter at breast height were identified, tagged, and mapped. Over 350,000 individual trees have been censused over 25 years.

    http://ctfs.si.edu/datasets/bci/ is the location for more information on this data.

    Species abundance for each of the censuses of all free-standing woody plants in 50 ha of forest.

    http://ctfs.si.edu/datasets/bci/abundance/ is the location for more information on this data.

    Soil Maps of Barro Colorado Island 50 ha Plot - On this webpage you can find copies of soil maps showing the estimated concentrations (mg/Kg) of base cations, extractable P, and ammonium and nitrate for the BCI 50 ha Forest Dynamics Plot. Kriged estimates for the 20 x 20 m quadrats are available in this excel file. A copy of the protocol used for the sampling of the plot, collection and chemical analysis of soil samples is available here.

    http://ctfs.si.edu/datasets/bci/soilmaps/BCIsoil.html is the location for more information on this data.

  9. Statistics on abundance, basal area, and frequency of the species in the...

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Rebecca Ostertag; Faith Inman-Narahari; Susan Cordell; Christian P. Giardina; Lawren Sack (2023). Statistics on abundance, basal area, and frequency of the species in the Laupāhoehoe (montane wet forest) plot, with data displayed on an absolute and a relative basis. [Dataset]. http://doi.org/10.1371/journal.pone.0103268.t003
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Rebecca Ostertag; Faith Inman-Narahari; Susan Cordell; Christian P. Giardina; Lawren Sack
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Laupahoehoe
    Description

    Statistics on abundance, basal area, and frequency of the species in the Laupāhoehoe (montane wet forest) plot, with data displayed on an absolute and a relative basis.

  10. Flock data.

    • plos.figshare.com
    tiff
    Updated May 31, 2023
    Cite
    Takayuki Niizato; Yukio-Pegio Gunji (2023). Flock data. [Dataset]. http://doi.org/10.1371/journal.pone.0035615.g003
    Available download formats: tiff
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Takayuki Niizato; Yukio-Pegio Gunji
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We set the number of individuals to 100. The graphs in Figure 3B and 3C show an average of 100 simulations, where each simulation consists of 4000 steps. The data were taken after waiting for each flock to stabilize its motion. (A) The graph shows an example of a time series for changing directions. We take the absolute value of the rate of change and plot its evolution over time. The vertical line corresponds to a rate change (radian) compared with the previous 10 steps. In this graph, the flock changes its direction by up to 0.25 radians. (B) A graph showing the average distance between an individual and its neighbor with the topological rank. There is a roughly proportional relation between the average distance between individuals and the topological rank. (C) A graph showing the probability distribution for the nearest neighbor’s distance. The probability distribution shows an asymmetric relation around its center, 80 L. This asymmetry comes from the difference property between the repulsion of the metric interaction. The graph shows that a nearest neighbor is unlikely to be found within 80 L because of the repulsion zone.

  11. Multidimensional mechanics: Performance mapping of natural biological...

    • plos.figshare.com
    docx
    Updated May 30, 2023
    Cite
    Michael M. Porter; Pooya Niksiar (2023). Multidimensional mechanics: Performance mapping of natural biological systems using permutated radar charts [Dataset]. http://doi.org/10.1371/journal.pone.0204309
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Michael M. Porter; Pooya Niksiar
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparing the functional performance of biological systems often requires comparing multiple mechanical properties. Such analyses, however, are commonly presented using orthogonal plots that compare N ≤ 3 properties. Here, we develop a multidimensional visualization strategy using permutated radar charts (radial, multi-axis plots) to compare the relative performance distributions of mechanical systems on a single graphic across N ≥ 3 properties. Leveraging the fact that radar charts plot data in the form of closed polygonal profiles, we use shape descriptors for quantitative comparisons. We identify mechanical property-function correlations distinctive to rigid, flexible, and damage-tolerant biological materials in the form of structural ties, beams, shells, and foams. We also show that the microstructures of dentin, bone, tendon, skin, and cartilage dictate their tensile performance, exhibiting a trade-off between stiffness and extensibility. Lastly, we compare the feeding versus singing performance of Darwin’s finches to demonstrate the potential of radar charts for multidimensional comparisons beyond mechanics of materials.
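
    Why the ordering of axes matters for a closed radar-chart polygon can be shown with a small sketch. Polygon area is used here as one plausible shape descriptor; that choice, the function names, and the scores are assumptions for illustration and are not taken from the paper.

```python
from itertools import permutations
from math import pi, sin


def radar_area(values):
    """Area of the polygon traced on a radar chart with equally spaced axes (needs >= 3 axes)."""
    n = len(values)
    wedge = sin(2 * pi / n) / 2  # each pair of adjacent axes spans a triangle
    return wedge * sum(values[i] * values[(i + 1) % n] for i in range(n))


def area_range(values):
    """Smallest and largest polygon area over all orderings of the axes."""
    first, rest = values[0], tuple(values[1:])
    # Fixing the first axis removes orderings that are only rotations of each other.
    areas = [radar_area((first,) + p) for p in permutations(rest)]
    return min(areas), max(areas)


scores = (1.0, 0.2, 0.9, 0.3)  # invented normalised property scores
smallest, largest = area_range(scores)
print(f"area as listed: {radar_area(scores):.3f}")
print(f"area range over axis orderings: {smallest:.3f} to {largest:.3f}")
```

    The same four scores enclose different areas depending on which axes sit next to each other, which is why a single arbitrary ordering can mislead.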

  12. fires_clean

    • kaggle.com
    Updated Jan 7, 2021
    Cite
    Muthu Chidambaram (2021). fires_clean [Dataset]. https://www.kaggle.com/chidmuthu/fires-clean/notebooks
    Explore at:
    Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 7, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Muthu Chidambaram
    Description

    Competition (2021 Rice Science Olympiad Data Science)

    Welcome to the Data Science event for the 2021 Rice SO Invitational. Answer the following questions using ONLY the dataset provided. Each question must be supported by at least one graph that illustrates your position. Each question is worth 10 points. Points will be awarded for correctness, graph readability, and code quality so make sure the notebook you submit is precise.

    1. How has the number of fires per year changed over time? Offer a justification (doesn't have to be right, just reasonable)

    2. What is the seasonal distribution of wildfires? Justify.

    3. How many fires lasted for more than 100 days? (Plot distribution of fire length)

    4. Which month had the most lightning induced fires? Justify.
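
    Questions 1 and 3 could be approached along the lines of the sketch below. It uses the column names listed under Content (FIRE_YEAR, DISCOVERY_DOY, CONT_DOY), but the rows are invented, the helper names are hypothetical, and the duration calculation assumes that discovery and containment fall in the same calendar year.

```python
from collections import Counter

# Invented rows for illustration; a real analysis would load the provided dataset.
fires = [
    {"FIRE_YEAR": 1992, "DISCOVERY_DOY": 150, "CONT_DOY": 152},
    {"FIRE_YEAR": 1992, "DISCOVERY_DOY": 200, "CONT_DOY": 310},
    {"FIRE_YEAR": 1993, "DISCOVERY_DOY": 90, "CONT_DOY": None},
    {"FIRE_YEAR": 1993, "DISCOVERY_DOY": 220, "CONT_DOY": 221},
    {"FIRE_YEAR": 1993, "DISCOVERY_DOY": 30, "CONT_DOY": 30},
]


def fires_per_year(rows):
    """Count fires by discovery year, sorted by year."""
    return dict(sorted(Counter(r["FIRE_YEAR"] for r in rows).items()))


def duration_days(row):
    """Days from discovery to containment, or None when containment is missing."""
    if row["CONT_DOY"] is None:
        return None
    return row["CONT_DOY"] - row["DISCOVERY_DOY"]


durations = [duration_days(r) for r in fires]
long_fires = [d for d in durations if d is not None and d > 100]

print(fires_per_year(fires))
print("fires lasting more than 100 days:", len(long_fires))
```

    Rows without a containment day are skipped rather than counted as zero-length fires; whether that is appropriate is a judgment the notebook should state.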

    Submission

    1. Start your work by signing in to Kaggle (register with Kaggle or use an existing account) and clicking "New Notebook".

    2. Answer the questions in order, using plots and markdown where appropriate to explain your logic. Make sure to also have a markdown cell that clearly answers/justifies each question. A notebook with four plots lacking explanation will not receive full marks.

    3. When you are satisfied with your work, click "Save Version" in the top right corner and NAME IT WITH YOUR TEAM NUMBER AND SUBMISSION TIME (for example, Team00-4:20 PM). Make sure you select the "Save & Run All (Commit)" option to save your cell outputs (plots and tables).

    4. Share the notebook with the Kaggle username "chidmuthu". The display name is Muthu Chidambaram and the profile picture is a goose.

    4.5 PLEASE MAKE SURE YOUR TEAM NUMBER IS SOMEWHERE IN YOUR SUBMISSION OR I CAN NOT GRADE YOUR WORK!!!!

    5. All done! Your work has been submitted and will be graded along with the rest of your exam.

    Context

    This is a dataset that has been adapted from the Kaggle project https://www.kaggle.com/rtatman/188-million-us-wildfires for use in the 2021 Rice Science Olympiad Invitational Data Science event.

    This data publication contains a spatial database of wildfires that occurred in the United States from 1992 to 2015. It is the third update of a publication originally generated to support the national Fire Program Analysis (FPA) system. The wildfire records were acquired from the reporting systems of federal, state, and local fire organizations. The following core data elements were required for records to be included in this data publication: discovery date, final fire size, and a point location at least as precise as Public Land Survey System (PLSS) section (1-square mile grid). The data were transformed to conform, when possible, to the data standards of the National Wildfire Coordinating Group (NWCG). Basic error-checking was performed and redundant records were identified and removed, to the degree possible. The resulting product, referred to as the Fire Program Analysis fire-occurrence database (FPA FOD), includes 1.88 million geo-referenced wildfire records, representing a total of 140 million acres burned during the 24-year period.

    Content

    FOD_ID = Global unique identifier.
    FIRE_YEAR = Calendar year in which the fire was discovered or confirmed to exist.
    DISCOVERY_DATE = Date on which the fire was discovered or confirmed to exist.
    DISCOVERY_DOY = Day of year on which the fire was discovered or confirmed to exist.
    DISCOVERY_TIME = Time of day that the fire was discovered or confirmed to exist.
    STAT_CAUSE_CODE = Code for the (statistical) cause of the fire.
    STAT_CAUSE_DESCR = Description of the (statistical) cause of the fire.
    CONT_DATE = Date on which the fire was declared contained or otherwise controlled (mm/dd/yyyy where mm=month, dd=day, and yyyy=year).
    CONT_DOY = Day of year on which the fire was declared contained or otherwise controlled.
    CONT_TIME = Time of day that the fire was declared contained or otherwise controlled (hhmm where hh=hour, mm=minutes).
    FIRE_SIZE = Estimate of acres within the final perimeter of the fire.
    FIRE_SIZE_CLASS = Code for fire size based on the number of acres within the final fire perimeter expenditures (A=greater than 0 but less than or equal to 0.25 acres, B=0.26-9.9 acres, C=10.0-99.9 acres, D=100-299 acres, E=300 to 999 acres, F=1000 to 4999 acres, and G=5000+ acres).
    LATITUDE = Latitude (NAD83) for point location of the fire (decimal degrees).
    LONGITUDE = Longitude (NAD83) for point location of the fire (decimal degrees).
    OWNER_CODE = Code for primary owner or entity responsible for managing the land at the point of origin of the fire at the time of the incident.
    OWNER_DESCR = Name of primary owner or entity responsible for managing the land at the point of origin of the fire at the time of the incident.
    STATE = Two-letter alphabetic code for the state in which the fire burned (or originated), based on the nominal designation in the fire report.
    COUNTY = County, or equivalent, in which the fire...
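
    The FIRE_SIZE_CLASS ranges can be expressed as a small lookup, sketched below. The listed ranges leave gaps between classes (for example between 9.9 and 10.0 acres); assigning such values to the lower class is an assumption of this sketch, and the function name is hypothetical.

```python
def fire_size_class(acres):
    """Map a FIRE_SIZE value in acres to its A-G size class code."""
    if acres <= 0:
        raise ValueError("FIRE_SIZE must be greater than 0 acres")
    if acres <= 0.25:
        return "A"
    # Upper bounds are exclusive, so 10.0 starts class C and 9.95 stays in B (assumption).
    for upper, label in [(10, "B"), (100, "C"), (300, "D"), (1000, "E"), (5000, "F")]:
        if acres < upper:
            return label
    return "G"


for size in (0.1, 0.26, 10.0, 150, 999, 4999, 5000):
    print(size, fire_size_class(size))
```

    Recomputing the class from FIRE_SIZE in this way gives a quick consistency check against the FIRE_SIZE_CLASS column.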

  13. Plot data for three tropical evergreen broadleaf shrublands in China

    • scidb.cn
    Updated Sep 14, 2025
    Cite
    Xiong Gaoming; Shen Guozhen; Xu Wenting; Xie Zongqiang; Li Yuelin; Xu Yaojian; Chen Fangqing; Li Jiaxiang; cjpe (2025). Plot data for three tropical evergreen broadleaf shrublands in China [Dataset]. http://doi.org/10.57760/sciencedb.26917
    Explore at:
    Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 14, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Xiong Gaoming; Shen Guozhen; Xu Wenting; Xie Zongqiang; Li Yuelin; Xu Yaojian; Chen Fangqing; Li Jiaxiang; cjpe
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    Tropical evergreen broadleaf shrublands in low mountain and hilly areas represent the most extensive shrubland type in China. We surveyed 325 plots between 2011 and 2019 across three dominant vegetation alliances to classify community types and quantify their floristic composition, structure, and distribution. Using classical Chinese vegetation classification and quantitative analysis, we identified three major shrubland types. (1) Baeckea frutescens shrublands occur from 18.4° to 25.9° N and 105.2° to 118.3° E, at altitudes of 0–1 340 m. In 101 plots, we recorded 170 vascular plant species in 52 families and 124 genera, with 79.5% of seed plant genera showing tropical affinities; evergreen broadleaf woody species accounted for 90% of total importance value. This alliance comprises 5 association groups and 7 associations. (2) Rhodomyrtus tomentosa shrublands span 18.2° to 26.2° N and 104.3° to 118.8° E, at 4–700 m altitude. In 205 plots, we recorded 373 vascular plant species across 79 families and 241 genera, with 70.2% tropical genera and 85% importance value for evergreen broadleaf woody species. This alliance includes 4 association groups and 24 associations. (3) Psidium guajava shrublands, occurring between 22.1° to 27.1° N and 101.7° to 113.8° E, at 100–900 m altitude, were recorded in 19 plots, comprising 83 vascular plant species across 38 families and 76 genera, with 71.8% tropical seed plant genera, and an 81% importance value for evergreen broadleaf woody plants. This alien-dominated alliance forms 4 association groups and 5 associations. Baeckea frutescens and Rhodomyrtus tomentosa shrublands share similar community structures and habitat preferences, representing native, natural secondary communities with overlapping ranges. In contrast, Psidium guajava shrublands, dominated by alien plants, pose a growing threat to native vegetation and require urgent monitoring. 
    Our results offer a comprehensive baseline for understanding the structure, function, and dynamics of tropical shrubland ecosystem in China.

  14. Additional file 1: Figure S1. of Whole-genome bisulfite sequencing of...

    • springernature.figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Christophe Legendre; Gerald Gooden; Kyle Johnson; Rae Martinez; Winnie Liang; Bodour Salhia (2023). Additional file 1: Figure S1. of Whole-genome bisulfite sequencing of cell-free DNA identifies signature associated with metastatic breast cancer [Dataset]. http://doi.org/10.6084/m9.figshare.c.3637367_D1.v1
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Christophe Legendre; Gerald Gooden; Kyle Johnson; Rae Martinez; Winnie Liang; Bodour Salhia
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of 120 clinically annotated plasma samples from the Komen Tissue Bank, representing 40 samples from Healthy (H) individuals, 40 from disease-free survivors (DFS), and 40 from patients with metastatic breast cancer (MBC). A) Pie chart shows distribution of involved sites of distant metastases in the MBC group. B) Vertical plot shows the number of years disease free in the DFS group. Two clusters are evident. C) Plot shows cfDNA concentrations from three independent extractions obtained after samples were pooled into three groups. D) Vertical plot showing distribution of age at diagnosis for DFS and MBC patients. Age of accrual is represented for H individuals. E) Bar graph depicting the number of samples by race, for H, DFS, and MBC. (ZIP 1052 kb)

  15. Global social media subscriptions comparison 2023

    • statista.com
    • de.statista.com
    Cite
    Stacy Jo Dixon, Global social media subscriptions comparison 2023 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    Social media companies are starting to offer users the option to subscribe to their platforms in exchange for monthly fees. Until recently, social media has been predominantly free to use, with tech companies relying on advertising as their main revenue generator. However, advertising revenues have been dropping following the COVID-induced boom. As of July 2023, Meta Verified is the most costly of the subscription services, setting users back almost 15 U.S. dollars per month on iOS or Android. Twitter Blue costs between eight and 11 U.S. dollars per month and ensures users will receive the blue check mark, and have the ability to edit tweets and have NFT profile pictures. Snapchat+, drawing in four million users as of the second quarter of 2023, boasts a Story re-watch function, custom app icons, and a Snapchat+ badge.

  16. Dataset for: Comments on Schoenberg et al. (2003)

    • wiley.figshare.com
    txt
    Updated Jun 2, 2023
    Cite
    Hamid Ghorbani (2023). Dataset for: Comments on Schoenberg et al. (2003) [Dataset]. http://doi.org/10.6084/m9.figshare.8980403.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Wiley (https://www.wiley.com/)
    Authors
    Hamid Ghorbani
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This article comments on: Schoenberg FP, et al., On the distribution of wildfire sizes. Environmetrics. 2003;14:e605. https://doi.org/10.1002/env.605. These comments mainly concern the numerical and visual goodness-of-fit criteria used to compare the performance of candidate distributions for wildfire sizes. First, the ML estimate of the half-normal distribution and its corresponding goodness-of-fit criteria are corrected. Then the given values of the Akaike Information Criterion (AIC) for all fitted models are modified. Furthermore, some comments are given on the inappropriateness of naming the proposed statistic the 'Cramér–von Mises (C-vM) statistic'. After presenting the C-vM statistic, its values and the corresponding p-values, which show the goodness of fit of the proposed distributions for describing the data, are calculated. Finally, asymptotic confidence bounds for the 'fitted comparison-line' in QQ-plots of the two best-fitting distributions are given. These asymptotic bounds closely resemble their counterparts in Schoenberg et al. (2003), named 'confidence bounds based on Monte Carlo simulation', in the position of the end points, while being much cheaper to compute.

  17. poncho.R

    • figshare.com
    txt
    Updated Jan 18, 2016
    Cite
    Cristian Dambros (2016). poncho.R [Dataset]. http://doi.org/10.6084/m9.figshare.753347.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jan 18, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Cristian Dambros
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Draw a histogram with all the species frequencies in a community. The species are arranged according to their distribution along environmental gradients, which is a clear way to show species turnover along gradients. The function can also be used for visualizing nested patterns in the community structure. See Leibold and Mikkelson (2002).

  18. Inferring Regional-Scale Species Diversity from Small-Plot Censuses

    • plos.figshare.com
    text/x-python
    Updated Jun 1, 2023
    Cite
    John Harte; Justin Kitzes (2023). Inferring Regional-Scale Species Diversity from Small-Plot Censuses [Dataset]. http://doi.org/10.1371/journal.pone.0117527
    Explore at:
    Available download formats: text/x-python
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    John Harte; Justin Kitzes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Estimation of the number of species at spatial scales too large to census directly is a longstanding ecological challenge. A recent comprehensive census of tropical arthropods and trees in Panama provides a unique opportunity to apply an inference procedure for up-scaling species richness and thereby make progress toward that goal. Confidence in the underlying theory is first established by showing that the method accurately predicts the species abundance distribution for trees and arthropods, and in particular accurately captures the rare tail of the observed distributions. The rare tail is emphasized because the shape of the species-area relationship is especially influenced by the numbers of rare species. The inference procedure is then applied to estimate the total number of arthropod and tree species at spatial scales ranging from a 6000 ha forest reserve to all of Panama, with input data only from censuses in 0.04 ha plots. The analysis suggests that at the scale of the reserve there are roughly twice as many arthropod species as previously estimated. For the entirety of Panama, inferred tree species richness agrees with an accepted empirical estimate, while inferred arthropod species richness is significantly below a previous published estimate that has been criticized as too high. An extension of the procedure to estimate species richness at continental scale is proposed.

  19. Choice path distribution by genre and perspective (N = 507).

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Victoria Lagrange; Benjamin Hiskes; Claire Woodward; Binyan Li; Fritz Breithaupt (2023). Choice path distribution by genre and perspective (N = 507). [Dataset]. http://doi.org/10.1371/journal.pone.0226503.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Victoria Lagrange; Benjamin Hiskes; Claire Woodward; Binyan Li; Fritz Breithaupt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Choice path distribution by genre and perspective (N = 507).

  20. Table_1_GeTallele: A Method for Analysis of DNA and RNA Allele Frequency...

    • frontiersin.figshare.com
    pdf
    Updated Jun 9, 2023
    Cite
    Piotr Słowiński; Muzi Li; Paula Restrepo; Nawaf Alomran; Liam F. Spurr; Christian Miller; Krasimira Tsaneva-Atanasova; Anelia Horvath (2023). Table_1_GeTallele: A Method for Analysis of DNA and RNA Allele Frequency Distributions.pdf [Dataset]. http://doi.org/10.3389/fbioe.2020.01021.s002
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    Frontiers
    Authors
    Piotr Słowiński; Muzi Li; Paula Restrepo; Nawaf Alomran; Liam F. Spurr; Christian Miller; Krasimira Tsaneva-Atanasova; Anelia Horvath
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Variant allele frequencies (VAF) are an important measure of genetic variation that can be estimated at single-nucleotide variant (SNV) sites. RNA and DNA VAFs are used as indicators of a wide range of biological traits, including tumor purity and ploidy changes, allele-specific expression and gene-dosage transcriptional response. Here we present a novel methodology to assess gene and chromosomal allele asymmetries and to aid in identifying genomic alterations in RNA and DNA datasets. Our approach is based on analysis of the VAF distributions in chromosomal segments (continuous multi-SNV genomic regions). In each segment we estimate variant probability, a parameter of a random process that can generate synthetic VAF samples that closely resemble the observed data. We show that variant probability is a biologically interpretable quantitative descriptor of the VAF distribution in chromosomal segments which is consistent with other approaches. To this end, we apply the proposed methodology to data from 72 samples obtained from patients with breast invasive carcinoma (BRCA) from The Cancer Genome Atlas (TCGA). We compare DNA and RNA VAF distributions from matched RNA and whole exome sequencing (WES) datasets and find that both genomic signals give very similar segmentation and estimated variant probability profiles. We also find a correlation between variant probability and copy number alterations (CNA). Finally, to demonstrate a practical application of variant probabilities, we use them to estimate tumor purity. Tumor purity estimates based on variant probabilities demonstrate good concordance with other approaches (Pearson's correlation between 0.44 and 0.76). Our evaluation suggests that variant probabilities can serve as a dependable descriptor of VAF distribution, further enabling the statistical comparison of matched DNA and RNA datasets. Finally, they provide conceptual and mechanistic insights into relations between the structure of VAF distributions and genetic events. The methodology is implemented in a Matlab toolbox that provides a suite of functions for analysis, statistical assessment and visualization of genome and transcriptome allele frequency distributions. GeTallele is available at: https://github.com/SlowinskiPiotr/GeTallele.

Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm

312 scholarly articles cite this dataset (View in Google Scholar)