69 datasets found
  1. Data from: Outside the Norm: Using Public Ecology Database Information to...

    • qubeshub.org
    Updated Oct 26, 2023
    Carl Tyce; Lara Goudsouzian* (2023). Outside the Norm: Using Public Ecology Database Information to Teach Biostatistics [Dataset]. https://qubeshub.org/publications/4528/?v=1
    Dataset updated
    Oct 26, 2023
    Dataset provided by
    QUBES
    Authors
    Carl Tyce; Lara Goudsouzian*
    Description

    Biology students’ understanding of statistics is incomplete due to poor integration of these two disciplines. In some cases, students fail to learn statistics at the undergraduate level due to poor student interest and cursory teaching of concepts, highlighting a need for new and unique approaches to the teaching of statistics in the undergraduate biology curriculum. The most effective method of teaching statistics is to provide opportunities for students to apply concepts, not just learn facts. Opportunities to learn statistics also need to be prevalent throughout a student’s education to reinforce learning. The purpose of developing and implementing curriculum that integrates a topic in biology with an emphasis on statistical analysis was to improve students’ quantitative thinking skills. Our lesson focuses on the change in the richness of native species for a specified area with the aid of iNaturalist and the capacity for analysis afforded by Google Sheets. We emphasized the skills of data entry, storage, organization, curation and analysis. Students then had to report their findings, as well as discuss biases and other confounding factors. Pre- and post-lesson assessment revealed students’ quantitative thinking skills, as measured by a paired-samples t test, improved. At the end of the lesson, students had an increased understanding of basic statistical concepts, such as bias in research and making data-based claims, within the framework of biology.

    Primary Image: Website screenshot of an iNaturalist observation (Clasping Milkweed – Asclepias amplexicalis). This image is an example of a data entry on iNaturalist. The data students export from iNaturalist is made up of hundreds, or even thousands, of observations like this one. This image is licensed under Creative Commons Attribution - Share Alike 4.0 International license. Source: Observation by cassi saari, 2014.
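
    The description above reports that improvement was measured with a paired-samples t test. As a rough illustration only, the statistic can be sketched as below. The scores are hypothetical and are not taken from the dataset, and the function name `paired_t_statistic` is an assumption made for this example.

```python
import math
import statistics


def paired_t_statistic(pre, post):
    """Return (t, degrees_of_freedom) for a paired-samples t test.

    The test works on the per-student differences (post - pre).
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    if len(pre) < 2:
        raise ValueError("at least two pairs are required")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical pre- and post-lesson scores for six students.
pre_scores = [4, 5, 6, 5, 7, 3]
post_scores = [6, 6, 8, 5, 9, 5]

t_stat, df = paired_t_statistic(pre_scores, post_scores)
print(f"t = {t_stat:.3f}, df = {df}")
```

    The t value would then be compared against a t distribution with n - 1 degrees of freedom to obtain a p-value.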

  2. Data_Sheet_4_“R” U ready?: a case study using R to analyze changes in gene...

    • frontiersin.figshare.com
    docx
    Updated Mar 22, 2024
    + more versions
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder (2024). Data_Sheet_4_“R” U ready?: a case study using R to analyze changes in gene expression during evolution.docx [Dataset]. http://doi.org/10.3389/feduc.2024.1379910.s004
    Dataset updated
    Mar 22, 2024
    Dataset provided by
    Frontiers
    Authors
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.

  3. Data from: Dataset statistics.

    • figshare.com
    • plos.figshare.com
    xls
    Updated May 30, 2023
    Teresa Nogueira; Marie Touchon; Eduardo P. C. Rocha (2023). Dataset statistics. [Dataset]. http://doi.org/10.1371/journal.pone.0049403.t001
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Teresa Nogueira; Marie Touchon; Eduardo P. C. Rocha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    See Tables S1, S2 and S3 for more details. a: restricted to proteins with prediction of localization; b: (%) of the localized proteins.

  4. walton-hard-exclude-geometry-biology-statistics-1k-1

    • huggingface.co
    Updated Nov 29, 2025
    + more versions
    Yosub Shin (2025). walton-hard-exclude-geometry-biology-statistics-1k-1 [Dataset]. https://huggingface.co/datasets/yosubshin/walton-hard-exclude-geometry-biology-statistics-1k-1
    Dataset updated
    Nov 29, 2025
    Authors
    Yosub Shin
    Description

    yosubshin/walton-hard-exclude-geometry-biology-statistics-1k-1 dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. oumi-walton-exclude-geometry-biology-statistics

    • huggingface.co
    Updated Jan 22, 2022
    + more versions
    Yosub Shin (2022). oumi-walton-exclude-geometry-biology-statistics [Dataset]. https://huggingface.co/datasets/yosubshin/oumi-walton-exclude-geometry-biology-statistics
    Dataset updated
    Jan 22, 2022
    Authors
    Yosub Shin
    Description

    yosubshin/oumi-walton-exclude-geometry-biology-statistics dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. Data from: ASTRAL: genome-scale coalescent-based species tree estimation

    • search.dataone.org
    • data.niaid.nih.gov
    • +1 more
    Updated Jul 25, 2025
    Siavash Mirarab; R. Reaz; Md. S. Bayzid; T. Zimmermann; M. S. Swenson; T. Warnow (2025). ASTRAL: genome-scale coalescent-based species tree estimation [Dataset]. http://doi.org/10.5061/dryad.ht76hdrp0
    Dataset updated
    Jul 25, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Siavash Mirarab; R. Reaz; Md. S. Bayzid; T. Zimmermann; M. S. Swenson; T. Warnow
    Time period covered
    Jan 1, 2023
    Description

    Species trees provide insight into basic biology, including the mechanisms of evolution and how it modifies biomolecular function and structure, biodiversity and co-evolution between genes and species. Yet, gene trees often differ from species trees, creating challenges to species tree estimation. One of the most frequent causes for conflicting topologies between gene trees and species trees is incomplete lineage sorting (ILS), which is modelled by the multi-species coalescent. While many methods have been developed to estimate species trees from multiple genes, some of which have statistical guarantees under the multi-species coalescent model, existing methods are too computationally intensive for use with genome-scale analyses or have been shown to have poor accuracy under some realistic conditions.

    Results: We present ASTRAL, a fast method for estimating species trees from multiple genes. ASTRAL is statistically consistent, can run on datasets with thousands of genes and has outstanding...

    Availability and implementation: ASTRAL is available in open source form at https://github.com/smirarab/ASTRAL/. Datasets studied in this article are available at http://www.cs.utexas.edu/users/phylo/datasets/astral.

    Contact: warnow@illinois.edu

    Supplementary information: Supplementary data are available at Bioinformatics online.

    ASTRAL: genome-scale coalescent-based species tree estimation

    This repository includes both simulated and biological datasets.

    Description of the data and file structure

    The following datasets are used in the ASTRAL paper shown above. All these archive files include README files that describe their content.

    biological.zip:

    This file includes (1) our estimated gene trees on alignments provided to us by the authors of Song et al., 2012, PNAS, and (2) our estimated species trees on the same dataset.

    Our paper includes re-analyses of two biological datasets.

    Song et al. dataset

    We obtained gene alignments from Song et al. and re-estimated gene trees and species trees.

    The following files are included in mammals.zip:

    • mammals-alignments.zip contains all the alignments that we obtained from Song et al.

    • mammals-genetreess.zip contains gene trees that we estimated. For each gene, we include 3 files:

      • RAxML_bipartitions.final.f200 is the bestML tree with support...
  7. File S1 - Evaluation of Bias-Variance Trade-Off for Commonly Used...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    pdf
    Updated May 31, 2023
    Xing Qiu; Rui Hu; Zhixin Wu (2023). File S1 - Evaluation of Bias-Variance Trade-Off for Commonly Used Post-Summarizing Normalization Procedures in Large-Scale Gene Expression Studies [Dataset]. http://doi.org/10.1371/journal.pone.0099380.s001
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xing Qiu; Rui Hu; Zhixin Wu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Supporting tables and figures.

    • Table S1. The impact of different effect sizes on gene selection strategies when the sample size is fixed and relatively small. Mean (STD) of true positives computed from SIMU1 with 20 repetitions are reported. Sample size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
    • Table S2. The impact of different effect sizes on gene selection strategies when the sample size is fixed and relatively small. Mean (STD) of false positives computed from SIMU1 with 20 repetitions are reported. Sample size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
    • Table S3. The impact of different sample sizes on gene selection strategies when the effect size is fixed and relatively small. Mean (STD) of true positives computed from SIMU2 with 20 repetitions are reported. Effect size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
    • Table S4. The impact of different sample sizes on gene selection strategies when the effect size is fixed and relatively small. Mean (STD) of false positives computed from SIMU2 with 20 repetitions are reported. Effect size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
    • Table S5. The impact of different sample sizes on gene selection strategies when the effect size is fixed and relatively large. Mean (STD) of true positives computed from SIMU2 with 20 repetitions are reported. Effect size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
    • Table S6. The impact of different sample sizes on gene selection strategies when the effect size is fixed and relatively large. Mean (STD) of false positives computed from SIMU2 with 20 repetitions are reported. Effect size: . Total number of genes: 1000. Number of differentially expressed genes: 100. Number of permutations for Nstat: 10000. The significance threshold: 0.05.
    • Table S7. The impact of different sample sizes on gene selection strategies with simulation based on biological data. Mean (STD) of true positives computed from SIMU-BIO with 20 repetitions are reported. Total number of genes: 9005. Number of permutations for Nstat: 100000. The significance threshold: 0.05.
    • Table S8. The impact of different sample sizes on gene selection strategies with simulation based on biological data. Mean (STD) of false positives computed from SIMU-BIO with 20 repetitions are reported. Total number of genes: 9005. Number of permutations for Nstat: 100000. The significance threshold: 0.05.
    • Table S9. The numbers of differentially expressed genes detected by different selection strategies. Total number of genes: 9005. Number of permutations for Nstat: 100000. The significance threshold: 0.05.
    • Figure S1. Histogram of pairwise Pearson correlation coefficients between genes computed from HYPERDIP without normalization. Number of genes: 9005. Number of arrays: 88.

    (PDF)

  8. Estimating biodiversity using symbolic meta analysis

    • datadryad.org
    • data.niaid.nih.gov
    • +3 more
    zip
    Updated Feb 15, 2022
    Huan Lin; Julian Caley; Scott Sisson (2022). Estimating biodiversity using symbolic meta analysis [Dataset]. http://doi.org/10.5061/dryad.8cz8w9grr
    Dataset updated
    Feb 15, 2022
    Dataset provided by
    Dryad
    Authors
    Huan Lin; Julian Caley; Scott Sisson
    Time period covered
    Jan 17, 2022
    Description

    Meta analysis

  9. 3M+ Academic Papers: Titles & Abstracts

    • kaggle.com
    zip
    Updated Sep 18, 2025
    + more versions
    David Arias (2025). 3M+ Academic Papers: Titles & Abstracts [Dataset]. https://www.kaggle.com/datasets/beta3logic/3m-academic-papers-titles-and-abstracts
    Available download formats: zip (1478156333 bytes)
    Dataset updated
    Sep 18, 2025
    Authors
    David Arias
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Comprehensive Academic Papers Dataset: 3M+ Research Paper Titles and Abstracts

    📋 Overview

    This dataset is a comprehensive collection of over 3 million research paper titles and abstracts, curated and consolidated from multiple high-quality academic sources. The dataset provides a unified, clean, and standardized format for researchers, data scientists, and machine learning practitioners working on natural language processing, academic research analysis, and knowledge discovery tasks.

    🎯 Key Features

    • 3.6+ million scientific papers with titles and abstracts
    • Multi-domain coverage: Physics, Mathematics, Computer Science, Biology, Medicine, and more
    • Standardized format: Consistent title and abstract columns
    • Quality assured: Validated using Pydantic models and cleaned of duplicates/null values
    • Ready-to-use: Pre-processed and formatted for immediate analysis
    • Format: CSV
    • Language: English

    📊 Dataset Statistics

    Metric          Value
    Total Records   ~3,000,000+
    Columns         2 (title, abstract)
    File Size       4.15 GB
    Format          CSV
    Duplicates      Removed
    Missing Values  Removed

    🗂️ Dataset Structure

    cleaned_papers.csv
    ├── title (string): Scientific paper title
    └── abstract (string): Scientific paper abstract
    

    🔄 Data Processing Pipeline

    The dataset underwent a rigorous cleaning and standardization process:

    1. Data Import: Automated import from multiple sources (Kaggle API, Hugging Face)
    2. Column Standardization: Mapping various column names to consistent title and abstract format
    3. Data Validation: Pydantic model validation ensuring data quality
    4. Duplicate Removal: Advanced deduplication based on title and abstract similarity
    5. Null Value Handling: Removal of records with missing titles or abstracts
    6. Quality Assurance: Final validation and statistics generation
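
    The pipeline steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the dataset author's code: the alias table, function names and sample records are hypothetical, plain type checks stand in for the Pydantic validation, and exact matching on normalized text stands in for the similarity-based deduplication the description mentions.

```python
# Hypothetical mapping from source-specific column names to the two target columns.
COLUMN_ALIASES = {
    "title": "title",
    "titles": "title",
    "paper_title": "title",
    "abstract": "abstract",
    "abstracts": "abstract",
    "summary": "abstract",
}


def standardize(record):
    """Map a raw record's keys onto the 'title' and 'abstract' columns."""
    out = {}
    for key, value in record.items():
        target = COLUMN_ALIASES.get(key.strip().lower())
        if target and target not in out:
            out[target] = value
    return out


def normalize(text):
    """Lowercase and collapse whitespace so near-identical strings compare equal."""
    return " ".join(text.lower().split())


def clean(records):
    """Standardize columns, drop rows with missing values, and remove duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        row = standardize(record)
        title = row.get("title")
        abstract = row.get("abstract")
        # Simple validation in place of a Pydantic model (an assumption).
        if not isinstance(title, str) or not isinstance(abstract, str):
            continue
        if not title.strip() or not abstract.strip():
            continue
        # Exact match on normalized text, a simplification of similarity-based dedup.
        key = (normalize(title), normalize(abstract))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"title": title.strip(), "abstract": abstract.strip()})
    return cleaned


# Hypothetical records from sources that use different column names.
raw = [
    {"Title": "A Study", "Abstract": "We study things."},
    {"paper_title": "a  study", "summary": "We study   things."},
    {"title": "No abstract", "abstract": None},
    {"title": "Another", "abstract": "Different text."},
]

cleaned = clean(raw)
print(len(cleaned), "records kept")
```

    Keeping the first occurrence of each duplicate is a design choice made for this sketch; the description does not say which copy the original pipeline retains.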

    💡 Use Cases

    This dataset is ideal for:

    • Natural Language Processing: Text classification, sentiment analysis, topic modeling
    • Scientific Literature Analysis: Trend analysis, domain classification, citation prediction
    • Machine Learning Research: Training language models, text summarization, information extraction
    • Academic Research: Bibliometric analysis, research trend identification
    • Educational Applications: Building search engines, recommendation systems

    🔗 Data Sources and Attribution

    This dataset consolidates academic papers from the following sources:

    Kaggle Datasets:

    1. ArXiv Scientific Research Papers Dataset by @sumitm004
    2. Cornell University ArXiv Dataset by @Cornell-University

    Hugging Face Datasets:

    1. ML-ArXiv-Papers by @CShorten
    2. ArXiv Biology by @zeroshot
    3. ArXiv Data Extended by @wrapper228
    4. Stroke PubMed Abstracts by @Gaborandi
    5. PubMed ArXiv Abstracts Data by @brainchalov
    6. Abstracts Cleaned by @Eitanli

    🔄 Update Schedule

    This dataset represents a point-in-time consolidation. Future versions may include:

    • Additional academic sources
    • Extended fields (authors, publication dates, venues)
    • Domain-specific subsets
    • Enhanced metadata

    📄 License and Usage

    Please respect the individual licenses of the source datasets. This consolidated version is provided for research and educational purposes. When using this dataset:

    1. Citation: Please cite this dataset and acknowledge the original data sources
    2. Attribution: Credit the original dataset creators listed above
    3. Compliance: Ensure compliance with individual dataset licenses
    4. Academic Use: Primarily intended for non-commercial, academic, and research purposes

    🙏 Acknowledgments

    Special thanks to all the original dataset creators and the academic communities that make their research data publicly available. This work builds upon their valuable contributions to open science and knowledge sharing.

    Keywords: academic papers, research abstracts, NLP, machine learning, text mining, scientific literature, ArXiv, PubMed, natural language processing, research dataset

  10. Group-wise linkage rates before and after adjustment for missing data.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated May 31, 2023
    Nicole Bohme Carnegie; Rui Wang; Vladimir Novitsky; Victor De Gruttola (2023). Group-wise linkage rates before and after adjustment for missing data. [Dataset]. http://doi.org/10.1371/journal.pcbi.1003430.t004
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Nicole Bohme Carnegie; Rui Wang; Vladimir Novitsky; Victor De Gruttola
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A link in this analysis is defined by a difference between sequences in less than 10% of available sites.
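
    The link definition above can be illustrated with a short sketch. It assumes that "available sites" means aligned positions where both sequences have a called base (neither a gap nor an ambiguity code); that reading, the `MISSING` set, the function name `is_linked` and the example sequences are all assumptions made for illustration, not details from the source.

```python
# Characters treated as missing data (an assumption for this sketch).
MISSING = {"-", "N", "?"}


def is_linked(seq_a, seq_b, threshold=0.10):
    """Return True if two aligned sequences differ at fewer than
    `threshold` of the sites available in both."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    available = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a not in MISSING and b not in MISSING
    ]
    if not available:
        return False  # no comparable sites, so no evidence of a link
    differing = sum(1 for a, b in available if a != b)
    return differing / len(available) < threshold


# Hypothetical aligned sequences of length 20.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGTACGAACGTACGN"  # 1 difference over 19 available sites
seq_c = "TTGTACGTACGAACGTACGN"  # 3 differences over 19 available sites

linked_ab = is_linked(seq_a, seq_b)
linked_ac = is_linked(seq_a, seq_c)
print(linked_ab, linked_ac)
```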

  11. The ROC areas and the PR areas of different methods on SynTReN datasets with...

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Xiaobo Guo; Ye Zhang; Wenhao Hu; Haizhu Tan; Xueqin Wang (2023). The ROC areas and the PR areas of different methods on SynTReN datasets with noise 0.1, 0.2, 0.3, respectively. [Dataset]. http://doi.org/10.1371/journal.pone.0087446.t002
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xiaobo Guo; Ye Zhang; Wenhao Hu; Haizhu Tan; Xueqin Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The ROC areas and the PR areas of different methods on SynTReN datasets with noise 0.1, 0.2, 0.3, respectively.

  12. CORE Database Statistics.

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Ann L. Griffen; Clifford J. Beall; Noah D. Firestone; Erin L. Gross; James M. DiFranco; Jori H. Hardman; Bastienne Vriesendorp; Russell A. Faust; Daniel A. Janies; Eugene J. Leys (2023). CORE Database Statistics. [Dataset]. http://doi.org/10.1371/journal.pone.0019051.t001
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ann L. Griffen; Clifford J. Beall; Noah D. Firestone; Erin L. Gross; James M. DiFranco; Jori H. Hardman; Bastienne Vriesendorp; Russell A. Faust; Daniel A. Janies; Eugene J. Leys
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CORE Database Statistics.

  13. Estimated probability of linkage between individuals of different groups, ,...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 2, 2023
    Nicole Bohme Carnegie; Rui Wang; Vladimir Novitsky; Victor De Gruttola (2023). Estimated probability of linkage between individuals of different groups, , from Mochudi data. [Dataset]. http://doi.org/10.1371/journal.pcbi.1003430.t001
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Nicole Bohme Carnegie; Rui Wang; Vladimir Novitsky; Victor De Gruttola
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Mochudi
    Description

    Rates given are per 1000 pairs. A link in this analysis is defined by a difference between sequences in less than 10% of available sites.

  14. Data from: Dataset statistics.

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Julien Becker; Francis Maes; Louis Wehenkel (2023). Dataset statistics. [Dataset]. http://doi.org/10.1371/journal.pone.0056621.t001
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Julien Becker; Francis Maes; Louis Wehenkel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All: proteins in which all cysteines are bonded. None: proteins with no disulfide bridges. Mix: proteins with both bonded cysteines and non-bonded cysteines. Positive: number of bonded cysteines. Negative: number of non-bonded cysteines.

  15. Characteristics of 48 fully reviewed studies.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 2, 2023
    Putri W. Novianti; Kit C. B. Roes; Marinus J. C. Eijkemans (2023). Characteristics of 48 fully reviewed studies. [Dataset]. http://doi.org/10.1371/journal.pone.0096063.t002
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Putri W. Novianti; Kit C. B. Roes; Marinus J. C. Eijkemans
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    *Some studies used more than one classifier.

  16. Relationship between Age and Latent Factors of Beer Intake, Wine Intake,...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 2, 2023
    Sarah Cook; David A. Leon; Nikolay Kiryanov; George B. Ploubidis; Bianca L. De Stavola (2023). Relationship between Age and Latent Factors of Beer Intake, Wine Intake, Spirit Intake, and Routine Alcohol-related dysfunction among 1,705 Drinkers in the Izhevsk Family Study 1. [Dataset]. http://doi.org/10.1371/journal.pone.0063792.t002
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Sarah Cook; David A. Leon; Nikolay Kiryanov; George B. Ploubidis; Bianca L. De Stavola
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Izhevsk
    Description

    a: Coefficients represent standard deviation (SD) change in latent factor per 5 year increase in age.

  17. The ROC areas and the PR areas of different methods on DREAM3 challenge...

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Xiaobo Guo; Ye Zhang; Wenhao Hu; Haizhu Tan; Xueqin Wang (2023). The ROC areas and the PR areas of different methods on DREAM3 challenge Yeast dataset with size 10, 50, 100 and Syndata, respectively. [Dataset]. http://doi.org/10.1371/journal.pone.0087446.t001
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xiaobo Guo; Ye Zhang; Wenhao Hu; Haizhu Tan; Xueqin Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The ROC areas and the PR areas of different methods on DREAM3 challenge Yeast dataset with size 10, 50, 100 and Syndata, respectively.

  18. Simulation results for different values of the smoothing parameter.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 2, 2023
    Andreas Mayr; Matthias Schmid (2023). Simulation results for different values of the smoothing parameter. [Dataset]. http://doi.org/10.1371/journal.pone.0084483.t002
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Andreas Mayr; Matthias Schmid
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of the discriminatory power resulting from the gradient boosting approach when applying different values of the smoothing parameter . Numbers refer to the median value and interquartile range (in parentheses) of the final on 100 simulation runs. The number of pre-selected genes is denoted as , is the size of the training samples and cens. refers to the censoring rate. We recommend using the value , which is also the default value of the new Cindex family for the R add-on package mboost.

  19. Estimated conditional probability of linkage between groups from Mochudi...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 9, 2023
    Nicole Bohme Carnegie; Rui Wang; Vladimir Novitsky; Victor De Gruttola (2023). Estimated conditional probability of linkage between groups from Mochudi data. [Dataset]. http://doi.org/10.1371/journal.pcbi.1003430.t002
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Nicole Bohme Carnegie; Rui Wang; Vladimir Novitsky; Victor De Gruttola
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Mochudi
    Description

    A link in this analysis is defined by a difference between sequences in less than 10% of available sites.

  20. See description of Table 4 but for the Venezuela data set.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Kristan A. Schneider; Ananias A. Escalante (2023). See description of Table 4 but for the Venezuela data set. [Dataset]. http://doi.org/10.1371/journal.pone.0097899.t006
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Kristan A. Schneider; Ananias A. Escalante
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Venezuela
    Description

    N/A indicates that the test is not applicable (cf. Analysis, section 6). Results for loci J3, J6, and U6 are not shown because the tests are also not applicable (as for locus L4).
