100+ datasets found
  1. d

    Data from: PISA Data Analysis Manual: SPSS, Second Edition

    • catalog.data.gov
    Updated Mar 30, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Department of State (2021). PISA Data Analysis Manual: SPSS, Second Edition [Dataset]. https://catalog.data.gov/dataset/pisa-data-analysis-manual-spss-second-edition
    Explore at:
    Dataset updated
    Mar 30, 2021
    Dataset provided by
    U.S. Department of State
    Description

    The OECD Programme for International Student Assessment (PISA) surveys collected data on students’ performances in reading, mathematics and science, as well as contextual information on students’ background, home characteristics and school factors which could influence performance. This publication includes detailed information on how to analyse the PISA data, enabling researchers to both reproduce the initial results and to undertake further analyses. In addition to the inclusion of the necessary techniques, the manual also includes a detailed account of the PISA 2006 database and worked examples providing full syntax in SPSS.

  2. H

    Political Analysis Using R: Example Code and Data, Plus Data for Practice...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Apr 28, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Political Analysis Using R: Example Code and Data, Plus Data for Practice Problems [Dataset]. https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ARKOTI
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 28, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Jamie Monogan
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.

  3. f

    Data from: Teaching and Learning Data Visualization: Ideas and Assignments

    • tandf.figshare.com
    • figshare.com
    txt
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Deborah Nolan; Jamis Perrett (2023). Teaching and Learning Data Visualization: Ideas and Assignments [Dataset]. http://doi.org/10.6084/m9.figshare.1627940.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Deborah Nolan; Jamis Perrett
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This article discusses how to make statistical graphics a more prominent element of the undergraduate statistics curricula. The focus is on several different types of assignments that exemplify how to incorporate graphics into a course in a pedagogically meaningful way. These assignments include having students deconstruct and reconstruct plots, copy masterful graphs, create one-minute visual revelations, convert tables into “pictures,” and develop interactive visualizations, for example, with the virtual earth as a plotting canvas. In addition to describing the goals and details of each assignment, we also discuss the broader topic of graphics and key concepts that we think warrant inclusion in the statistics curricula. We advocate that more attention needs to be paid to this fundamental field of statistics at all levels, from introductory undergraduate through graduate level courses. With the rapid rise of tools to visualize data, for example, Google trends, GapMinder, ManyEyes, and Tableau, and the increased use of graphics in the media, understanding the principles of good statistical graphics, and having the ability to create informative visualizations is an ever more important aspect of statistics education. Supplementary materials containing code and data for the assignments are available online.

  4. w

    Data from: Sensory data analysis by example with R

    • workwithdata.com
    Updated Jan 5, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Work With Data (2022). Sensory data analysis by example with R [Dataset]. https://www.workwithdata.com/object/sensory-data-analysis-by-example-with-r-book-by-sebastien-le-0000
    Explore at:
    Dataset updated
    Jan 5, 2022
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sensory data analysis by example with R is a book. It was written by Sébastien Lê and published by Chapman&Hall/CRC in 2014.

  5. f

    Data from: An Automated Data Analysis Pipeline for GC−TOF−MS Metabonomics...

    • acs.figshare.com
    • figshare.com
    txt
    Updated Jun 7, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Wenxin Jiang; Yunping Qiu; Yan Ni; Mingming Su; Wei Jia; Xiuxia Du (2023). An Automated Data Analysis Pipeline for GC−TOF−MS Metabonomics Studies [Dataset]. http://doi.org/10.1021/pr1007703.s009
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    ACS Publications
    Authors
    Wenxin Jiang; Yunping Qiu; Yan Ni; Mingming Su; Wei Jia; Xiuxia Du
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Recent technological advances have made it possible to carry out high-throughput metabonomics studies using gas chromatography coupled with time-of-flight mass spectrometry. Large volumes of data are produced from these studies and there is a pressing need for algorithms that can efficiently process and analyze data in a high-throughput fashion as well. We present an Automated Data Analysis Pipeline (ADAP) that has been developed for this purpose. ADAP consists of peak detection, deconvolution, peak alignment, and library search. It allows data to flow seamlessly through the analysis steps without any human intervention and features two novel algorithms in the analysis. Specifically, clustering is successfully applied in deconvolution to resolve coeluting compounds that are very common in complex samples and a two-phase alignment process has been implemented to enhance alignment accuracy. ADAP is written in standard C++ and R and uses parallel computing via Message Passing Interface for fast peak detection and deconvolution. ADAP has been applied to analyze both mixed standards samples and serum samples and identified and quantified metabolites successfully. ADAP is available at http://www.du-lab.org.

  6. m

    Raw data outputs 1-18

    • bridges.monash.edu
    • researchdata.edu.au
    xlsx
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie (2023). Raw data outputs 1-18 [Dataset]. http://doi.org/10.26180/21259491.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Monash University
    Authors
    Abbas Salavaty Hosein Abadi; Sara Alaei; Mirana Ramialison; Peter Currie
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Raw data outputs 1-18 Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 2. Commonly and uniquely differentially expressed genes in AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis. Raw data output 3. Common differentially expressed genes between training and test set samples the microarray dataset. This data was generated based on the results of AML microarray data analysis. Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study. Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs as well as in TCGA BRCA cancer samples compared with normal ones. Raw data output 6. Commonly and uniquely differentially expressed genes in breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 7. Differential and common co-expression and protein-protein interaction of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC, and GTC are abbreviations of cancer stem cell, and general tumor cell, respectively. Raw data output 8. Differentially expressed genes between AML dormant and active CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis. Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and differentially expressed genes between AML CSCs vs GTCs and between dormant and active AML CSCs or the uniquely expressed genes in either class of CSCs. Raw data output 11. Targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 12. CSC-specific targeting desirableness score of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 13. The protein-protein interactions between AML key CSC genes with themselves and their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. Raw data output 14. The previously confirmed associations of genes having the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers’ (stem) cells as well as hematopoietic stem cells. These data were generated based on a PubMed database-based literature mining. Raw data output 15. Drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 16. CSC-specific drug score of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section. Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell. Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.

  7. f

    Data from: MS-DAP Platform for Downstream Data Analysis of Label-Free...

    • acs.figshare.com
    xlsx
    Updated Jun 11, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Frank Koopmans; Ka Wan Li; Remco V. Klaassen; August B. Smit (2023). MS-DAP Platform for Downstream Data Analysis of Label-Free Proteomics Uncovers Optimal Workflows in Benchmark Data Sets and Increased Sensitivity in Analysis of Alzheimer’s Biomarker Data [Dataset]. http://doi.org/10.1021/acs.jproteome.2c00513.s003
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    ACS Publications
    Authors
    Frank Koopmans; Ka Wan Li; Remco V. Klaassen; August B. Smit
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    In the rapidly moving proteomics field, a diverse patchwork of data analysis pipelines and algorithms for data normalization and differential expression analysis is used by the community. We generated a mass spectrometry downstream analysis pipeline (MS-DAP) that integrates both popular and recently developed algorithms for normalization and statistical analyses. Additional algorithms can be easily added in the future as plugins. MS-DAP is open-source and facilitates transparent and reproducible proteome science by generating extensive data visualizations and quality reporting, provided as standardized PDF reports. Second, we performed a systematic evaluation of methods for normalization and statistical analysis on a large variety of data sets, including additional data generated in this study, which revealed key differences. Commonly used approaches for differential testing based on moderated t-statistics were consistently outperformed by more recent statistical models, all integrated in MS-DAP. Third, we introduced a novel normalization algorithm that rescues deficiencies observed in commonly used normalization methods. Finally, we used the MS-DAP platform to reanalyze a recently published large-scale proteomics data set of CSF from AD patients. This revealed increased sensitivity, resulting in additional significant target proteins which improved overlap with results reported in related studies and includes a large set of new potential AD biomarkers in addition to previously reported.

  8. U

    Research data supporting “Hybrid Sankey diagrams: visual analysis of...

    • researchdata.bath.ac.uk
    • repository.cam.ac.uk
    Updated Oct 24, 2016
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rick Lupton; Julian M. Allwood (2016). Research data supporting “Hybrid Sankey diagrams: visual analysis of multidimensional data for understanding resource use” [Dataset]. http://doi.org/10.17863/CAM.6038
    Explore at:
    Dataset updated
    Oct 24, 2016
    Dataset provided by
    University of Bath
    University of Cambridge
    Authors
    Rick Lupton; Julian M. Allwood
    Dataset funded by
    Engineering and Physical Sciences Research Council
    Description

    This data includes two example databases from the paper "Hybrid Sankey diagrams: visual analysis of multidimensional data for understanding resource use": the made-up fruit flows, and real global steel flow data from Cullen et al. (2012). It also includes the Sankey Diagram Definitions to reproduce the diagrams in the paper. The code to reproduce the figures is written in Python in the form of Jupyter notebooks. A conda environment file is included to easily set up the necessary Python packages to run the notebooks. All files are included in the "examples.zip" file. The notebook files are also uploaded standalone so they can be linked to nbviewer.

  9. Envestnet | Yodlee's De-Identified Bank Statement Data | Row/Aggregate Level...

    • datarade.ai
    .sql, .txt
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Envestnet | Yodlee, Envestnet | Yodlee's De-Identified Bank Statement Data | Row/Aggregate Level | USA Consumer Data covering 3600+ corporations | 90M+ Accounts [Dataset]. https://datarade.ai/data-products/envestnet-yodlee-s-de-identified-bank-statement-data-row-envestnet-yodlee
    Explore at:
    .sql, .txtAvailable download formats
    Dataset provided by
    Yodlee
    Envestnethttp://envestnet.com/
    Authors
    Envestnet | Yodlee
    Area covered
    United States of America
    Description

    Envestnet®| Yodlee®'s Bank Statement Data (Aggregate/Row) Panels consist of de-identified, near-real time (T+1) USA credit/debit/ACH transaction level data – offering a wide view of the consumer activity ecosystem. The underlying data is sourced from end users leveraging the aggregation portion of the Envestnet®| Yodlee®'s financial technology platform.

    Envestnet | Yodlee Consumer Panels (Aggregate/Row) include data relating to millions of transactions, including ticket size and merchant location. The dataset includes de-identified credit/debit card and bank transactions (such as a payroll deposit, account transfer, or mortgage payment). Our coverage offers insights into areas such as consumer, TMT, energy, REITs, internet, utilities, ecommerce, MBS, CMBS, equities, credit, commodities, FX, and corporate activity. We apply rigorous data science practices to deliver key KPIs daily that are focused, relevant, and ready to put into production.

    We offer free trials. Our team is available to provide support for loading, validation, sample scripts, or other services you may need to generate insights from our data.

    Investors, corporate researchers, and corporates can use our data to answer some key business questions such as: - How much are consumers spending with specific merchants/brands and how is that changing over time? - Is the share of consumer spend at a specific merchant increasing or decreasing? - How are consumers reacting to new products or services launched by merchants? - For loyal customers, how is the share of spend changing over time? - What is the company’s market share in a region for similar customers? - Is the company’s loyal user base increasing or decreasing? - Is the lifetime customer value increasing or decreasing?

    Additional Use Cases: - Use spending data to analyze sales/revenue broadly (sector-wide) or granular (company-specific). Historically, our tracked consumer spend has correlated above 85% with company-reported data from thousands of firms. Users can sort and filter by many metrics and KPIs, such as sales and transaction growth rates and online or offline transactions, as well as view customer behavior within a geographic market at a state or city level. - Reveal cohort consumer behavior to decipher long-term behavioral consumer spending shifts. Measure market share, wallet share, loyalty, consumer lifetime value, retention, demographics, and more.) - Study the effects of inflation rates via such metrics as increased total spend, ticket size, and number of transactions. - Seek out alpha-generating signals or manage your business strategically with essential, aggregated transaction and spending data analytics.

    Use Cases Categories (Our data provides an innumerable amount of use cases, and we look forward to working with new ones): 1. Market Research: Company Analysis, Company Valuation, Competitive Intelligence, Competitor Analysis, Competitor Analytics, Competitor Insights, Customer Data Enrichment, Customer Data Insights, Customer Data Intelligence, Demand Forecasting, Ecommerce Intelligence, Employee Pay Strategy, Employment Analytics, Job Income Analysis, Job Market Pricing, Marketing, Marketing Data Enrichment, Marketing Intelligence, Marketing Strategy, Payment History Analytics, Price Analysis, Pricing Analytics, Retail, Retail Analytics, Retail Intelligence, Retail POS Data Analysis, and Salary Benchmarking

    1. Investment Research: Financial Services, Hedge Funds, Investing, Mergers & Acquisitions (M&A), Stock Picking, Venture Capital (VC)

    2. Consumer Analysis: Consumer Data Enrichment, Consumer Intelligence

    3. Market Data: AnalyticsB2C Data Enrichment, Bank Data Enrichment, Behavioral Analytics, Benchmarking, Customer Insights, Customer Intelligence, Data Enhancement, Data Enrichment, Data Intelligence, Data Modeling, Ecommerce Analysis, Ecommerce Data Enrichment, Economic Analysis, Financial Data Enrichment, Financial Intelligence, Local Economic Forecasting, Location-based Analytics, Market Analysis, Market Analytics, Market Intelligence, Market Potential Analysis, Market Research, Market Share Analysis, Sales, Sales Data Enrichment, Sales Enablement, Sales Insights, Sales Intelligence, Spending Analytics, Stock Market Predictions, and Trend Analysis

  10. H

    Advancing Open and Reproducible Water Data Science by Integrating Data...

    • hydroshare.org
    • beta.hydroshare.org
    • +1more
    zip
    Updated Jan 9, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Advancing Open and Reproducible Water Data Science by Integrating Data Analytics with an Online Data Repository [Dataset]. https://www.hydroshare.org/resource/45d3427e794543cfbee129c604d7e865
    Explore at:
    zip(50.9 MB)Available download formats
    Dataset updated
    Jan 9, 2024
    Dataset provided by
    HydroShare
    Authors
    Jeffery S. Horsburgh
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Scientific and related management challenges in the water domain require synthesis of data from multiple domains. Many data analysis tasks are difficult because datasets are large and complex; standard formats for data types are not always agreed upon nor mapped to an efficient structure for analysis; water scientists may lack training in methods needed to efficiently tackle large and complex datasets; and available tools can make it difficult to share, collaborate around, and reproduce scientific work. Overcoming these barriers to accessing, organizing, and preparing datasets for analyses will be an enabler for transforming scientific inquiries. Building on the HydroShare repository’s established cyberinfrastructure, we have advanced two packages for the Python language that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS), loading of data into performant structures keyed to specific scientific data types and that integrate with existing visualization, analysis, and data science capabilities available in Python, and then writing analysis results back to HydroShare for sharing and eventual publication. These capabilities reduce the technical burden for scientists associated with creating a computational environment for executing analyses by installing and maintaining the packages within CUAHSI’s HydroShare-linked JupyterHub server. HydroShare users can leverage these tools to build, share, and publish more reproducible scientific workflows. The HydroShare Python Client and USGS NWIS Data Retrieval packages can be installed within a Python environment on any computer running Microsoft Windows, Apple MacOS, or Linux from the Python Package Index using the PIP utility. They can also be used online via the CUAHSI JupyterHub server (https://jupyterhub.cuahsi.org/) or other Python notebook environments like Google Collaboratory (https://colab.research.google.com/). Source code, documentation, and examples for the software are freely available in GitHub at https://github.com/hydroshare/hsclient/ and https://github.com/USGS-python/dataretrieval.

    This presentation was delivered as part of the Hawai'i Data Science Institute's regular seminar series: https://datascience.hawaii.edu/event/data-science-and-analytics-for-water/

  11. International Data & Economic Analysis (IDEA)

    • catalog.data.gov
    • s.cnmilf.com
    Updated Jun 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.usaid.gov (2024). International Data & Economic Analysis (IDEA) [Dataset]. https://catalog.data.gov/dataset/international-data-economic-analysis-idea
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    United States Agency for International Developmenthttps://usaid.gov/
    Description

    International Data & Economic Analysis (IDEA) is USAID's comprehensive source of economic and social data and analysis. IDEA brings together over 12,000 data series from over 125 sources into one location for easy access by USAID and its partners through the USAID public website. The data are broken down by countries, years and the following sectors: Economy, Country Ratings and Rankings, Trade, Development Assistance, Education, Health, Population, and Natural Resources. IDEA regularly updates the database as new data become available. Examples of IDEA sources include the Demographic and Health Surveys, STATcompiler; UN Food and Agriculture Organization, Food Price Index; IMF, Direction of Trade Statistics; Millennium Challenge Corporation; and World Bank, World Development Indicators. The database can be queried by navigating to the site displayed in the Home Page field below.

  12. Dataset for Exploring case-control samples with non-targeted analysis

    • catalog.data.gov
    • datasets.ai
    Updated Nov 12, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Dataset for Exploring case-control samples with non-targeted analysis [Dataset]. https://catalog.data.gov/dataset/dataset-for-exploring-case-control-samples-with-non-targeted-analysis
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agencyhttp://www.epa.gov/
    Description

    These data contain the results of GC-MS, LC-MS and immunochemistry analyses of mask sample extracts. The data include tentatively identified compounds through library searches and compound abundance. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: The data can not be accessed. Format: The dataset contains the identification of compounds found in the mask samples as well as the abundance of those compounds for individuals who participated in the trial. This dataset is associated with the following publication: Pleil, J., M. Wallace, J. McCord, M. Madden, J. Sobus, and G. Ferguson. How do cancer-sniffing dogs sort biological samples? Exploring case-control samples with non-targeted LC-Orbitrap, GC-MS, and immunochemistry methods. Journal of Breath Research. Institute of Physics Publishing, Bristol, UK, 14(1): 016006, (2019).

  13. Example data for working with the ASpecD framework

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jul 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Till Biskup; Till Biskup (2023). Example data for working with the ASpecD framework [Dataset]. http://doi.org/10.5281/zenodo.8150115
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jul 16, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Till Biskup; Till Biskup
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ASpecD is a Python framework for handling spectroscopic data focussing on reproducibility. In short: Each and every processing step applied to your data will be recorded and can be traced back. Additionally, for each representation of your data (e.g., figures, tables) you can easily follow how the data shown have been processed and where they originate from.

    To provide readers of the publication describing the ASpecD framework with a concrete example of data analysis making use of recipe-driven data analysis, this repository contains both, a recipe as well as the data that are analysed, as shown in the publication describing the ASpecD framework:

    • Jara Popp, Till Biskup: ASpecD: A Modular Framework for the Analysis of Spectroscopic Data Focussing on Reproducibility and Good Scientific Practice. Chemistry--Methods 2:e202100097, 2022. doi:10.1002/cmtd.202100097
  14. z

    Missing data in the analysis of multilevel and dependent data (Examples)

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Jul 20, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Simon Grund; Simon Grund; Oliver Lüdtke; Oliver Lüdtke; Alexander Robitzsch; Alexander Robitzsch (2023). Missing data in the analysis of multilevel and dependent data (Examples) [Dataset]. http://doi.org/10.5281/zenodo.8168054
    Explore at:
    binAvailable download formats
    Dataset updated
    Jul 20, 2023
    Dataset provided by
    Springer
    Authors
    Simon Grund; Simon Grund; Oliver Lüdtke; Oliver Lüdtke; Alexander Robitzsch; Alexander Robitzsch
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example data sets and computer code for the book chapter titled "Missing Data in the Analysis of Multilevel and Dependent Data" submitted for publication in the second edition of "Dependent Data in Social Science Research" (Stemmler et al., 2015). This repository includes the computer code (".R") and the data sets from both example analyses (Examples 1 and 2). The data sets are available in two file formats (binary ".rda" for use in R; plain-text ".dat").

    The data sets contain simulated data from 23,376 (Example 1) and 23,072 (Example 2) individuals from 2,000 groups on four variables:

    ID = group identifier (1-2000)
    x = numeric (Level 1)
    y = numeric (Level 1)
    w = binary (Level 2)

    In all data sets, missing values are coded as "NA".

  15. Z

    Example Data, Analysis and Results Files for MAUD-batch-analysis

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 11, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Quinta da Fonseca, João (2023). Example Data, Analysis and Results Files for MAUD-batch-analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7602925
    Explore at:
    Dataset updated
    Sep 11, 2023
    Dataset provided by
    Quinta da Fonseca, João
    Daniel, Christopher Stuart
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Example data, example analysis and example results files for using with the MAUD-batch-analysis package. The MAUD-batch analysis package can also be downloaded from Zenodo as v1.0.0.

    To run an example analysis with MAUD batch mode, download these data, analysis and results files and unzip them into the respective folders in the MAUD-batch-analysis package.

  16. g

    SAS code used to analyze data and a datafile with metadata glossary |...

    • gimi9.com
    Updated Dec 28, 2016
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2016). SAS code used to analyze data and a datafile with metadata glossary | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_sas-code-used-to-analyze-data-and-a-datafile-with-metadata-glossary
    Explore at:
    Dataset updated
    Dec 28, 2016
    Description

    We compiled macroinvertebrate assemblage data collected from 1995 to 2014 from the St. Louis River Area of Concern (AOC) of western Lake Superior. Our objective was to define depth-adjusted cutoff values for benthos condition classes (poor, fair, reference) to provide tool useful for assessing progress toward achieving removal targets for the degraded benthos beneficial use impairment in the AOC. The relationship between depth and benthos metrics was wedge-shaped. We therefore used quantile regression to model the limiting effect of depth on selected benthos metrics, including taxa richness, percent non-oligochaete individuals, combined percent Ephemeroptera, Trichoptera, and Odonata individuals, and density of ephemerid mayfly nymphs (Hexagenia). We created a scaled trimetric index from the first three metrics. Metric values at or above the 90th percentile quantile regression model prediction were defined as reference condition for that depth. We set the cutoff between poor and fair condition as the 50th percentile model prediction. We examined sampler type, exposure, geographic zone of the AOC, and substrate type for confounding effects. Based on these analyses we combined data across sampler type and exposure classes and created separate models for each geographic zone. We used the resulting condition class cutoff values to assess the relative benthic condition for three habitat restoration project areas. The depth-limited pattern of ephemerid abundance we observed in the St. Louis River AOC also occurred elsewhere in the Great Lakes. We provide tabulated model predictions for application of our depth-adjusted condition class cutoff values to new sample data. This dataset is associated with the following publication: Angradi, T., W. Bartsch, A. Trebitz, V. Brady, and J. Launspach. A depth-adjusted ambient distribution approach for setting numeric removal targets for a Great Lakes Area of Concern beneficial use impairment: Degraded benthos. JOURNAL OF GREAT LAKES RESEARCH. International Association for Great Lakes Research, Ann Arbor, MI, USA, 43(1): 108-120, (2017).

  17. d

    Data from the Chemical Analysis of Archived Stream-Sediment Samples, Alaska

    • catalog.data.gov
    • data.usgs.gov
    • +3more
    Updated Jul 6, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Geological Survey (2024). Data from the Chemical Analysis of Archived Stream-Sediment Samples, Alaska [Dataset]. https://catalog.data.gov/dataset/data-from-the-chemical-analysis-of-archived-stream-sediment-samples-alaska
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Area covered
    Alaska
    Description

    This data release contains the elemental concentration data for more than 1700 archived stream-sediment samples collected in Alaska. Samples were retrieved from the USGS Mineral Program's sample archive in Denver, CO, and the Alaska Division of Geological and Geophysical Surveys Geologic Materials Center in Anchorage, AK. All samples were analyzed using a multi-element analytical method involving fusion of the sample by sodium peroxide, dissolution of the fusion cake by nitric acid, and elemental analysis by inductively coupled plasma-optical emission spectroscopy (ICP-OES) and inductively coupled plasma-mass spectroscopy (ICP-MS). Additionally, 106 samples from the Nixon Fork area were analyzed by a second multi-element method in which the samples are decomposed by a mixture of hydrochloric, nitric, perchloric, and hydrofluoric acids and the elemental composition is determined by ICP-OES and ICP-MS. New Hg (mercury) concentrations, determined by cold-vapor atomic absorption spectrometry, are reported for 296 samples from southeast Alaska.

  18. f

    DataSheet_1_The TargetMine Data Warehouse: Enhancement and Updates.pdf

    • frontiersin.figshare.com
    pdf
    Updated May 31, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yi-An Chen; Lokesh P. Tripathi; Takeshi Fujiwara; Tatsuya Kameyama; Mari N. Itoh; Kenji Mizuguchi (2023). DataSheet_1_The TargetMine Data Warehouse: Enhancement and Updates.pdf [Dataset]. http://doi.org/10.3389/fgene.2019.00934.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Yi-An Chen; Lokesh P. Tripathi; Takeshi Fujiwara; Tatsuya Kameyama; Mari N. Itoh; Kenji Mizuguchi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Biological data analysis is the key to new discoveries in disease biology and drug discovery. The rapid proliferation of high-throughput ‘omics’ data has necessitated a need for tools and platforms that allow the researchers to combine and analyse different types of biological data and obtain biologically relevant knowledge. We had previously developed TargetMine, an integrative data analysis platform for target prioritisation and broad-based biological knowledge discovery. Here, we describe the newly modelled biological data types and the enhanced visual and analytical features of TargetMine. These enhancements have included: an enhanced coverage of gene–gene relations, small molecule metabolite to pathway mappings, an improved literature survey feature, and in silico prediction of gene functional associations such as protein–protein interactions and global gene co-expression. We have also described two usage examples on trans-omics data analysis and extraction of gene-disease associations using MeSH term descriptors. These examples have demonstrated how the newer enhancements in TargetMine have contributed to a more expansive coverage of the biological data space and can help interpret genotype–phenotype relations. TargetMine with its auxiliary toolkit is available at https://targetmine.mizuguchilab.org. The TargetMine source code is available at https://github.com/chenyian-nibio/targetmine-gradle.

  19. Clinical Next-Generation Sequencing (NGS) Data Analysis Global Market Report...

    • thebusinessresearchcompany.com
    pdf,excel,csv,ppt
    Updated Jan 9, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Business Research Company (2025). Clinical Next-Generation Sequencing (NGS) Data Analysis Global Market Report 2025 [Dataset]. https://www.thebusinessresearchcompany.com/report/clinical-next-generation-sequencing-ngs-data-analysis-global-market-report
    Explore at:
    pdf,excel,csv,pptAvailable download formats
    Dataset updated
    Jan 9, 2025
    Dataset authored and provided by
    The Business Research Company
    License

    https://www.thebusinessresearchcompany.com/privacy-policyhttps://www.thebusinessresearchcompany.com/privacy-policy

    Description

    The Clinical Next-Generation Sequencing (NGS) Data Analysis Market is projected to grow at 20.5% CAGR, reaching $8.73 Billion by 2029. Where is the industry heading next? Get the sample report now!

  20. m

    Data from: Probability waves: adaptive cluster-based correction by...

    • data.mendeley.com
    • narcis.nl
    Updated Feb 8, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    DIMITRI ABRAMOV (2021). Probability waves: adaptive cluster-based correction by convolution of p-value series from mass univariate analysis [Dataset]. http://doi.org/10.17632/rrm4rkr3xn.1
    Explore at:
    Dataset updated
    Feb 8, 2021
    Authors
    DIMITRI ABRAMOV
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    dataset and Octave/MatLab codes/scripts for data analysis Background: Methods for p-value correction are criticized for either increasing Type II error or improperly reducing Type I error. This problem is worse when dealing with thousands or even hundreds of paired comparisons between waves or images which are performed point-to-point. This text considers patterns in probability vectors resulting from multiple point-to-point comparisons between two event-related potentials (ERP) waves (mass univariate analysis) to correct p-values, where clusters of signiticant p-values may indicate true H0 rejection. New method: We used ERP data from normal subjects and other ones with attention deficit hyperactivity disorder (ADHD) under a cued forced two-choice test to study attention. The decimal logarithm of the p-vector (p') was convolved with a Gaussian window whose length was set as the shortest lag above which autocorrelation of each ERP wave may be assumed to have vanished. To verify the reliability of the present correction method, we realized Monte-Carlo simulations (MC) to (1) evaluate confidence intervals of rejected and non-rejected areas of our data, (2) to evaluate differences between corrected and uncorrected p-vectors or simulated ones in terms of distribution of significant p-values, and (3) to empirically verify rate of type-I error (comparing 10,000 pairs of mixed samples whit control and ADHD subjects). Results: the present method reduced the range of p'-values that did not show covariance with neighbors (type I and also type-II errors). The differences between simulation or raw p-vector and corrected p-vectors were, respectively, minimal and maximal for window length set by autocorrelation in p-vector convolution. Comparison with existing methods: Our method was less conservative while FDR methods rejected basically all significant p-values for Pz and O2 channels. The MC simulations, gold-standard method for error correction, presented 2.78±4.83% of difference (all 20 channels) from p-vector after correction, while difference between raw and corrected p-vector was 5,96±5.00% (p = 0.0003). Conclusion: As a cluster-based correction, the present new method seems to be biological and statistically suitable to correct p-values in mass univariate analysis of ERP waves, which adopts adaptive parameters to set correction.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
U.S. Department of State (2021). PISA Data Analysis Manual: SPSS, Second Edition [Dataset]. https://catalog.data.gov/dataset/pisa-data-analysis-manual-spss-second-edition

Data from: PISA Data Analysis Manual: SPSS, Second Edition

Related Article
Explore at:
Dataset updated
Mar 30, 2021
Dataset provided by
U.S. Department of State
Description

The OECD Programme for International Student Assessment (PISA) surveys collected data on students’ performances in reading, mathematics and science, as well as contextual information on students’ background, home characteristics and school factors which could influence performance. This publication includes detailed information on how to analyse the PISA data, enabling researchers to both reproduce the initial results and to undertake further analyses. In addition to the inclusion of the necessary techniques, the manual also includes a detailed account of the PISA 2006 database and worked examples providing full syntax in SPSS.

Search
Clear search
Close search
Google apps
Main menu