22 datasets found
  1. REMNet Tutorial, R Part 5: Normalizing Microbiome Data in R 5.2.19

    • qubeshub.org
    Updated Aug 28, 2019
    Cite
    Jessica Joyner (2019). REMNet Tutorial, R Part 5: Normalizing Microbiome Data in R 5.2.19 [Dataset]. http://doi.org/10.25334/M13H-XT81
    Explore at:
    Dataset updated
    Aug 28, 2019
    Dataset provided by
    QUBES
    Authors
    Jessica Joyner
    Description

    Video on normalizing microbiome data from the Research Experiences in Microbiomes Network

  2. The global spectrum of plant form and function dataset: taxonomic...

    • zenodo.org
    Updated May 31, 2025
    Cite
    Roeland Kindt; Roeland Kindt (2025). The global spectrum of plant form and function dataset: taxonomic standardization of 45,955 taxa to World Flora Online version 2023.12 [Dataset]. http://doi.org/10.5281/zenodo.15563432
    Explore at:
    Dataset updated
    May 31, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Roeland Kindt; Roeland Kindt
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The global spectrum of plant form and function dataset (Díaz et al. 2022; Díaz et al. 2016; TRY 2022, accessed 15-May-2025) provides mean trait values for (i) plant height; (ii) stem specific density; (iii) leaf area; (iv) leaf mass per area; (v) leaf nitrogen content per dry mass; and (vi) diaspore (seed or spore) mass for 46,047 taxa.

    Here I provide a dataset in which the taxa covered by that database were standardized to World Flora Online (Borsch et al. 2020; taxonomic backbone version 2023.12) by matching names with those in the Agroforestry Species Switchboard (Kindt et al. 2025; version 4). Taxa for which no matches could be found were standardized with the WorldFlora package (Kindt 2020), using R scripts similar to, and the same taxonomic backbone data as, those used to standardize species names for the Switchboard. Taxa that still could not be matched were matched against a previously harmonized data set for TRY 6.0 (Kindt 2024).
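
    For readers who want to try the same kind of name matching, a minimal sketch with the WorldFlora package is shown below; the backbone file name, example species names, and argument values are illustrative assumptions, not the exact scripts used for this dataset.

      # Minimal illustration (not the original scripts): matching names against the
      # World Flora Online taxonomic backbone with the WorldFlora package.
      # install.packages("WorldFlora")
      library(WorldFlora)

      # The WFO backbone (e.g., classification.csv from worldfloraonline.org) must be
      # available locally; WFO.download() can fetch it.
      WFO.data <- read.delim("classification.csv")   # assumed local copy of the backbone

      # Example names to standardize (illustrative only)
      spec.data <- data.frame(spec.name = c("Acacia albida", "Faidherbia albida"))

      # Exact and fuzzy matching against the backbone
      matches <- WFO.match(spec.data = spec.data, WFO.data = WFO.data, counter = 1)

      # Keep one best match per input name
      best <- WFO.one(matches)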

    References

    • Díaz, S., Kattge, J., Cornelissen, J.H.C. et al. The global spectrum of plant form and function: enhanced species-level trait dataset. Sci Data 9, 755 (2022). https://doi.org/10.1038/s41597-022-01774-9
    • Díaz, S., Kattge, J., Cornelissen, J. et al. The global spectrum of plant form and function. Nature 529, 167–171 (2016). https://doi.org/10.1038
    • TRY. 2022. The global spectrum of plant form and function dataset. https://www.try-db.org/TryWeb/Data.php#81
    • Borsch, T., Berendsohn, W., Dalcin, E., Delmas, M., Demissew, S., Elliott, A., Fritsch, P., Fuchs, A., Geltman, D., Güner, A., Haevermans, T., Knapp, S., le Roux, M.M., Loizeau, P.-A., Miller, C., Miller, J., Miller, J.T., Palese, R., Paton, A., Parnell, J., Pendry, C., Qin, H.-N., Sosa, V., Sosef, M., von Raab-Straube, E., Ranwashe, F., Raz, L., Salimov, R., Smets, E., Thiers, B., Thomas, W., Tulig, M., Ulate, W., Ung, V., Watson, M., Jackson, P.W. and Zamora, N. (2020), World Flora Online: Placing taxonomists at the heart of a definitive and comprehensive global resource on the world's plants. TAXON, 69: 1311-1341. https://doi.org/10.1002/tax.12373
    • Kindt, R., Siddique, I., Dawson, I., John, I., Pedercini, F., Lillesø, J.-P.B., Graudal, L. 2025. The Agroforestry Species Switchboard, a global resource to explore information for 107,269 plant species. bioRxiv 2025.03.09.642182; doi: https://doi.org/10.1101/2025.03.09.642182
    • Kindt, R. 2020. WorldFlora: An R package for exact and fuzzy matching of plant names against the World Flora Online taxonomic backbone data. Applications in Plant Sciences 8(9): e11388. https://doi.org/10.1002/aps3.11388
    • Kindt, R. (2024). TRY 6.0 - Species List from Taxonomic Harmonization – Matches with World Flora Online version 2023.12 (2024.10b) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.13906338

    Funding

    The development of this dataset was supported by the German International Climate Initiative (IKI) to the regional tree seed programme on The Right Tree for the Right Place for the Right Purpose in Africa, by Norway’s International Climate and Forest Initiative through the Royal Norwegian Embassy in Ethiopia to the Provision of Adequate Tree Seed Portfolio project in Ethiopia, and by the Bezos Earth Fund to the Quality Tree Seed for Africa in Kenya and Rwanda project.

  3. Meta data and supporting documentation

    • catalog.data.gov
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Meta data and supporting documentation [Dataset]. https://catalog.data.gov/dataset/meta-data-and-supporting-documentation
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    We include a description of the data sets in the meta-data as well as sample code and results from a simulated data set.

    This dataset is not publicly accessible because EPA cannot release personally identifiable information about living individuals, in accordance with the Privacy Act and the Freedom of Information Act (FOIA). The dataset contains information about human research subjects; because individual participants could potentially be identified, either alone or in combination with other datasets, individual-level data are not appropriate for public access. Restricted access may be granted to authorized persons by contacting the party listed. The R code is available online at https://github.com/warrenjl/SpGPCW.

    Format: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.

    Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This also allows the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures in each week by subtracting the median exposure for that week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs themselves are not given, which further protects the identifiability of the spatial locations used in the analysis.

    File format: R workspace file.

    Metadata (including data dictionary):
    • y: Vector of binary responses (1: preterm birth, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
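
    The week-by-week standardization described above (subtract the weekly median, divide by the weekly IQR) can be illustrated in a few lines of R; the matrix below is simulated purely for illustration and is not the provided dataset.

      # Illustrative sketch of the exposure standardization described above
      set.seed(1)
      z_raw <- matrix(rnorm(100 * 37), nrow = 100)   # simulated individuals x weeks

      # For each week (column): subtract the weekly median and divide by the weekly IQR
      z <- apply(z_raw, 2, function(week) (week - median(week)) / IQR(week))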

  4. Example subjects for Mobilise-D data standardization

    • data.niaid.nih.gov
    Updated Oct 11, 2022
    Cite
    Palmerini, Luca; Reggi, Luca; Bonci, Tecla; Del Din, Silvia; Micó-Amigo, Encarna; Salis, Francesca; Bertuletti, Stefano; Caruso, Marco; Cereatti, Andrea; Gazit, Eran; Paraschiv-Ionescu, Anisoara; Soltani, Abolfazl; Kluge, Felix; Küderle, Arne; Ullrich, Martin; Kirk, Cameron; Hiden, Hugo; D'Ascanio, Ilaria; Hansen, Clint; Rochester, Lynn; Mazzà, Claudia; Chiari, Lorenzo; on behalf of the Mobilise-D consortium (2022). Example subjects for Mobilise-D data standardization [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7185428
    Explore at:
    Dataset updated
    Oct 11, 2022
    Dataset provided by
    Newcastle University, Translational and Clinical Research Institute, Faculty of Medical Sciences, UK.
    University of Sassari, Department of Biomedical Sciences, Italy.
    University of Bologna, Department of Electrical, Electronic and Information Engineering 'Guglielmo Marconi', Italy. University of Bologna, Health Sciences and Technologies—Interdepartmental Center for Industrial Research (CIRI-SDV), Italy
    Newcastle University, School of Computing, UK.
    Neurogeriatrics Kiel, Department of Neurology, University Hospital Schleswig-Holstein, Germany.
    Laboratory of Movement Analysis and Measurement, Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland.
    Politecnico di Torino, Department of Electronics and Telecommunications, Italy. Politecnico di Torino, PolitoBIOMed Lab – Biomedical Engineering Lab, Italy.
    Newcastle University, Translational and Clinical Research Institute, Faculty of Medical Sciences, UK. The Newcastle upon Tyne NHS Foundation Trust, UK.
    Politecnico di Torino, Department of Electronics and Telecommunications, Italy.
    The University of Sheffield, INSIGNEO Institute for in silico Medicine, UK. The University of Sheffield, Department of Mechanical Engineering, UK
    https://www.mobilise-d.eu/partners
    Tel Aviv Sourasky Medical Center, Center for the Study of Movement, Cognition and Mobility, Neurological Institute, Israel.
    Machine Learning and Data Analytics Lab, Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, Germany.
    University of Bologna, Department of Electrical, Electronic and Information Engineering 'Guglielmo Marconi', Italy.
    University of Bologna, Health Sciences and Technologies—Interdepartmental Center for Industrial Research (CIRI-SDV), Italy
    Authors
    Palmerini, Luca; Reggi, Luca; Bonci, Tecla; Del Din, Silvia; Micó-Amigo, Encarna; Salis, Francesca; Bertuletti, Stefano; Caruso, Marco; Cereatti, Andrea; Gazit, Eran; Paraschiv-Ionescu, Anisoara; Soltani, Abolfazl; Kluge, Felix; Küderle, Arne; Ullrich, Martin; Kirk, Cameron; Hiden, Hugo; D'Ascanio, Ilaria; Hansen, Clint; Rochester, Lynn; Mazzà, Claudia; Chiari, Lorenzo; on behalf of the Mobilise-D consortium
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Standardized data from Mobilise-D participants (YAR dataset) and pre-existing datasets (ICICLE, MSIPC2, Gait in Lab and real-life settings, MS project, UNISS-UNIGE) are provided in the shared folder as an example of the procedures proposed in the publication "Mobility recorded by wearable devices and gold standards: the Mobilise-D procedure for data standardization", currently under review at Scientific Data. Please refer to that publication for further information, and please cite it if using these data.

    The code to standardize an example subject (for the ICICLE dataset) and to open the standardized Matlab files in other languages (Python, R) is available on GitHub (https://github.com/luca-palmerini/Procedure-wearable-data-standardization-Mobilise-D).

  5. Dataset supporting: Normalizing and denoising protein expression data from...

    • nih.figshare.com
    zip
    Updated May 30, 2023
    Cite
    Matthew P. Mulé; Andrew J. Martins; John Tsang (2023). Dataset supporting: Normalizing and denoising protein expression data from droplet-based single cell profiling [Dataset]. http://doi.org/10.35092/yhjc.13370915.v2
    Explore at:
    zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Matthew P. Mulé; Andrew J. Martins; John Tsang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data for reproducing the analysis in the manuscript "Normalizing and denoising protein expression data from droplet-based single cell profiling". Link to manuscript: https://www.biorxiv.org/content/10.1101/2020.02.24.963603v1

    Data deposited here are for the purposes of reproducing the analysis results and figures reported in the manuscript above. These data are all publicly available; they were downloaded and converted to R datasets prior to Dec 4, 2020. For a full description of all the data included in this repository and instructions for reproducing all analysis results and figures, please see the repository: https://github.com/niaid/dsb_manuscript.

    For usage of the dsb R package for normalizing CITE-seq data please see the repository: https://github.com/niaid/dsb
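
    As a quick orientation (the documented workflow lives in the dsb repository), normalization with dsb looks roughly like the sketch below; the example matrices are the small demo objects shipped with the package, and the argument values shown are illustrative assumptions rather than the authors' exact settings.

      # Minimal sketch of dsb usage; see https://github.com/niaid/dsb for details.
      # install.packages("dsb")
      library(dsb)

      # cells_citeseq_mtx / empty_drop_citeseq_mtx: small example matrices bundled
      # with the package (proteins x droplets)
      isotypes <- grep("[Ii]sotype", rownames(cells_citeseq_mtx), value = TRUE)

      adt_norm <- DSBNormalizeProtein(
        cell_protein_matrix = cells_citeseq_mtx,       # cell-containing droplets
        empty_drop_matrix   = empty_drop_citeseq_mtx,  # background (empty) droplets
        denoise.counts      = TRUE,                    # remove per-cell technical noise
        use.isotype.control = TRUE,
        isotype.control.name.vec = isotypes
      )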

    If you use the dsb R package in your work, please cite: Mulè MP, Martins AJ, Tsang JS. Normalizing and denoising protein expression data from droplet-based single cell profiling. bioRxiv. 2020;2020.02.24.963603.

    General contact: John Tsang (john.tsang AT nih.gov)

    Questions about software/code: Matt Mulè (mulemp AT nih.gov)

  6. Data from: Size normalizing planktonic Foraminifera abundance in the water...

    • zenodo.org
    bin
    Updated Aug 12, 2024
    Cite
    Sonia Chaabane; Sonia Chaabane; Thibault de Garidel-Thoron; Thibault de Garidel-Thoron; Xavier Giraud; Xavier Giraud; Julie Meilland; Julie Meilland; Geert-Jan A. Brummer; Geert-Jan A. Brummer; Lukas Jonkers; P. Graham Mortyn; Mattia Greco; Nicolas Casajus; Michal Kucera; Olivier Sulpis; Olivier Sulpis; Azumi Kuroyanagi; Hélène Howa; Gregory Beaugrand; Gregory Beaugrand; Ralf Schiebel; Ralf Schiebel; Lukas Jonkers; P. Graham Mortyn; Mattia Greco; Nicolas Casajus; Michal Kucera; Azumi Kuroyanagi; Hélène Howa (2024). Size normalizing planktonic Foraminifera abundance in the water column [Dataset]. http://doi.org/10.5281/zenodo.10750545
    Explore at:
    bin
    Dataset updated
    Aug 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sonia Chaabane; Sonia Chaabane; Thibault de Garidel-Thoron; Thibault de Garidel-Thoron; Xavier Giraud; Xavier Giraud; Julie Meilland; Julie Meilland; Geert-Jan A. Brummer; Geert-Jan A. Brummer; Lukas Jonkers; P. Graham Mortyn; Mattia Greco; Nicolas Casajus; Michal Kucera; Olivier Sulpis; Olivier Sulpis; Azumi Kuroyanagi; Hélène Howa; Gregory Beaugrand; Gregory Beaugrand; Ralf Schiebel; Ralf Schiebel; Lukas Jonkers; P. Graham Mortyn; Mattia Greco; Nicolas Casajus; Michal Kucera; Azumi Kuroyanagi; Hélène Howa
    Time period covered
    Mar 2, 2024
    Description

    Data and R code for the paper Size normalizing planktonic Foraminifera abundance in the water column (https://doi.org/10.1002/lom3.10637) by Sonia Chaabane, Thibault de Garidel-Thoron, Xavier Giraud, Julie Meilland, Geert-Jan A. Brummer, Lukas Jonkers, P. Graham Mortyn, Mattia Greco, Nicolas Casajus, Olivier Sulpis, Michal Kucera, Azumi Kuroyanagi, Hélène Howa, Gregory Beaugrand, Ralf Schiebel

    The code implements a new normalization approach for estimating the abundance of planktonic Foraminifera (ind/m³) within a specified collection size fraction range. The data used in this study are sourced from the FORCIS database, which contains records collected from the global ocean at various depths over the past century. A cumulative distribution across size fractions is identified and modeled using a Michaelis-Menten function. This modeling yields multiplication factors that enable the normalization of one size fraction to any other size fraction equal to or larger than 100 µm. The resulting size normalization model is then tested across various depths and compared against a previous size normalization solution.

    Scripts written by Sonia Chaabane.
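
    A bare-bones illustration of the core idea (fitting a Michaelis-Menten curve across size fractions and comparing fitted values between fractions) is sketched below; the numbers and the factor definition are invented for illustration and do not reproduce the archived scripts.

      # Illustration only: fit a Michaelis-Menten curve to hypothetical cumulative
      # values across size fractions, then compare fitted values at two fractions.
      size_um <- c(100, 125, 150, 200, 250, 315)        # size fraction limits (hypothetical)
      cum_y   <- c(0.40, 0.46, 0.52, 0.60, 0.66, 0.73)  # hypothetical cumulative values

      fit <- nls(cum_y ~ Vm * size_um / (K + size_um),
                 start = list(Vm = 1, K = 150))

      # Ratio of fitted values at the 100 µm reference fraction vs. a 150 µm fraction,
      # usable as a multiplication factor in the spirit of the approach described above
      predict(fit, newdata = data.frame(size_um = 100)) /
        predict(fit, newdata = data.frame(size_um = 150))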

    DATA SOURCES

    DATA

    1. data_raw_from_excel.RDS

    CODES

    1. Code 1_Prepare the data.R: Reads the data and prepares it for analysis.
    2. Code 2_Data-model_training.R: Analyzes the data and builds the model.
    3. Code 2_MM_confidence interval_all oceans_depths_seasons.R: Analyzes the data and computes confidence intervals across all oceans, depths, and seasons.
    4. Code 3_Validation.R: Compares actual vs. estimated number concentrations.
    5. Code 4_Test with berger scheme.R: Compares actual vs. estimated number concentrations using Berger 1969 correction scheme.
    6. Code 5_Cross validation_Retailleau et al.R: Applies the FORCIS number concentration-size correction scheme on an independent dataset.
    7. Code 6_Retailleau et al. using berger approach.R: Compares actual vs. estimated number concentrations using Berger 1969 correction scheme from an independent dataset.
    8. function.R: Additional functions used in the analysis.

  7. Simulation Data Set

    • catalog.data.gov
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures in each week by subtracting the median exposure for that week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs themselves are not given, which further protects the identifiability of the spatial locations used in the analysis.

    This dataset is not publicly accessible because EPA cannot release personally identifiable information about living individuals, in accordance with the Privacy Act and the Freedom of Information Act (FOIA). The dataset contains information about human research subjects; because individual participants could potentially be identified, either alone or in combination with other datasets, individual-level data are not appropriate for public access. Restricted access may be granted to authorized persons by contacting the party listed.

    File format: R workspace file, “Simulated_Dataset.RData”.

    Metadata (including data dictionary):
    • y: Vector of binary responses (1: adverse outcome, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of “true” critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    Code:
    • “CWVS_LMC.txt”: R statistical software code, delivered as a plain-text file, that fits the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. Once the “Simulated_Dataset.RData” workspace has been loaded into R, this code can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.
    • “Results_Summary.txt”: R code, also delivered as a plain-text file, that summarizes and plots the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript) once the “CWVS_LMC.txt” code has been run on the simulated dataset.

    Required R packages:
    • For “CWVS_LMC.txt”: msm (sampling from the truncated normal distribution), mnormt (sampling from the multivariate normal distribution), BayesLogit (sampling from the Polya-Gamma distribution)
    • For “Results_Summary.txt”: plotrix (plotting the posterior means and credible intervals)

    Reproducibility: The data and code can be used to identify/estimate critical windows from one of the simulated datasets generated under setting E4 of the presented simulation study. To reproduce the results (see the sketch below):
    • Load the “Simulated_Dataset.RData” workspace
    • Run the code contained in “CWVS_LMC.txt”
    • Once the “CWVS_LMC.txt” code is complete, run “Results_Summary.txt”

    Data: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining the confidentiality of any actual pregnant women.

    Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This also allows the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
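
    The replication procedure above amounts to three R calls; the sketch below simply assumes the three files sit in the working directory.

      # Sketch of the replication steps listed above
      # (requires the msm, mnormt, BayesLogit, and plotrix packages noted in the description)
      load("Simulated_Dataset.RData")    # loads y, x, z, n, m, p, alpha_true
      source("CWVS_LMC.txt")             # fit the LMC version of CWVS
      source("Results_Summary.txt")      # summarize/plot critical windows and inclusion probabilities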

  8. Data and code for: Contagion risk prediction with Chart Graph Convolutional...

    • data.mendeley.com
    Updated Jun 5, 2025
    Cite
    Zhensong Chen (2025). Data and code for: Contagion risk prediction with Chart Graph Convolutional Network: Evidence from Chinese stock market [Dataset]. http://doi.org/10.17632/6xy9d4bp28.1
    Explore at:
    Dataset updated
    Jun 5, 2025
    Authors
    Zhensong Chen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset accompanies the study “Contagion risk prediction with Chart Graph Convolutional Network: Evidence from Chinese stock market”, which proposes a framework for contagion risk prediction by comprehensively mining the features of technical charts and technical indicators. The data include the closing prices of the 28 sectors in the Shenwan primary industry index, the closing price of the CSI-300 Index, and eight classes of trading indicators: Turnover Rate, Price-to-Earnings Ratio, Trading Volume, Relative Strength Index, Moving Average Convergence Divergence, Moving Average, Bollinger Bands, and Stochastic Oscillator. The sample period runs from 5 Jan 2007 to 30 Dec 2022. The closing prices of the 28 sectors were downloaded from the Choice database; the closing price of the CSI-300 Index and the eight classes of trading indicators were downloaded from the Wind database.

    This dataset includes two raw data files, one predefined temporary file, and eighteen code files, described as follows:
    • Sector_data.csv: closing prices of the 28 sectors.
    • CSI_300_data.csv: closing price of the CSI-300 Index and the eight classes of trading indicators.
    • DCC_temp.csv: a predefined temporary file used to store correlation results.
    • Descriptive_code.py: calculates the statistical results.
    • ADF Test.py: tests the stationarity of the data.
    • Min-max normalization.py: standardizes the data (see the R illustration below).
    • ADCC-GJR-GARCH.R: calculates dynamic conditional correlations between sectors.
    • MST_figure.py: constructs a complex network that illustrates the inter-sector relationships.
    • Correlation.py: calculates inter-industry correlations.
    • Corr_up.py, corr_mid.py and corr_down.py: calculate dynamic correlations in upstream, midstream, and downstream sectors.
    • Centrality.py: quantifies the importance or influence of nodes within a network, particularly across the distinct upstream, midstream, and downstream sectors.
    • Averaging_corr_over_a_5-day_period.py: calculates 5-day rolling averages of the correlation and centrality metrics to quantify contagion risk on a weekly cycle.
    • Convert technical charts using PIP and VG methods.py: extracts significant nodes, converts them into graphical representations, and saves them in Daily Importance Score.csv, Daily Threshold Matrix.csv, and Daily Technical Indicators.csv.
    • Convert_CSV_to_TXT.py: converts Daily Importance Score.csv, Daily Threshold Matrix.csv, and Daily Technical Indicators.csv into TXT files for later use.
    • The folder “Generating and normalizing the subgraphs” contains four files that generate subgraphs and then normalize them; receptive_field.py serves as the main program and calls the other three files, and stock_graph_indicator.py calculates topological structure data for subsequent use.
    • Predictive_model.py: takes the normalized subgraphs and the Y-values defined by contagion risk as inputs and performs parameter tuning to achieve optimal results.
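
    The min-max step can be summarized in one line; the sketch below is written in R for illustration (the archived script itself, Min-max normalization.py, is Python) and the prices are made up.

      # Min-max scaling rescales each series to the [0, 1] interval (illustration only)
      min_max <- function(x) (x - min(x)) / (max(x) - min(x))

      prices <- c(3210, 3185, 3250, 3302, 3278)   # hypothetical closing prices
      min_max(prices)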

  9. Methods for normalizing microbiome data: an ecological perspective

    • nde-dev.biothings.io
    zip
    Updated Oct 30, 2018
    Cite
    Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger (2018). Methods for normalizing microbiome data: an ecological perspective [Dataset]. http://doi.org/10.5061/dryad.tn8qs35
    Explore at:
    zip
    Dataset updated
    Oct 30, 2018
    Dataset provided by
    James Cook University
    University of New England
    Authors
    Donald T. McKnight; Roger Huerlimann; Deborah S. Bower; Lin Schwarzkopf; Ross A. Alford; Kyall R. Zenger
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description
    1. Microbiome sequencing data often need to be normalized due to differences in read depths, and recommendations for microbiome analyses generally warn against using proportions or rarefying to normalize data, instead advocating alternatives such as upper quartile, CSS, edgeR-TMM, or DESeq-VS. Those recommendations are, however, based on studies that focused on differential abundance testing and variance standardization rather than community-level comparisons (i.e., beta diversity). Also, standardizing the within-sample variance across samples may suppress differences in species evenness, potentially distorting community-level patterns. Furthermore, the recommended methods use log transformations, which we expect to exaggerate the importance of differences among rare OTUs while suppressing the importance of differences among common OTUs.
    2. We tested these theoretical predictions via simulations and a real-world data set.
    3. Proportions and rarefying produced more accurate comparisons among communities and were the only methods that fully normalized read depths across samples. Additionally, upper quartile, CSS, edgeR-TMM, and DESeq-VS often masked differences among communities when common OTUs differed, and they produced false positives when rare OTUs differed.
    4. Based on our simulations, normalizing via proportions may be superior to other commonly used methods for comparing ecological communities.
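
    A minimal sketch of the two normalizations favored above (proportions and rarefying), using a simulated OTU table and the vegan package, is shown below; the table and read depths are invented for illustration.

      # Illustration only: normalize a simulated OTU table by proportions and by rarefying
      library(vegan)
      set.seed(42)

      otu <- matrix(rpois(5 * 20, lambda = 50), nrow = 5,
                    dimnames = list(paste0("sample", 1:5), paste0("OTU", 1:20)))

      # Proportions: divide each count by its sample's total read depth
      otu_prop <- sweep(otu, 1, rowSums(otu), "/")

      # Rarefying: subsample every sample down to the smallest read depth
      otu_rare <- rrarefy(otu, sample = min(rowSums(otu)))
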
  10. Standardized NEON organismal data (neonDivData)

    • portal.edirepository.org
    bin, csv
    Updated Apr 12, 2022
    Cite
    Daijiang Li; Sydne Record; Eric Sokol; Matthew Bitters; Melissa Chen; Anny Chung; Matthew Helmus; Ruvi Jaimes; Lara Jansen; Marta Jarzyna; Michael Just; Jalene LaMontagne; Brett Melbourne; Wynne Moss; Kari Norman; Stephanie Parker; Natalie Robinson; Bijan Seyednasrollah; Sarah Spaulding; Thilina Surasinghe; Sarah Thomsen; Phoebe Zarnetske (2022). Standardized NEON organismal data (neonDivData) [Dataset]. http://doi.org/10.6073/pasta/c28dd4f6e7989003505ea02e9a92afbf
    Explore at:
    csv(67793652 bytes), csv(266884330 bytes), csv(4643854 bytes), csv(12011 bytes), csv(944312 bytes), csv(6879 bytes), csv(25181268 bytes), csv(1949590 bytes), csv(375200 bytes), csv(3062147 bytes), csv(35160044 bytes), csv(738408 bytes), csv(18427828 bytes), csv(604110 bytes), csv(35684117 bytes), csv(86101256 bytes), bin(20729 bytes), bin(4674 bytes)
    Dataset updated
    Apr 12, 2022
    Dataset provided by
    EDI
    Authors
    Daijiang Li; Sydne Record; Eric Sokol; Matthew Bitters; Melissa Chen; Anny Chung; Matthew Helmus; Ruvi Jaimes; Lara Jansen; Marta Jarzyna; Michael Just; Jalene LaMontagne; Brett Melbourne; Wynne Moss; Kari Norman; Stephanie Parker; Natalie Robinson; Bijan Seyednasrollah; Sarah Spaulding; Thilina Surasinghe; Sarah Thomsen; Phoebe Zarnetske
    Time period covered
    Jun 5, 2013 - Jul 28, 2020
    Area covered
    Variables measured
    sex, unit, year, State, endRH, month, sites, units, value, boutID, and 113 more
    Description

    To standardize NEON organismal data for major taxonomic groups, we first systematically reviewed NEON’s documentations for each taxonomic group. We then discussed as a group and with NEON staff to decide how to wrangle and standardize NEON organismal data. See Li et al. 2022 for more details. All R code to process NEON data products can be obtained through the R package ‘ecocomDP’. Once the data are in ecocomDP format, we further processed them to convert them into long data frames with code on Github (https://github.com/daijiang/neonDivData/tree/master/data-raw), which is also archived here.
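
    A minimal sketch of pulling a NEON data product through ecocomDP and flattening it into a long data frame is given below; the dataset id, site codes, and date range are illustrative assumptions, not the exact calls used here.

      # Illustration only; see the ecocomDP documentation for the current interface.
      library(ecocomDP)

      # Search for NEON data packages available in the ecocomDP format
      search_data(text = "NEON")

      # Download one data package and flatten it into a single long data frame
      d <- read_data(id = "neon.ecocomdp.20120.001.001",   # illustrative id
                     site = c("ARIK", "POSE"),
                     startdate = "2017-01", enddate = "2019-12")
      long_df <- flatten_data(d)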

  11. Laptop_sales_dataset_cleaned_and_raw_dataset

    • kaggle.com
    zip
    Updated Dec 23, 2025
    Cite
    Ravi Kumar (2025). Laptop_sales_dataset_cleaned_and_raw_dataset [Dataset]. https://www.kaggle.com/datasets/mytalkwithyou/laptop-sales-dataset-cleaned-and-raw-dataset
    Explore at:
    zip (1215475 bytes)
    Dataset updated
    Dec 23, 2025
    Authors
    Ravi Kumar
    Description

    📊 Cleaned Laptop Sales Dataset | MySQL Data Cleaning & Analysis Ready

    This dataset contains cleaned and structured laptop sales data, prepared using MySQL for easy analysis, visualization, and machine learning practice. It is ideal for data analysis projects, SQL practice, dashboards, and portfolio work.

    The raw data was carefully processed to remove inconsistencies, handle missing values, standardize formats, and improve overall data quality. The final dataset is analysis-ready and suitable for use in tools such as MySQL, Power BI, Tableau, Excel, Python, and R.

    Before cleaning, the dataset has 1,303 rows and 11 columns; after cleaning, it has 1,303 rows and 18 columns.

    Use Cases: This dataset can be used for:
    • SQL practice (SELECT, JOIN, GROUP BY, subqueries, etc.)
    • Sales and pricing analysis
    • Market trend analysis
    • Dashboard creation (Power BI / Tableau)
    • Data cleaning & preprocessing practice
    • Beginner to intermediate data analytics projects
    • Portfolio and interview preparation

    🔧 Data Cleaning Process (performed in MySQL):
    • Removed duplicate records
    • Handled missing and null values
    • Standardized column names and data types
    • Corrected inconsistent categorical values
    • Ensured numeric fields are clean and usable
    • Optimized the structure for querying and analysis

    📁 Dataset Contents: The dataset typically includes information such as:
    • Laptop brand and model
    • Specifications (RAM, storage, processor, etc.)
    • Pricing details
    • Sales or availability information
    • Other relevant attributes useful for analysis

    👨‍💻 Who Is This Dataset For?
    • Data analysts & business analysts
    • Students learning SQL and data analysis
    • Beginners building projects
    • Kaggle learners & competitors
    • Anyone practicing real-world data cleaning

    📝 Notes:
    • The dataset is cleaned but not artificially modified
    • Suitable for both educational and practical use
    • Feel free to explore, visualize, and build models

  12. SilvaGRIS Global Information System on Forest Genetic Resources: taxonomic...

    • zenodo.org
    Updated Apr 3, 2025
    Cite
    Roeland Kindt; Roeland Kindt (2025). SilvaGRIS Global Information System on Forest Genetic Resources: taxonomic standardization of 2,794 species to World Flora Online or the World Checklist of Vascular Plants [Dataset]. http://doi.org/10.5281/zenodo.15130265
    Explore at:
    Dataset updated
    Apr 3, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Roeland Kindt; Roeland Kindt
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SilvaGRIS (https://fgr.apps.fao.org/en; accessed 28th March 2025) is a new global information system on forest genetic resources that makes available the data countries report to FAO for monitoring the implementation of the Global Plan of Action for the Conservation, Sustainable Use and Development of Forest Genetic Resources and for preparing global assessments on these resources. The first dataset was provided by 77 countries, which reported more than 2,800 species for the preparation of The Second Report on the State of the World’s Forest Genetic Resources.

    Here I provide a dataset where these species were standardized to World Flora Online (Borsch et al. 2020; taxonomic backbone version 2023.12) by matching names with those in the Agroforestry Species Switchboard (Kindt et al. 2025; version 4). Species for which no matches could be found were standardized with the WorldFlora package (Kindt 2020), using similar R scripts and the same taxonomic backbone data as those used to standardize species names for the Switchboard. Twelve hybrid species such as Acacia mangium x auriculiformis could not be matched with taxa in the taxonomic backbone databases.

    Additional fields indicate whether a species can be classified as a tree (2509 species), shrub (156), tree or shrub (6), tree-like palm (57), bamboo (16), rattan (6), or other categories that included rosette trees (16) and tree ferns (3). The category was inferred from the species being flagged as a tree in the Switchboard, from the lifeform obtained from the World Checklist of Vascular Plants (WCVP; Govaerts et al. 2021, version 11), or from other specified information sources in case there were insufficient details available from the Switchboard or the WCVP.

    There are also additional fields to indicate whether the species is included in the Tree Globally Observed Environmental Ranges database (TreeGOER; Kindt 2023) or in the TreeGOER+ database (Kindt 2024; this database includes bamboo, hybrid and other woody tree species not included in TreeGOER).

    References

    • FAO. 2025. SilvaGRIS - Global Information System on Forest Genetic Resources. Food and Agriculture Organization of the United Nations. URL https://fgr.apps.fao.org/en accessed 28-MAR-2025
    • Borsch, T., Berendsohn, W., Dalcin, E., Delmas, M., Demissew, S., Elliott, A., Fritsch, P., Fuchs, A., Geltman, D., Güner, A., Haevermans, T., Knapp, S., le Roux, M.M., Loizeau, P.-A., Miller, C., Miller, J., Miller, J.T., Palese, R., Paton, A., Parnell, J., Pendry, C., Qin, H.-N., Sosa, V., Sosef, M., von Raab-Straube, E., Ranwashe, F., Raz, L., Salimov, R., Smets, E., Thiers, B., Thomas, W., Tulig, M., Ulate, W., Ung, V., Watson, M., Jackson, P.W. and Zamora, N. (2020), World Flora Online: Placing taxonomists at the heart of a definitive and comprehensive global resource on the world's plants. TAXON, 69: 1311-1341. https://doi.org/10.1002/tax.12373
    • Kindt, R., Siddique, I., Dawson, I., John, I., Pedercini, F., Lillesø, J.-P.B., Graudal, L. 2025. The Agroforestry Species Switchboard, a global resource to explore information for 107,269 plant species. bioRxiv 2025.03.09.642182; doi: https://doi.org/10.1101/2025.03.09.642182
    • Govaerts, R., Nic Lughadha, E., Black, N. et al. The World Checklist of Vascular Plants, a continuously updated resource for exploring global plant diversity. Sci Data 8, 215 (2021). https://doi.org/10.1038/s41597-021-00997-6
    • Kindt, R. 2020. WorldFlora: An R package for exact and fuzzy matching of plant names against the World Flora Online taxonomic backbone data. Applications in Plant Sciences 8(9): e11388. https://doi.org/10.1002/aps3.11388
    • Kindt, R. (2023). TreeGOER: A database with globally observed environmental ranges for 48,129 tree species. Global Change Biology, 29, 6303–6318. https://doi.org/10.1111/gcb.16914
    • Kindt, R. (2024). TreeGOER 2024 Expansion: Expansion with additional tree and bamboo species identified via the World Checklist of Vascular Plants (2024.06) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11652972

    Funding

    The development of this dataset was supported by the German International Climate Initiative (IKI) to the regional tree seed programme on The Right Tree for the Right Place for the Right Purpose in Africa, by Norway’s International Climate and Forest Initiative through the Royal Norwegian Embassy in Ethiopia to the Provision of Adequate Tree Seed Portfolio project in Ethiopia, and by the Bezos Earth Fund to the Quality Tree Seed for Africa in Kenya and Rwanda project.

  13. GSE6740 Data Normalization SubtypeAnalysis Patient

    • kaggle.com
    zip
    Updated Nov 29, 2025
    Cite
    Dr. Nagendra (2025). GSE6740 Data Normalization SubtypeAnalysis Patient [Dataset]. https://www.kaggle.com/datasets/mannekuntanagendra/gse6740-data-normalization-subtypeanalysis-patient
    Explore at:
    zip (1838637 bytes)
    Dataset updated
    Nov 29, 2025
    Authors
    Dr. Nagendra
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    • This dataset contains processed gene expression data derived from the publicly available GEO series GSE6740.
    • The dataset focuses on the normalization, preprocessing, and subtype-level analysis of patient samples.
    • It includes R scripts and resources used to clean, transform, and standardize raw microarray expression values.
    • The uploaded files support the step-by-step workflow used to perform differential expression and subtype clustering.
    • The dataset is suitable for users working on microarray analysis, normalization pipelines, and cancer or immune cell subtype research.
    • All preprocessing steps follow standard bioinformatics workflows, including background correction, log transformation, and quantile normalization (a minimal R illustration of these steps follows below).
    • The dataset allows users to reproduce normalization results, explore subtype-level grouping, and run downstream statistical comparisons.
    • It includes annotated patient group information and cell-type–specific analytical procedures used in GSE6740-based research.
    • The content is designed for students, bioinformaticians, and researchers learning microarray data normalization with R.
    • The dataset can be directly used for training, teaching, method comparison, or as a reference workflow for microarray processing.
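
    For orientation, the three preprocessing steps named above can be sketched in a few lines of R with limma; the expression matrix here is simulated, not GSE6740, and the method choices are illustrative.

      # Illustration only: background correction, log2 transformation, quantile normalization
      library(limma)
      set.seed(1)

      expr <- matrix(rexp(1000 * 6, rate = 1 / 200), nrow = 1000,
                     dimnames = list(NULL, paste0("GSM", 1:6)))   # probes x samples (simulated)

      expr_bg   <- backgroundCorrect.matrix(expr, method = "normexp")      # background correction
      expr_log  <- log2(expr_bg + 1)                                       # log transformation
      expr_norm <- normalizeBetweenArrays(expr_log, method = "quantile")   # quantile normalization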

  14. Additional file 3: of DBNorm: normalizing high-density oligonucleotide...

    • springernature.figshare.com
    txt
    Updated Nov 30, 2017
    Cite
    Qinxue Meng; Daniel Catchpoole; David Skillicorn; Paul Kennedy (2017). Additional file 3: of DBNorm: normalizing high-density oligonucleotide microarray data based on distributions [Dataset]. http://doi.org/10.6084/m9.figshare.5648932.v1
    Explore at:
    txt
    Dataset updated
    Nov 30, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Qinxue Meng; Daniel Catchpoole; David Skillicorn; Paul Kennedy
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DBNorm test script. Code showing how we test the DBNorm package. (TXT 2 kb)

  15. Naturalistic Neuroimaging Database

    • openneuro.org
    Updated Apr 20, 2021
    Cite
    Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper (2021). Naturalistic Neuroimaging Database [Dataset]. http://doi.org/10.18112/openneuro.ds002837.v1.1.3
    Explore at:
    Dataset updated
    Apr 20, 2021
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Overview

    • The Naturalistic Neuroimaging Database (NNDb v2.0) contains datasets from 86 human participants doing the NIH Toolbox and then watching one of 10 full-length movies during functional magnetic resonance imaging (fMRI). The participants were all right-handed, native English speakers, with no history of neurological/psychiatric illnesses, with no hearing impairments, unimpaired or corrected vision, and taking no medication. Each movie was stopped at 40-50 minute intervals or when participants asked for a break, resulting in 2-6 runs of BOLD-fMRI. A 10-minute high-resolution defaced T1-weighted anatomical MRI scan (MPRAGE) is also provided.
    • The NNDb V2.0 is now on Neuroscout, a platform for fast and flexible re-analysis of (naturalistic) fMRI studies. See: https://neuroscout.org/

    v2.0 Changes

    • Overview
      • We have replaced our own preprocessing pipeline with that implemented in AFNI’s afni_proc.py, thus changing only the derivative files. This introduces a fix for an issue with our normalization (i.e., scaling) step and modernizes and standardizes the preprocessing applied to the NNDb derivative files. We have done a bit of testing and have found that results in both pipelines are quite similar in terms of the resulting spatial patterns of activity but with the benefit that the afni_proc.py results are 'cleaner' and statistically more robust.
    • Normalization

      • Emily Finn and Clare Grall at Dartmouth and Rick Reynolds and Paul Taylor at AFNI, discovered and showed us that the normalization procedure we used for the derivative files was less than ideal for timeseries runs of varying lengths. Specifically, the 3dDetrend flag -normalize makes 'the sum-of-squares equal to 1'. We had not thought through that an implication of this is that the resulting normalized timeseries amplitudes will be affected by run length, increasing as run length decreases (and maybe this should go in 3dDetrend’s help text). To demonstrate this, I wrote a version of 3dDetrend’s -normalize for R so you can see for yourselves by running the following code:
      # Generate a resting state (rs) timeseries (ts)
      # Install / load package to make fake fMRI ts
      # install.packages("neuRosim")
      library(neuRosim)
      # Generate a ts
      ts.rs <- simTSrestingstate(nscan=2000, TR=1, SNR=1)
      # 3dDetrend -normalize
      # R command version for 3dDetrend -normalize -polort 0 which normalizes by making "the sum-of-squares equal to 1"
      # Do for the full timeseries
      ts.normalised.long <- (ts.rs-mean(ts.rs))/sqrt(sum((ts.rs-mean(ts.rs))^2));
      # Do this again for a shorter version of the same timeseries
      ts.shorter.length <- length(ts.normalised.long)/4
      ts.normalised.short <- (ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))/sqrt(sum((ts.rs[1:ts.shorter.length]- mean(ts.rs[1:ts.shorter.length]))^2));
      # By looking at the summaries, it can be seen that the values become larger in magnitude for the shorter timeseries
      summary(ts.normalised.long)
      summary(ts.normalised.short)
      # Plot results for the long and short ts
      # Truncate the longer ts for plotting only
      ts.normalised.long.made.shorter <- ts.normalised.long[1:ts.shorter.length]
      # Give the plot a title
      title <- "3dDetrend -normalize for long (blue) and short (red) timeseries";
      plot(x=0, y=0, main=title, xlab="", ylab="", xaxs='i', xlim=c(1,length(ts.normalised.short)), ylim=c(min(ts.normalised.short),max(ts.normalised.short)));
      # Add zero line
      lines(x=c(-1,ts.shorter.length), y=rep(0,2), col='grey');
      # 3dDetrend -normalize -polort 0 for long timeseries
      lines(ts.normalised.long.made.shorter, col='blue');
      # 3dDetrend -normalize -polort 0 for short timeseries
      lines(ts.normalised.short, col='red');
      
    • Standardization/modernization

      • The above individuals also encouraged us to implement the afni_proc.py script over our own pipeline. It introduces at least three additional improvements: First, we now use Bob’s @SSwarper to align our anatomical files with an MNI template (now MNI152_2009_template_SSW.nii.gz) and this, in turn, integrates nicely into the afni_proc.py pipeline. This seems to result in a generally better or more consistent alignment, though this is only a qualitative observation. Second, all the transformations / interpolations and detrending are now done in fewer steps compared to our pipeline. This is preferable because, e.g., there is less chance of inadvertently reintroducing noise back into the timeseries (see Lindquist, Geuter, Wager, & Caffo 2019). Finally, many groups are advocating using tools like fMRIPrep or afni_proc.py to increase standardization of analyses practices in our neuroimaging community. This presumably results in less error, less heterogeneity and more interpretability of results across studies. Along these lines, the quality control (‘QC’) html pages generated by afni_proc.py are a real help in assessing data quality and almost a joy to use.
    • New afni_proc.py command line

      • The following is the afni_proc.py command line that we used to generate blurred and censored timeseries files. The afni_proc.py tool comes with extensive help and examples. As such, you can quickly understand our preprocessing decisions by scrutinising the below. Specifically, the following command is most similar to Example 11 for ‘Resting state analysis’ in the help file (see https://afni.nimh.nih.gov/pub/dist/doc/program_help/afni_proc.py.html):

        afni_proc.py \
          -subj_id "$sub_id_name_1" \
          -blocks despike tshift align tlrc volreg mask blur scale regress \
          -radial_correlate_blocks tcat volreg \
          -copy_anat anatomical_warped/anatSS.1.nii.gz \
          -anat_has_skull no \
          -anat_follower anat_w_skull anat anatomical_warped/anatU.1.nii.gz \
          -anat_follower_ROI aaseg anat freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
          -anat_follower_ROI aeseg epi freesurfer/SUMA/aparc.a2009s+aseg.nii.gz \
          -anat_follower_ROI fsvent epi freesurfer/SUMA/fs_ap_latvent.nii.gz \
          -anat_follower_ROI fswm epi freesurfer/SUMA/fs_ap_wm.nii.gz \
          -anat_follower_ROI fsgm epi freesurfer/SUMA/fs_ap_gm.nii.gz \
          -anat_follower_erode fsvent fswm \
          -dsets media_?.nii.gz \
          -tcat_remove_first_trs 8 \
          -tshift_opts_ts -tpattern alt+z2 \
          -align_opts_aea -cost lpc+ZZ -giant_move -check_flip \
          -tlrc_base "$basedset" \
          -tlrc_NL_warp \
          -tlrc_NL_warped_dsets \
            anatomical_warped/anatQQ.1.nii.gz \
            anatomical_warped/anatQQ.1.aff12.1D \
            anatomical_warped/anatQQ.1_WARP.nii.gz \
          -volreg_align_to MIN_OUTLIER \
          -volreg_post_vr_allin yes \
          -volreg_pvra_base_index MIN_OUTLIER \
          -volreg_align_e2a \
          -volreg_tlrc_warp \
          -mask_opts_automask -clfrac 0.10 \
          -mask_epi_anat yes \
          -blur_to_fwhm -blur_size $blur \
          -regress_motion_per_run \
          -regress_ROI_PC fsvent 3 \
          -regress_ROI_PC_per_run fsvent \
          -regress_make_corr_vols aeseg fsvent \
          -regress_anaticor_fast \
          -regress_anaticor_label fswm \
          -regress_censor_motion 0.3 \
          -regress_censor_outliers 0.1 \
          -regress_apply_mot_types demean deriv \
          -regress_est_blur_epits \
          -regress_est_blur_errts \
          -regress_run_clustsim no \
          -regress_polort 2 \
          -regress_bandpass 0.01 1 \
          -html_review_style pythonic

      We used similar command lines to generate the ‘blurred and not censored’ and the ‘not blurred and not censored’ timeseries files (described more fully below). We will provide the code used to make all derivative files available on our github site (https://github.com/lab-lab/nndb).

      We made one choice above that is different enough from our original pipeline that it is worth mentioning here. Specifically, we have quite long runs, with the average being ~40 minutes but this number can be variable (thus leading to the above issue with 3dDetrend’s -normalize). A discussion on the AFNI message board with one of our team (starting here, https://afni.nimh.nih.gov/afni/community/board/read.php?1,165243,165256#msg-165256), led to the suggestion that '-regress_polort 2' with '-regress_bandpass 0.01 1' be used for long runs. We had previously used only a variable polort with the suggested 1 + int(D/150) approach. Our new polort 2 + bandpass approach has the added benefit of working well with afni_proc.py.

      Which timeseries file you use is up to you, but I have been encouraged by Rick and Paul to include a sort of PSA about this. In Paul’s own words:
        • Blurred data should not be used for ROI-based analyses (and potentially not for ICA? I am not certain about standard practice).
        • Unblurred data for ISC might be pretty noisy for voxelwise analyses, since blurring should effectively boost the SNR of active regions (and even good alignment won't be perfect everywhere).
        • For uncensored data, one should be concerned about motion effects being left in the data (e.g., spikes in the data).
        • For censored data:
          • Performing ISC requires the users to unionize the censoring patterns during the correlation calculation.
          • If wanting to calculate power spectra or spectral parameters like ALFF/fALFF/RSFA etc. (which some people might do for naturalistic tasks still), then standard FT-based methods can't be used because sampling is no longer uniform. Instead, people could use something like 3dLombScargle+3dAmpToRSFC, which calculates power spectra (and RSFC params) based on a generalization of the FT that can handle non-uniform sampling, as long as the censoring pattern is mostly random and, say, only up to about 10-15% of the data.
      In sum, think very carefully about which files you use. If you find you need a file we have not provided, we can happily generate different versions of the timeseries upon request and can generally do so in a week or less.

    • Effect on results

      • From numerous tests on our own analyses, we have qualitatively found that results using our old vs the new afni_proc.py preprocessing pipeline do not change all that much in terms of general spatial patterns. There is, however, an
  16. Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data...

    • frontiersin.figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao (2023). Data_Sheet_2_NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods.zip [Dataset]. http://doi.org/10.3389/fgene.2019.00400.s002
    Explore at:
    zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Zhenfeng Wu; Weixiang Liu; Xiufeng Jin; Haishuo Ji; Hua Wang; Gustavo Glusman; Max Robinson; Lin Liu; Jishou Ruan; Shan Gao
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data normalization is a crucial step in gene expression analysis, as it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate existing normalization methods, different metrics, or the same metric applied to different datasets, yield inconsistent results, particularly for single-cell RNA sequencing (scRNA-seq) data. In the worst cases, a method evaluated as the best by one metric is evaluated as the poorest by another metric, or a method evaluated as the best on one dataset is evaluated as the poorest on another. This raises an open question: principles need to be established to guide the evaluation of normalization methods.

    In this study, we propose the principle that a normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics), and that a method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). We then designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it, together with another metric, mSCC, to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying the consistency of metrics and the consistency of datasets.

    Our findings pave the way for future studies on the normalization of gene expression data and its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study are included in an R package named NormExpression. NormExpression provides a framework and a fast, simple way for researchers to select the best method for normalizing their gene expression data, based on the evaluation of different methods (particularly data-driven methods or their own methods) under the principle of the consistency of metrics and the consistency of datasets.

  17. Annual Survey of State Government Finances 1992-2018

    • search.datacite.org
    Updated 2021
    Cite
    Jacob Kaplan (2021). Annual Survey of State Government Finances 1992-2018 [Dataset]. http://doi.org/10.3886/e101880
    Explore at:
    Dataset updated
    2021
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    DataCite
    Authors
    Jacob Kaplan
    Description

    Version 4 release notes: Changes the release notes description; does not change the data.
    Version 3 release notes: Adds 2018 data. Renames some columns so that all column names are <= 32 characters, to fit within Stata's limit.
    Version 2 release notes: Adds 2017 data. R and Stata files now available.

    The .csv file includes data from the years 1992-2016. No data values were changed; only column names were changed, to standardize them across years. Some columns (e.g. Population) that are not present in all years were removed. Amounts are in thousands of dollars.
    The zip file includes all raw (completely untouched) files for the years 1992-2016.
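    As a purely illustrative R sketch of the kind of renaming described in the release notes above (not the author's actual script; the long column names here are hypothetical), names can be truncated to Stata's 32-character limit and kept unique:

      # Hypothetical data frame with over-long column names
      df <- data.frame(
        total_general_revenue_from_own_sources_thousands = 1:3,
        total_general_expenditure_by_function_thousands  = 4:6
      )

      # Truncate every name to at most 32 characters (Stata's limit), then
      # make the results unique in case truncation creates duplicates
      names(df) <- make.unique(substr(names(df), 1, 32))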

    From the Census, "The Annual Survey of State Government Finances provides a comprehensive summary of the annual survey findings for state governments, as well as data for individual states. The tables contain detail of revenue by source, expenditure by object and function, indebtedness by term, and assets by purpose." (link to this quote is below)

    Information from the U.S. Census about the data is available here: https://www.census.gov/programs-surveys/state/about.html

  18. Water Rights Demand Analysis Methodology Datasets

    • catalog.data.gov
    Updated Mar 30, 2024
    Cite
    California State Water Resources Control Board (2024). Water Rights Demand Analysis Methodology Datasets [Dataset]. https://catalog.data.gov/dataset/water-rights-demand-analysis-methodology-datasets-92ed2
    Explore at:
    Dataset updated
    Mar 30, 2024
    Dataset provided by
    California State Water Resources Control Board
    Description

    The following datasets are used for the Water Rights Demand Analysis project and are formatted for use in its calculations. The State Water Resources Control Board Division of Water Rights (Division) has developed a methodology to standardize and improve the accuracy of the water diversion and use data that are used to determine water availability and to inform water management and regulatory decisions. The Water Rights Demand Data Analysis Methodology (https://www.waterboards.ca.gov/drought/drought_tools_methods/demandanalysis.html) is a series of data pre-processing steps, R scripts, and data processing modules that identify and help address data quality issues in both the self-reported water diversion and use data from water right holders or their agents and the Division's electronic water rights data.

  19. n

    Dataset: A three-dimensional approach to general plant fire syndromes

    • data.niaid.nih.gov
    zip
    Updated Jan 27, 2023
    Cite
    Pedro Jaureguiberry; Sandra Díaz (2023). Dataset: A three-dimensional approach to general plant fire syndromes [Dataset]. http://doi.org/10.5061/dryad.j6q573njb
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 27, 2023
    Dataset provided by
    Instituto Multidisciplinario de Biología Vegetal (CONICET-Universidad Nacional de Córdoba) and FCEFyN
    Authors
    Pedro Jaureguiberry; Sandra Díaz
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description
    1. Plant fire syndromes are usually defined as combinations of fire response traits, the most common being resprouting (R) and seeding (S). Plant flammability (F), on the other hand, refers to a plant’s effects on communities and ecosystems. Despite its important ecological and evolutionary implications, F has rarely been considered to define plant fire syndromes and, if so, usually separated from response syndromes.
    2. We propose a three-dimensional model that combines R, S and F, encapsulating both plant response to fire regimes and the capacity to promote them. Each axis is divided into three possible standardized categories, reflecting low, medium and high values of each variable, with a total of 27 possible combinations of R, S and F.
    3. We hypothesized that different fire histories should be reflected in the position of species within the three-dimensional space and that this should help assess the importance of fire as an evolutionary force in determining R-S-F syndromes.
    4. To illustrate our approach we compiled information on the fire syndromes of 24 dominant species of different growth forms from the Chaco seasonally-dry forest of central Argentina, and we compared them to 33 species from different Mediterranean-type climate ecosystems (MTCEs) of the world.
    5. Chaco and MTCEs species differed in the range (seven syndromes vs. thirteen syndromes, respectively) and proportion of extreme syndromes (i.e. species with extreme values of R, S and/or F) representing 29% of species in the Chaco vs. 45% in the MTCEs.
    6. Additionally, we explored the patterns of R, S and F of 4032 species from seven regions with contrasting fire histories, and found significantly higher frequencies of extreme values (predominantly high) of all three variables in MTCEs compared to the other regions, where intermediate and low values predominated, broadly supporting our general hypothesis.
    7. The proposed three-dimensional approach should help standardize comparisons of fire syndromes across taxa, growth forms and regions with different fire histories. This will contribute to the understanding of the role of fire in the evolution of plant traits and assist vegetation modelling in the face of changes in fire regimes.

    Methods

    Data collection for Chaco species

    From previous studies, we compiled data on post-fire resprouting (R) (Jaureguiberry 2012; Jaureguiberry et al. 2020), germination capacity after heat shock treatments (S) (Jaureguiberry & Díaz) and flammability (F) (Jaureguiberry et al. 2011) for 24 dominant species of the seasonally-dry Chaco forest of central Argentina (hereafter Chaco). We then transformed the original data from those studies into three possible categorical ordinal values, 1, 2 or 3, indicating low, medium and high values of each variable, respectively. To do so, we used the following criteria:

    1) For R data: we focused on the survival percentage recorded for each species (Jaureguiberry et al., 2020) as a proxy for resprouting capacity (Pérez-Harguindeguy et al., 2013), because this variable is widely used in fire studies and has a standard scale and range of values, which facilitates comparisons between species from different regions. Survival percentages were assigned to one of three intervals (0-33%, 34-66% and 67-100%), and each interval was then assigned the value 1, 2 or 3 respectively, indicating low, medium and high resprouting capacity (see the sketch after this list).

    2) For S data: based on germination response to heat shock treatments, we classified species as heat-sensitive (germination lower than the control), heat-tolerant (germination similar to the control) or heat-stimulated (germination higher than the control) (see details in Jaureguiberry and Díaz 2015). Each of these categories was respectively assigned a value of 1, 2 or 3.

    3) For F data: the original measurements included burning rate, maximum temperature and biomass consumed (see details in Jaureguiberry et al. 2011); however, to allow comparison of Chaco species with species from other regions, and considering that burning rate is rarely reported, data on the two latter variables were collected from studies that followed Jaureguiberry et al. (2011). A PCA followed by a cluster analysis allowed classifying species into the following categories: 1 = low flammability; 2 = moderate flammability; 3 = high flammability.
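    A minimal R sketch of the interval assignment described for the R (resprouting) data, using hypothetical survival percentages; the breakpoints follow the 0-33 / 34-66 / 67-100 intervals given above:

      # Hypothetical post-fire survival percentages for a handful of species
      survival_pct <- c(5, 40, 72, 100, 28)

      # Assign each percentage to ordinal category 1 (low), 2 (medium) or 3 (high)
      r_value <- cut(survival_pct,
                     breaks = c(-Inf, 33, 66, 100),
                     labels = c(1, 2, 3),
                     ordered_result = TRUE)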

    Data collection for other regions

    We performed an unstructured literature review of fire-related traits relevant to our model. Whenever possible, we searched for the same or similar variables to those used for the Chaco, namely survival percentage, germination response to heat shock, and variables related to flammability (e.g. maximum temperature, biomass consumed and burning rate), as proxies for R, S and F, respectively. Classification into the different R intervals was based either on quantitative data on survival percentage or on qualitative information from major databases. For example, resprouting capacity reported as “low” or “high” (e.g. Tavşanoğlu & Pausas, 2018) was assigned R values of 1 and 3, respectively. For Southern Australian species, those reported as “fire killed” and “weak resprouting” (Falster et al., 2021) were assigned a value of 1, while those reported as “intermediate resprouting” and “strong resprouting” were assigned values of 2 and 3, respectively (see the sketch after this paragraph). The vast majority of records in our dataset refer to resprouting of individuals one growing season after the fire. Flammability data for most of the species were based on quantitative measurements using the method of Jaureguiberry et al. (2011), which was standardised following the criteria explained earlier. However, for some species, classification was based either on other quantitative measures that followed other methodologies (e.g. measures based on plant parts such as twigs or leaves, or on fuel beds) or on qualitative classifications reported in the literature (most of which are in turn based on reviews of quantitative measurements from previous studies). We standardised the original data collected for the other regions following the same approach as for the Chaco. We then built contingency tables to analyse each region and to compare between regions. The curated total number of records from our literature review was 4411 (3399 for R, 678 for S and 334 for F) for 4,032 species (many species had information on two variables, and very few on all three). The database covers a wide taxonomic range, encompassing species from approximately 1,250 genera and 180 botanical families, belonging to ten different growth forms, and coming from seven major regions with a wide range of evolutionary histories of fire, from long and intense (Mediterranean-Type Climate Ecosystems) to very recent (New Zealand).
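    A small hypothetical R sketch of the qualitative-to-ordinal recoding described above for Southern Australian species (the report labels and mapping mirror the description; the lookup itself is illustrative, not the authors' code):

      # Hypothetical resprouting reports pulled from a trait database
      resprouting_report <- c("fire killed", "strong resprouting",
                              "intermediate resprouting", "weak resprouting")

      # Map each qualitative report to the ordinal R value used in the model
      r_lookup <- c("fire killed" = 1, "weak resprouting" = 1,
                    "intermediate resprouting" = 2, "strong resprouting" = 3)
      r_value <- r_lookup[resprouting_report]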

  20. f

    Data_Sheet_2_Best Practice Data Standards for Discrete Chemical...

    • frontiersin.figshare.com
    docx
    Updated Jun 2, 2023
    Cite
    Li-Qing Jiang; Denis Pierrot; Rik Wanninkhof; Richard A. Feely; Bronte Tilbrook; Simone Alin; Leticia Barbero; Robert H. Byrne; Brendan R. Carter; Andrew G. Dickson; Jean-Pierre Gattuso; Dana Greeley; Mario Hoppema; Matthew P. Humphreys; Johannes Karstensen; Nico Lange; Siv K. Lauvset; Ernie R. Lewis; Are Olsen; Fiz F. Pérez; Christopher Sabine; Jonathan D. Sharp; Toste Tanhua; Thomas W. Trull; Anton Velo; Andrew J. Allegra; Paul Barker; Eugene Burger; Wei-Jun Cai; Chen-Tung A. Chen; Jessica Cross; Hernan Garcia; Jose Martin Hernandez-Ayon; Xinping Hu; Alex Kozyr; Chris Langdon; Kitack Lee; Joe Salisbury; Zhaohui Aleck Wang; Liang Xue (2023). Data_Sheet_2_Best Practice Data Standards for Discrete Chemical Oceanographic Observations.docx [Dataset]. http://doi.org/10.3389/fmars.2021.705638.s002
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Li-Qing Jiang; Denis Pierrot; Rik Wanninkhof; Richard A. Feely; Bronte Tilbrook; Simone Alin; Leticia Barbero; Robert H. Byrne; Brendan R. Carter; Andrew G. Dickson; Jean-Pierre Gattuso; Dana Greeley; Mario Hoppema; Matthew P. Humphreys; Johannes Karstensen; Nico Lange; Siv K. Lauvset; Ernie R. Lewis; Are Olsen; Fiz F. Pérez; Christopher Sabine; Jonathan D. Sharp; Toste Tanhua; Thomas W. Trull; Anton Velo; Andrew J. Allegra; Paul Barker; Eugene Burger; Wei-Jun Cai; Chen-Tung A. Chen; Jessica Cross; Hernan Garcia; Jose Martin Hernandez-Ayon; Xinping Hu; Alex Kozyr; Chris Langdon; Kitack Lee; Joe Salisbury; Zhaohui Aleck Wang; Liang Xue
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Effective data management plays a key role in oceanographic research as cruise-based data, collected from different laboratories and expeditions, are commonly compiled to investigate regional to global oceanographic processes. Here we describe new and updated best practice data standards for discrete chemical oceanographic observations, specifically those dealing with column header abbreviations, quality control flags, missing value indicators, and standardized calculation of certain properties. These data standards have been developed with the goals of improving the current practices of the scientific community and promoting their international usage. These guidelines are intended to standardize data files for data sharing and submission into permanent archives. They will facilitate future quality control and synthesis efforts and lead to better data interpretation. In turn, this will promote research in ocean biogeochemistry, such as studies of carbon cycling and ocean acidification, on regional to global scales. These best practice standards are not mandatory. Agencies, institutes, universities, or research vessels can continue using different data standards if it is important for them to maintain historical consistency. However, it is hoped that they will be adopted as widely as possible to facilitate consistency and to achieve the goals stated above.
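    As a purely hypothetical R illustration of how such standards are typically applied downstream (the column names, the -999 missing-value sentinel and the flag value 2 for "acceptable" are assumptions made for this sketch, not values taken from the paper):

      # Hypothetical discrete bottle data
      bottle <- data.frame(
        ctd_temperature = c(12.3, -999, 14.1),
        oxygen          = c(210.5, 198.2, -999),
        oxygen_flag     = c(2, 2, 9)
      )

      # Convert the missing-value indicator to NA
      bottle[bottle == -999] <- NA

      # Keep only rows whose oxygen quality-control flag marks an acceptable value
      good_oxygen <- bottle[bottle$oxygen_flag == 2 & !is.na(bottle$oxygen), ]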
