44 datasets found
  1. Merger of BNV-D data (2008 to 2019) and enrichment

    • data.europa.eu
    zip
    Updated Jan 16, 2025
    Cite
    Patrick VINCOURT (2025). Merger of BNV-D data (2008 to 2019) and enrichment [Dataset]. https://data.europa.eu/data/datasets/5f1c3eca9d149439e50c740f?locale=en
    Explore at:
    zip (18530465)
    Dataset updated
    Jan 16, 2025
    Dataset authored and provided by
    Patrick VINCOURT
    Description

    Merging (in Table R) the data published at https://www.data.gouv.fr/fr/datasets/ventes-de-pesticides-par-departement/, and joining two other sources of information associated with marketing authorisations (MAs): uses (https://www.data.gouv.fr/fr/datasets/usages-des-produits-phytosanitaires/), and the "Biocontrol" status of each product, from document DGAL/SDQSPV/2020-784 published on 18/12/2020 at https://agriculture.gouv.fr/quest-ce-que-le-biocontrole

    All the initial files (.csv transformed into .txt), the R code used to merge the data, and the different output files are collected in a zip. NB: 1) "YASCUB" stands for {year, AMM, Substance_active, Classification, Usage, Statut_"BioControl"}, substances not on the DGAL/SDQSPV list being coded NA. 2) The file of biocontrol products has been cleaned of the duplicates generated by marketing authorisations leading to several trade names.
    3) The BNVD_BioC_DY3 table and the output file BNVD_BioC_DY3.txt contain the fields {Code_Region, Region, Dept, Code_Dept, Anne, Usage, Classification, Type_BioC, Quantite_substance}
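
    As a hedged illustration of the merge described above (the author's actual R code is included in the zip), the sketch below joins hypothetical sales, usage, and biocontrol tables on the AMM code; all file and column names are assumptions.

      library(dplyr)
      library(readr)

      # Hypothetical input files and column names; the real zip contains the original
      # .txt files and the author's own merge code.
      ventes <- read_delim("BNVD_ventes_2008_2019.txt", delim = ";")
      usages <- read_delim("usages_produits_phyto.txt", delim = ";")
      bioc   <- read_delim("liste_biocontrole_DGAL.txt", delim = ";") %>%
        distinct(AMM, .keep_all = TRUE)   # drop duplicates created by several trade names per AMM

      # Left joins keep every sales record; substances absent from the DGAL/SDQSPV
      # list end up with NA in the biocontrol columns, as described above.
      bnvd <- ventes %>%
        left_join(usages, by = "AMM") %>%
        left_join(bioc,   by = "AMM")

      write.table(bnvd, "BNVD_BioC_DY3.txt", sep = ";", row.names = FALSE, quote = FALSE)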

  2. Scripts for Analysis

    • figshare.com
    txt
    Updated Jul 18, 2018
    Cite
    Sneddon Lab UCSF (2018). Scripts for Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.6783569.v2
    Explore at:
    txt
    Dataset updated
    Jul 18, 2018
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Sneddon Lab UCSF
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Scripts used for analysis of V1 and V2 Datasets.
    seurat_v1.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, PCA analysis, clustering, tSNE visualization. Used for v1 datasets.
    merge_seurat.R - merge two or more seurat objects into one seurat object. Perform linear regression to remove batch effects from separate objects. Used for v1 datasets.
    subcluster_seurat_v1.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA. Used for v1 datasets.
    seurat_v2.R - initialize seurat object from 10X Genomics cellranger outputs. Includes filtering, normalization, regression, variable gene identification, and PCA analysis. Used for v2 datasets.
    clustering_markers_v2.R - clustering and tSNE visualization for v2 datasets.
    subcluster_seurat_v2.R - subcluster clusters of interest from Seurat object. Determine variable genes, perform regression and PCA analysis. Used for v2 datasets.
    seurat_object_analysis_v1_and_v2.R - downstream analysis and plotting functions for seurat object created by seurat_v1.R or seurat_v2.R.
    merge_clusters.R - merge clusters that do not meet gene threshold. Used for both v1 and v2 datasets.
    prepare_for_monocle_v1.R - subcluster cells of interest and perform linear regression, but not scaling, in order to input normalized, regressed values into monocle with monocle_seurat_input_v1.R.
    monocle_seurat_input_v1.R - monocle script using seurat batch corrected values as input for v1 merged timecourse datasets.
    monocle_lineage_trace.R - monocle script using nUMI as input for v2 lineage traced dataset.
    monocle_object_analysis.R - downstream analysis for monocle object - BEAM and plotting.
    CCA_merging_v2.R - script for merging v2 endocrine datasets with canonical correlation analysis and determining the number of CCs to include in downstream analysis.
    CCA_alignment_v2.R - script for downstream alignment, clustering, tSNE visualization, and differential gene expression analysis.
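
    As a hedged illustration of the kind of workflow seurat_v1.R describes (filtering, normalization, regression, variable gene identification, PCA, clustering, tSNE), the sketch below uses current Seurat syntax; the input path, QC thresholds, and parameter values are assumptions rather than the lab's settings.

      library(Seurat)

      # Hypothetical Cell Ranger output path and illustrative QC thresholds
      counts <- Read10X(data.dir = "cellranger_output/filtered_feature_bc_matrix/")
      obj <- CreateSeuratObject(counts, min.cells = 3, min.features = 200)
      obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^mt-")
      obj <- subset(obj, subset = nFeature_RNA > 500 & percent.mt < 10)

      obj <- NormalizeData(obj)
      obj <- FindVariableFeatures(obj, selection.method = "vst", nfeatures = 2000)
      obj <- ScaleData(obj, vars.to.regress = "nCount_RNA")   # regression step
      obj <- RunPCA(obj, npcs = 30)
      obj <- FindNeighbors(obj, dims = 1:20)
      obj <- FindClusters(obj, resolution = 0.6)
      obj <- RunTSNE(obj, dims = 1:20)
      DimPlot(obj, reduction = "tsne", label = TRUE)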

  3. Data from: Optimized SMRT-UMI protocol produces highly accurate sequence...

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    zip
    Updated Dec 7, 2023
    + more versions
    Cite
    Dylan Westfall; Mullins James (2023). Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies [Dataset]. http://doi.org/10.5061/dryad.w3r2280w0
    Explore at:
    zip
    Dataset updated
    Dec 7, 2023
    Dataset provided by
    HIV Vaccine Trials Network: http://www.hvtn.org/
    HIV Prevention Trials Network: http://www.hptn.org/
    National Institute of Allergy and Infectious Diseases: http://www.niaid.nih.gov/
    PEPFAR
    Authors
    Dylan Westfall; Mullins James
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    Pathogen diversity resulting in quasispecies can enable persistence and adaptation to host defenses and therapies. However, accurate quasispecies characterization can be impeded by errors introduced during sample handling and sequencing which can require extensive optimizations to overcome. We present complete laboratory and bioinformatics workflows to overcome many of these hurdles. The Pacific Biosciences single molecule real-time platform was used to sequence PCR amplicons derived from cDNA templates tagged with universal molecular identifiers (SMRT-UMI). Optimized laboratory protocols were developed through extensive testing of different sample preparation conditions to minimize between-template recombination during PCR and the use of UMI allowed accurate template quantitation as well as removal of point mutations introduced during PCR and sequencing to produce a highly accurate consensus sequence from each template. Handling of the large datasets produced from SMRT-UMI sequencing was facilitated by a novel bioinformatic pipeline, Probabilistic Offspring Resolver for Primer IDs (PORPIDpipeline), that automatically filters and parses reads by sample, identifies and discards reads with UMIs likely created from PCR and sequencing errors, generates consensus sequences, checks for contamination within the dataset, and removes any sequence with evidence of PCR recombination or early cycle PCR errors, resulting in highly accurate sequence datasets. The optimized SMRT-UMI sequencing method presented here represents a highly adaptable and established starting point for accurate sequencing of diverse pathogens. These methods are illustrated through characterization of human immunodeficiency virus (HIV) quasispecies.

    Methods
    This serves as an overview of the analysis performed on PacBio sequence data that is summarized in Analysis Flowchart.pdf and was used as primary data for the paper by Westfall et al. "Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies". Five different PacBio sequencing datasets were used for this analysis: M027, M2199, M1567, M004, and M005. For the datasets which were indexed (M027, M2199), CCS reads from PacBio sequencing files and the chunked_demux_config files were used as input for the chunked_demux pipeline. Each config file lists the different Index primers added during PCR to each sample. The pipeline produces one fastq file for each Index primer combination in the config. For example, in dataset M027 there were 3–4 samples using each Index combination. The fastq files from each demultiplexed read set were moved to the sUMI_dUMI_comparison pipeline fastq folder for further demultiplexing by sample and consensus generation with that pipeline. More information about the chunked_demux pipeline can be found in the README.md file on GitHub. The demultiplexed read collections from the chunked_demux pipeline or CCS read files from datasets which were not indexed (M1567, M004, M005) were each used as input for the sUMI_dUMI_comparison pipeline along with each dataset's config file. Each config file contains the primer sequences for each sample (including the sample ID block in the cDNA primer) and further demultiplexes the reads to prepare data tables summarizing all of the UMI sequences and counts for each family (tagged.tar.gz) as well as consensus sequences from each sUMI and rank 1 dUMI family (consensus.tar.gz).
    More information about the sUMI_dUMI_comparison pipeline can be found in the paper and the README.md file on GitHub. The consensus.tar.gz and tagged.tar.gz files were moved from the sUMI_dUMI_comparison pipeline directory on the server to the Pipeline_Outputs folder in this analysis directory for each dataset and appended with the dataset name (e.g. consensus_M027.tar.gz). Also in this analysis directory is a Sample_Info_Table.csv containing information about how each of the samples was prepared, such as purification methods and number of PCRs. There are also three other folders: Sequence_Analysis, Indentifying_Recombinant_Reads, and Figures. Each has an .Rmd file with the same name inside which is used to collect, summarize, and analyze the data. All of these collections of code were written and executed in RStudio to track notes and summarize results. Sequence_Analysis.Rmd has instructions to decompress all of the consensus.tar.gz files, combine them, and create two fasta files, one with all sUMI and one with all dUMI sequences. Using these as input, two data tables were created that summarize all sequences and read counts for each sample that pass various criteria. These are used to help create Table 2 and as input for Indentifying_Recombinant_Reads.Rmd and Figures.Rmd. Next, two fasta files containing all of the rank 1 dUMI sequences and the matching sUMI sequences were created. These were used as input for the python script compare_seqs.py which identifies any matched sequences that are different between sUMI and dUMI read collections. This information was also used to help create Table 2. Finally, to populate the table with the number of sequences and bases in each sequence subset of interest, different sequence collections were saved and viewed in the Geneious program. To investigate the cause of sequences where the sUMI and dUMI sequences do not match, tagged.tar.gz was decompressed and for each family with discordant sUMI and dUMI sequences the reads from the UMI1_keeping directory were aligned using Geneious. Reads from dUMI families failing the 0.7 filter were also aligned in Geneious. The uncompressed tagged folder was then removed to save space. These read collections contain all of the reads in a UMI1 family and still include the UMI2 sequence. By examining the alignment and specifically the UMI2 sequences, the site of the discordance and its cause were identified for each family as described in the paper. These alignments were saved as "Sequence Alignments.geneious". The counts of how many families were the result of PCR recombination were used in the body of the paper. Using Identifying_Recombinant_Reads.Rmd, the dUMI_ranked.csv file from each sample was extracted from all of the tagged.tar.gz files, combined and used as input to create a single dataset containing all UMI information from all samples. This file dUMI_df.csv was used as input for Figures.Rmd. Figures.Rmd used dUMI_df.csv, sequence_counts.csv, and read_counts.csv as input to create draft figures and then individual datasets for each figure. These were copied into Prism software to create the final figures for the paper.
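
    As a hedged sketch of one step in this workflow (not the authors' Rmd), the R code below combines per-sample dUMI_ranked.csv tables into a single table; the extraction directory and the added sample column are assumptions.

      library(dplyr)
      library(purrr)

      # Hypothetical directory into which the tagged.tar.gz archives were extracted
      files <- list.files("Pipeline_Outputs/tagged_extracted",
                          pattern = "dUMI_ranked\\.csv$",
                          recursive = TRUE, full.names = TRUE)

      dUMI_df <- map_dfr(files, function(f) {
        read.csv(f, stringsAsFactors = FALSE) %>%
          mutate(sample = basename(dirname(f)))   # tag each row with its sample folder
      })

      write.csv(dUMI_df, "dUMI_df.csv", row.names = FALSE)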

  4. R codes and dataset for Visualisation of Diachronic Constructional Change...

    • bridges.monash.edu
    • researchdata.edu.au
    zip
    Updated May 30, 2023
    Cite
    Gede Primahadi Wijaya Rajeg (2023). R codes and dataset for Visualisation of Diachronic Constructional Change using Motion Chart [Dataset]. http://doi.org/10.26180/5c844c7a81768
    Explore at:
    zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Monash University
    Authors
    Gede Primahadi Wijaya Rajeg
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Publication: Primahadi Wijaya R., Gede. 2014. Visualisation of diachronic constructional change using Motion Chart. In Zane Goebel, J. Herudjati Purwoko, Suharno, M. Suryadi & Yusuf Al Aried (eds.). Proceedings: International Seminar on Language Maintenance and Shift IV (LAMAS IV), 267-270. Semarang: Universitas Diponegoro. doi: https://doi.org/10.4225/03/58f5c23dd8387
    Description of R codes and data files in the repository: This repository is imported from its GitHub repo. Versioning of this figshare repository is associated with the GitHub repo's Releases, so check the Releases page for updates (the next version is to include the unified version of the codes in the first release with the tidyverse). The raw input data consist of two files (i.e. will_INF.txt and go_INF.txt). They represent the co-occurrence frequency of the top-200 infinitival collocates for will and be going to respectively across the twenty decades of the Corpus of Historical American English (from the 1810s to the 2000s). These two input files are used in the R code file 1-script-create-input-data-raw.r. The code preprocesses and combines the two files into a long-format data frame consisting of the following columns: (i) decade, (ii) coll (for "collocate"), (iii) BE going to (for frequency of the collocates with be going to) and (iv) will (for frequency of the collocates with will); it is available in input_data_raw.txt. Then, the script 2-script-create-motion-chart-input-data.R processes input_data_raw.txt to normalise the co-occurrence frequency of the collocates per million words (the COHA size and normalising base frequency are available in coha_size.txt). The output from the second script is input_data_futurate.txt. Next, input_data_futurate.txt contains the relevant input data for generating (i) the static motion chart as an image plot in the publication (using the script 3-script-create-motion-chart-plot.R), and (ii) the dynamic motion chart (using the script 4-script-motion-chart-dynamic.R). The repository adopts the project-oriented workflow in RStudio; double-click on the Future Constructions.Rproj file to open an RStudio session whose working directory is associated with the contents of this repository.
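
    As a hedged sketch of the per-million-words normalisation step performed by 2-script-create-motion-chart-input-data.R, the code below assumes column names for input_data_raw.txt and coha_size.txt that may differ from the actual ones.

      library(dplyr)
      library(readr)

      raw  <- read_tsv("input_data_raw.txt")   # columns assumed: decade, coll, `BE going to`, will
      size <- read_tsv("coha_size.txt")        # columns assumed: decade, corpus_size

      # Normalise raw co-occurrence frequencies to counts per million words per decade
      futurate <- raw %>%
        left_join(size, by = "decade") %>%
        mutate(going_to_pmw = `BE going to` / corpus_size * 1e6,
               will_pmw     = will          / corpus_size * 1e6)

      write_tsv(futurate, "input_data_futurate.txt")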

  5. YahooR3 and Coat datasets

    • resodate.org
    • service.tib.eu
    Updated Jan 3, 2025
    Cite
    S.M.F. Sani; Seyed Abbas Hosseini; Hamid R. Rabiee (2025). YahooR3 and Coat datasets [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQveWFob29yMy1hbmQtY29hdC1kYXRhc2V0cw==
    Explore at:
    Dataset updated
    Jan 3, 2025
    Dataset provided by
    Leibniz Data Manager
    Authors
    S.M.F. Sani; Seyed Abbas Hosseini; Hamid R. Rabiee
    Description

    The dataset used in the paper is a combination of two datasets: YahooR3 and Coat. The dataset is used to evaluate the performance of the proposed Epsilon Non-Greedy framework.

  6. Cleaned NHANES 1988-2018

    • figshare.com
    txt
    Updated Feb 18, 2025
    Cite
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). Cleaned NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v9
    Explore at:
    txt
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential for studying the health and environmental exposures of the non-institutionalized US population. However, as NHANES data are plagued with multiple inconsistencies, processing these data is required before deriving new insights through large-scale analyses. Thus, we developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey demographics (281 variables), dietary consumption (324 variables), physiological functions (1,040 variables), occupation (61 variables), questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood), medications (29 variables), mortality information linked from the National Death Index (15 variables), survey weights (857 variables), environmental exposure biomarker measurements (598 variables), and chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).

    csv Data Record: The curated NHANES datasets and the data dictionaries include 23 .csv files and 1 Excel file. The curated NHANES datasets involve 20 .csv formatted files, two for each module, with one as the uncleaned version and the other as the cleaned version. The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES. "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables. "dictionary_drug_codes.csv" contains the dictionary of descriptors for the drug codes. "nhanes_inconsistencies_documentation.xlsx" is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.

    R Data Record: For researchers who want to conduct their analysis in the R programming language, only the cleaned NHANES modules and the data dictionaries can be downloaded as a .zip file, which includes an .RData file and an .R file. "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts on customized functions that were written to curate the data. "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e. our pipeline) to curate the original NHANES data.

    Example starter codes: The set of starter code to help users conduct exposome analyses consists of four R markdown files (.Rmd). We recommend going through the tutorials in order. "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together. "example_1 - account_for_nhanes_design.Rmd" demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazard model, and a survey-weighted Cox proportional hazard model. "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and multiple variables with and without accounting for the NHANES sampling design. "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
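
    As a hedged sketch in the spirit of "example_1 - account_for_nhanes_design.Rmd", the code below fits a survey-weighted regression with the survey package; the merged data frame and the PSU, stratum, weight, and analysis variables are assumptions rather than the names used in the curated files.

      library(survey)

      load("w - nhanes_1988_2018.RData")   # loads the curated modules as R objects

      nhanes_design <- svydesign(
        ids     = ~SDMVPSU,                # primary sampling units (assumed column name)
        strata  = ~SDMVSTRA,               # sampling strata (assumed column name)
        weights = ~WTMEC2YR,               # examination weights (assumed column name)
        nest    = TRUE,
        data    = merged_nhanes            # hypothetical merged data frame from example_0
      )

      # Survey-weighted linear regression of BMI on age and sex (assumed variable names)
      fit <- svyglm(BMXBMI ~ RIDAGEYR + RIAGENDR, design = nhanes_design)
      summary(fit)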

  7. Data and tools for studying isograms

    • figshare.com
    Updated Jul 31, 2017
    Cite
    Florian Breit (2017). Data and tools for studying isograms [Dataset]. http://doi.org/10.6084/m9.figshare.5245810.v1
    Explore at:
    application/x-sqlite3
    Dataset updated
    Jul 31, 2017
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Florian Breit
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A collection of datasets and python scripts for extraction and analysis of isograms (and some palindromes and tautonyms) from corpus-based word-lists, specifically Google Ngram and the British National Corpus (BNC). Below follows a brief description, first, of the included datasets and, second, of the included scripts.
    1. Datasets
    The data from English Google Ngrams and the BNC is available in two formats: as a plain text CSV file and as a SQLite3 database.
    1.1 CSV format
    The CSV files for each dataset actually come in two parts: one labelled ".csv" and one ".totals". The ".csv" contains the actual extracted data, and the ".totals" file contains some basic summary statistics about the ".csv" dataset with the same name. The CSV files contain one row per data point, with the columns separated by a single tab stop. There are no labels at the top of the files. Each line has the following columns, in this order (the labels below are what I use in the database, which has an identical structure, see section below):

    Label Data type Description

    isogramy int The order of isogramy, e.g. "2" is a second order isogram

    length int The length of the word in letters

    word text The actual word/isogram in ASCII

    source_pos text The Part of Speech tag from the original corpus

    count int Token count (total number of occurrences)

    vol_count int Volume count (number of different sources which contain the word)

    count_per_million int Token count per million words

    vol_count_as_percent int Volume count as percentage of the total number of volumes

    is_palindrome bool Whether the word is a palindrome (1) or not (0)

    is_tautonym bool Whether the word is a tautonym (1) or not (0)

    The ".totals" files have a slightly different format, with one row per data point, where the first column is the label and the second column is the associated value. The ".totals" files contain the following data:

    Label Data type Description

    !total_1grams int The total number of words in the corpus

    !total_volumes int The total number of volumes (individual sources) in the corpus

    !total_isograms int The total number of isograms found in the corpus (before compacting)

    !total_palindromes int How many of the isograms found are palindromes

    !total_tautonyms int How many of the isograms found are tautonyms

    The CSV files are mainly useful for further automated data processing. For working with the data set directly (e.g. to do statistics or cross-check entries), I would recommend using the database format described below.
    1.2 SQLite database format
    On the other hand, the SQLite database combines the data from all four of the plain text files, and adds various useful combinations of the two datasets, namely:
    • Compacted versions of each dataset, where identical headwords are combined into a single entry.
    • A combined compacted dataset, combining and compacting the data from both Ngrams and the BNC.
    • An intersected dataset, which contains only those words which are found in both the Ngrams and the BNC dataset.
    The intersected dataset is by far the least noisy, but is missing some real isograms, too. The columns/layout of each of the tables in the database is identical to that described for the CSV/.totals files above. To get an idea of the various ways the database can be queried for various bits of data, see the R script described below, which computes statistics based on the SQLite database.
    2. Scripts
    There are three scripts: one for tidying Ngram and BNC word lists and extracting isograms, one to create a neat SQLite database from the output, and one to compute some basic statistics from the data. The first script can be run using Python 3, the second script can be run using SQLite 3 from the command line, and the third script can be run in R/RStudio (R version 3).
    2.1 Source data
    The scripts were written to work with word lists from Google Ngram and the BNC, which can be obtained from http://storage.googleapis.com/books/ngrams/books/datasetsv2.html and https://www.kilgarriff.co.uk/bnc-readme.html (download all.al.gz). For Ngram the script expects the path to the directory containing the various files, for BNC the direct path to the *.gz file.
    2.2 Data preparation
    Before processing proper, the word lists need to be tidied to exclude superfluous material and some of the most obvious noise. This will also bring them into a uniform format. Tidying and reformatting can be done by running one of the following commands:
    python isograms.py --ngrams --indir=INDIR --outfile=OUTFILE
    python isograms.py --bnc --indir=INFILE --outfile=OUTFILE
    Replace INDIR/INFILE with the input directory or filename and OUTFILE with the filename for the tidied and reformatted output.
    2.3 Isogram Extraction
    After preparing the data as above, isograms can be extracted by running the following command on the reformatted and tidied files:
    python isograms.py --batch --infile=INFILE --outfile=OUTFILE
    Here INFILE should refer to the output from the previous data cleaning process. Please note that the script will actually write two output files, one named OUTFILE with a word list of all the isograms and their associated frequency data, and one named "OUTFILE.totals" with very basic summary statistics.
    2.4 Creating a SQLite3 database
    The output data from the above step can be easily collated into a SQLite3 database which allows for easy querying of the data directly for specific properties. The database can be created by following these steps:
    1. Make sure the files with the Ngrams and BNC data are named "ngrams-isograms.csv" and "bnc-isograms.csv" respectively. (The script assumes you have both of them; if you only want to load one, just create an empty file for the other one.)
    2. Copy the "create-database.sql" script into the same directory as the two data files.
    3. On the command line, go to the directory where the files and the SQL script are.
    4. Type: sqlite3 isograms.db
    5. This will create a database called "isograms.db".
    See section 1 for a basic description of the output data and how to work with the database.
    2.5 Statistical processing
    The repository includes an R script (R version 3) named "statistics.r" that computes a number of statistics about the distribution of isograms by length, frequency, contextual diversity, etc. This can be used as a starting point for running your own stats. It uses RSQLite to access the SQLite database version of the data described above.
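
    As a hedged sketch of how the database can be queried from R (in the spirit of statistics.r), the code below uses RSQLite; the table name "ngrams" is an assumption, so check dbListTables() for the actual names.

      library(DBI)
      library(RSQLite)

      con <- dbConnect(SQLite(), "isograms.db")
      dbListTables(con)   # inspect the actual table names first

      # Top second-or-higher-order isograms by frequency (column names follow the layout above)
      isograms <- dbGetQuery(con, "
        SELECT isogramy, length, word, count_per_million, is_palindrome, is_tautonym
        FROM ngrams
        WHERE isogramy >= 2
        ORDER BY count_per_million DESC
        LIMIT 20
      ")
      print(isograms)

      dbDisconnect(con)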

  8. BRAINTEASER ALS and MS Datasets

    • data.europa.eu
    • data.niaid.nih.gov
    unknown
    Updated Jul 3, 2025
    + more versions
    Cite
    Zenodo (2025). BRAINTEASER ALS and MS Datasets [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-14857741?locale=lv
    Explore at:
    unknown
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodo: http://zenodo.org/
    Description

    BRAINTEASER (Bringing Artificial Intelligence home for a better care of amyotrophic lateral sclerosis and multiple sclerosis) is a data science project that seeks to exploit the value of big data, including those related to health, lifestyle habits, and environment, to support patients with Amyotrophic Lateral Sclerosis (ALS) and Multiple Sclerosis (MS) and their clinicians. Taking advantage of cost-efficient sensors and apps, BRAINTEASER will integrate large, clinical datasets that host both patient-generated and environmental data. As part of its activities, BRAINTEASER organized three open evaluation challenges on Intelligent Disease Progression Prediction (iDPP), iDPP@CLEF 2022, iDPP@CLEF 2023, and iDPP@CLEF 2024, co-located with the Conference and Labs of the Evaluation Forum (CLEF). The goal of iDPP@CLEF is to design and develop an evaluation infrastructure for AI algorithms able to: better describe disease mechanisms; stratify patients according to their phenotype assessed all over the disease evolution; predict disease progression in a probabilistic, time-dependent fashion. The iDPP@CLEF challenges relied on retrospective and prospective ALS and MS patient data made available by the clinical partners of the BRAINTEASER consortium.

    Retrospective Dataset
    We release three retrospective datasets, one for ALS and two for MS: the two retrospective MS datasets consist of one with clinical data only and one with clinical data and environmental/pollution data. The retrospective datasets contain data about 2,204 ALS patients (static variables, ALSFRS-R questionnaires, spirometry tests, environmental/pollution data) and 1,792 MS patients (static variables, EDSS scores, evoked potentials, relapses, MRIs). A subset of 280 MS patients contains environmental and pollution data. More in detail, the BRAINTEASER project retrospective datasets were derived from the merging of already existing datasets obtained by the clinical centers involved in the BRAINTEASER Project. The ALS dataset was obtained by the merge and homogenisation of the Piemonte and Valle d'Aosta Registry for Amyotrophic Lateral Sclerosis (PARALS, Chiò et al., 2017) and the Lisbon ALS clinic dataset (CENTRO ACADÉMICO DE MEDICINA DE LISBOA, Centro Hospitalar Universitário de Lisboa-Norte, Hospital de Santa Maria, Lisbon, Portugal). Both datasets were initiated in 1995 and are currently maintained by researchers of the ALS Regional Expert Centre (CRESLA), University of Turin, and of the CENTRO ACADÉMICO DE MEDICINA DE LISBOA-Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa. They include demographic and clinical data, comprehending both static and dynamic variables. The MS dataset was obtained from the Pavia MS clinical dataset, which was started in 1990 and contains demographic and clinical information that is continuously updated by the researchers of the Institute, and the Turin MS clinic dataset (Department of Neurosciences and Mental Health, Neurology Unit 1, Città della Salute e della Scienza di Torino). Retrospective environmental data are accessible at various scales at the individual subject level.
    Thus, environmental data have been retrieved at different scales: to gather macroscale air pollution data we leveraged data coming from public monitoring stations that cover the whole extension of the involved countries, namely the European Air Quality Portal; data from a network of air quality sensors (PurpleAir - Outdoor Air Quality Monitor / PurpleAir PA-II) installed in different points of the city of Pavia (Italy) were extracted as well. In both cases, environmental data were previously publicly available. In order to merge environmental data with individual subject locations we leverage postcodes (postcodes of the station for the pollutant detection and postcodes of the subject's address). Data were merged following an anonymization procedure based on hash keys. Environmental exposure trajectories have been pre-processed and aggregated in order to avoid fine temporal and spatial granularities. Thus, individual exposure information could not disclose personal addresses. The retrospective datasets are shared in two formats: RDF (serialized in Turtle) modeled according to the BRAINTEASER Ontology (BTO); and CSV, as shared during the iDPP@CLEF 2022 and 2023 challenges, split into training and test. Each format corresponds to a specific folder in the datasets, where a dedicated README file provides further details on the datasets. Note that the ALS dataset is split into multiple ZIP files due to the size of the environmental data.

    Prospective Dataset
    For the iDPP@CLEF 2024 challenge, the datasets contain prospective data about 86 ALS patients (static variables, ALSFRS-R questionnaires compiled by clinicians or patients using the BRAINTEASER mobile application, sensors data). The prospective datasets are shared in two formats: RDF (serialized in Turtle) modeled according to the BRAINTEASER Ontology (BTO); and CSV, as shared during the iDPP@CLEF 2024 challenge.

  9. Datasets and R code

    • figshare.com
    txt
    Updated Jan 24, 2025
    Cite
    Auriane Le Floch (2025). Datasets and R code [Dataset]. http://doi.org/10.6084/m9.figshare.28269875.v1
    Explore at:
    txt
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Auriane Le Floch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Investigating how non-human animals produce call sequences offers valuable insights into the evolutionary processes underlying meaning generation through vocal communication, including the origins of syntax. While a wide range of species combine calls into larger structures, often following specific rules, most studies focus only on one or a few sequences per species. This limits our understanding of animal abilities to combine calls and their potential to convey meaning through sequences. Our study addresses this gap by documenting the vocal sequence repertoire and their underlying rules in sooty mangabeys (Cercocebus atys), a West African forest-dwelling monkey species. Over ten months, we collected data on two groups of wild sooty mangabeys in the Taï National Park, Ivory Coast. We recorded and annotated 1,672 recordings. We show that sooty mangabeys combine most of their calls, though they rely on a limited set of sequences. Within common sequences, we identified rules of call ordering and reoccurrence, as well as hierarchical structures. Interestingly, sooty mangabeys produced hierarchically structured sequences using only two call types, potentially generating a wide range of meanings. Our findings suggest that sooty mangabeys use both structured and unstructured sequences, each likely serving to convey specific information. While context of production, not addressed here, is essential for understanding the precise meaning of vocal utterances, our results underline the importance of a whole-repertoire approach in assessing the diversity of rule-based sequences, and hence the potential a vocal system has to expand meanings beyond the number of vocalisations in the repertoire.

  10. Data from: Cultivar resistance to common scab disease of potato is dependent...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Data from: Cultivar resistance to common scab disease of potato is dependent on the pathogen species [Dataset]. https://catalog.data.gov/dataset/data-from-cultivar-resistance-to-common-scab-disease-of-potato-is-dependent-on-the-pathoge-53c3e
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    All data from the paper "Cultivar resistance to common scab disease of potato is dependent on the pathogen species." Three separate datasets are included: a csv file with the disease severity of three common scab pathogens across 55 different potato cultivars in a greenhouse pot assay (Figures 2-5 in the associated paper); the included R script, which was used with this data to perform the ANOVA for the greenhouse pot assay (Table 2 in the associated paper) and can be used in R for any similar dataset to calculate the significance and percent of total variation for any number of user-defined fixed effects; a zipped file with all of the qPCR data for the expression of the txtAB genes (Figure 6 in the associated paper); and an Excel file with the HPLC data for making the thaxtomin detection standard curve and quantifying the amount of thaxtomin in the test sample.
    Resources in this dataset:
    Resource Title: Streptomyces pot assay data. File Name: 18.4.2updatedfileAllDataPotAssay.csv. Resource Description: Combined data from all Streptomyces - potato pot assays from the paper. This csv file can be used with the example R script "DiseaseSeverityEstimateScript."
    Resource Title: Combined qPCR data. File Name: CombinedtxtABqPCRresults.zip. Resource Description: Zipped file that contains all qPCR data of txtAB gene expression in all experimental conditions (Figure 6 of the paper).
    Resource Title: R script for estimating disease severity. File Name: DiseaseSeverityEstimateScript.txt. Resource Description: R script used in combination with the "18.4.2updatedfileAllDataPotAssay.csv" file for generating the disease severity estimates (Figures 2-4 of the paper).
    Resource Title: Thaxtomin standard curve and quantification - All data. File Name: Thaxtomin_CalCurve_log_log-Scale_12072018 (003).xlsx. Resource Description: Excel file with two sheets. The first sheet is all of the HPLC data used for calculating the standard curve of thaxtomin using known standards. The second sheet is the quantification data for the abundance of thaxtomin across the experimental groups (presented as Figure 6 in the paper).
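
    As a hedged illustration of the kind of analysis the included R script performs (ANOVA with user-defined fixed effects and percent of total variation), the sketch below assumes column names that may differ from those in 18.4.2updatedfileAllDataPotAssay.csv.

      # Read the pot assay data; "Severity", "Cultivar", "Pathogen", and "Rep" are assumed columns
      pot <- read.csv("18.4.2updatedfileAllDataPotAssay.csv")

      fit <- aov(Severity ~ Cultivar * Pathogen + Rep, data = pot)
      anova_tab <- anova(fit)

      # Percent of total variation explained by each term (sum of squares / total SS)
      anova_tab$PctVar <- 100 * anova_tab[["Sum Sq"]] / sum(anova_tab[["Sum Sq"]])
      print(anova_tab)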

  11. 2023_ Datasets and R source code of "Allometry Bird Mitochondrial...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 19, 2023
    Cite
    Anonymous (2023). 2023_ Datasets and R source code of "Allometry Bird Mitochondrial Bioenergetics" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8355415
    Explore at:
    Dataset updated
    Sep 19, 2023
    Dataset authored and provided by
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    METHODOLOGICAL INFORMATION

    This dataset contains mitochondrial bioenergetic data and enzymatic activities from our studies of two tissues, skeletal and cardiac muscle, of 13 bird species ranging from 15 g to 160 kg. Methodology: mitochondrial isolation, respiration, enzyme assays, measurement of body mass. All analyses were performed in R version 4.2.1 (R Core Team 2022), using phylogenetic comparative analyses.

    Description of the Data and file structure "2023_Data_Allometry Bird mitochondrial bioenergetics"

    The file contains two sheets: one for the skeletal muscle data and the second for the cardiac muscle. For each section you will find: the name of the species studied, the number of individuals, their body mass (in grams), mitochondrial flux measurements (oxygen consumption, ATP synthesis, ROS generation), ratios (RCR, Slope, Mitochondrial efficiency ATP/O...) and enzymatic activity measurements.

    Missing data correspond to individuals for whom we were unable to collect data (e.g. no heart samples, not enough tissue for analysis...)

    Phylogenetic tree "BirdTree_MCMCglmm"

    The phylogenetic tree combining the 13 species studied was obtained from the BirdTree.org website (Rubolini et al., 2015). The tree source used was Hackett Sequenced Species: a set of 10,000 trees with 6,670 OTUs each (Hackett et al., 2008). We performed 1,000 simulations to create the most parsimonious tree. The avian tree was summarized using BEAST (v1.10.4, 2002-2018) to create a target tree usable in nexus format in R version 4.2.1 (R Core Team 2022). The parameters used were: burnin as a number of trees (100), maximum clade credibility tree as target tree type, and common ancestor heights.
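
    As a hedged sketch of how the consensus tree might be used for a phylogenetic comparative analysis in R (the nexus file name, the trait table `birds`, and its columns are assumptions, not part of the deposit):

      library(ape)
      library(MCMCglmm)

      tree <- read.nexus("BirdTree_MCMCglmm.nex")            # assumed file name for the MCC tree
      inv_phylo <- inverseA(tree, nodes = "TIPS", scale = TRUE)

      prior <- list(G = list(G1 = list(V = 1, nu = 0.02)),
                    R = list(V = 1, nu = 0.02))

      # Allometric scaling on a log-log scale, with species as a phylogenetic random effect;
      # `birds` is a hypothetical trait table with columns species, body_mass_g, atp_o
      fit <- MCMCglmm(log(atp_o) ~ log(body_mass_g),
                      random   = ~species,
                      ginverse = list(species = inv_phylo$Ainv),
                      data     = birds,
                      prior    = prior,
                      nitt = 110000, burnin = 10000, thin = 100)
      summary(fit)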

  12. ProjecTILs murine reference atlas of tumor-infiltrating T cells, version 1

    • figshare.com
    application/gzip
    Updated Jun 29, 2023
    Cite
    Massimo Andreatta; Santiago Carmona (2023). ProjecTILs murine reference atlas of tumor-infiltrating T cells, version 1 [Dataset]. http://doi.org/10.6084/m9.figshare.12478571.v2
    Explore at:
    application/gzip
    Dataset updated
    Jun 29, 2023
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Massimo Andreatta; Santiago Carmona
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We have developed ProjecTILs, a computational approach to project new data sets into a reference map of T cells, enabling their direct comparison in a stable, annotated system of coordinates. Because new cells are embedded in the same space of the reference, ProjecTILs enables the classification of query cells into annotated, discrete states, but also over a continuous space of intermediate states. By comparing multiple samples over the same map, and across alternative embeddings, the method allows exploring the effect of cellular perturbations (e.g. as the result of therapy or genetic engineering) and identifying genetic programs significantly altered in the query compared to a control set or to the reference map. We illustrate the projection of several data sets from recent publications over two cross-study murine T cell reference atlases: the first describing tumor-infiltrating T lymphocytes (TILs), the second characterizing acute and chronic viral infection.
    To construct the reference TIL atlas, we obtained single-cell gene expression matrices from the following GEO entries: GSE124691, GSE116390, GSE121478, GSE86028; and entry E-MTAB-7919 from ArrayExpress. Data from GSE124691 contained samples from tumor and from tumor-draining lymph nodes, and were therefore treated as two separate datasets. For the TIL projection examples (OVA Tet+, miR-155 KO and Regnase-KO), we obtained the gene expression counts from entries GSE122713, GSE121478 and GSE137015, respectively.
    Prior to dataset integration, single-cell data from individual studies were filtered using TILPRED-1.0 (https://github.com/carmonalab/TILPRED), which removes cells not enriched in T cell markers (e.g. Cd2, Cd3d, Cd3e, Cd3g, Cd4, Cd8a, Cd8b1) and cells enriched in non T cell genes (e.g. Spi1, Fcer1g, Csf1r, Cd19). Dataset integration was performed using STACAS (https://github.com/carmonalab/STACAS), a batch-correction algorithm based on Seurat 3. For the TIL reference map, we specified 600 variable genes per dataset, excluding cell cycling genes, mitochondrial, ribosomal and non-coding genes, as well as genes expressed in less than 0.1% or more than 90% of the cells of a given dataset. For integration, a total of 800 variable genes were derived as the intersection of the 600 variable genes of individual datasets, prioritizing genes found in multiple datasets and, in case of draws, those derived from the largest datasets. We determined pairwise dataset anchors using STACAS with default parameters, and filtered anchors using an anchor score threshold of 0.8. Integration was performed using the IntegrateData function in Seurat3, providing the anchor set determined by STACAS, and a custom integration tree to initiate alignment from the largest and most heterogeneous datasets.
    Next, we performed unsupervised clustering of the integrated cell embeddings using the Shared Nearest Neighbor (SNN) clustering method implemented in Seurat 3 with parameters {resolution=0.6, reduction="umap", k.param=20}. We then manually annotated individual clusters (merging clusters when necessary) based on several criteria: i) average expression of key marker genes in individual clusters; ii) gradients of gene expression over the UMAP representation of the reference map; iii) gene-set enrichment analysis to determine over- and under- expressed genes per cluster using MAST.
In order to have access to predictive methods for UMAP, we recomputed PCA and UMAP embeddings independently of Seurat3 using respectively the prcomp function from basic R package “stats”, and the “umap” R package (https://github.com/tkonopka/umap).
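
    As a hedged, generic sketch of the integration and clustering steps described above, the code below uses Seurat's built-in anchor finding in place of STACAS (which the authors actually used); the list of per-study objects and all parameter values are illustrative assumptions.

      library(Seurat)

      # obj_list is assumed to be a named list of per-study Seurat objects, e.g.
      # obj_list <- list(GSE124691_tumor = s1, GSE116390 = s2, GSE121478 = s3)
      obj_list <- lapply(obj_list, function(x) {
        x <- NormalizeData(x)
        FindVariableFeatures(x, nfeatures = 600)     # 600 variable genes per dataset, as described
      })

      anchors    <- FindIntegrationAnchors(object.list = obj_list, dims = 1:20)
      integrated <- IntegrateData(anchorset = anchors, dims = 1:20)

      integrated <- ScaleData(integrated)
      integrated <- RunPCA(integrated, npcs = 30)
      integrated <- RunUMAP(integrated, dims = 1:20)
      integrated <- FindNeighbors(integrated, dims = 1:20, k.param = 20)
      integrated <- FindClusters(integrated, resolution = 0.6)   # SNN clustering as in the atlas
      DimPlot(integrated, label = TRUE)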

  13. data set for EC 30 mins for the two sites

    • dataverse.harvard.edu
    Updated Oct 22, 2025
    Cite
    Juan Benavides (2025). data set for EC 30 mins for the two sites [Dataset]. http://doi.org/10.7910/DVN/BBKIOQ
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Oct 22, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Juan Benavides
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Dataset Description
    This dataset and accompanying R scripts support the intensive carbon dynamics observation platform conducted in tropical alpine peatlands of Guatavita, Colombia. The data include half-hourly and cumulative greenhouse-gas fluxes (CO₂, CH₄, N₂O), dissolved organic carbon (DOC) transport, and related hydrological and meteorological measurements, together with model outputs and analysis scripts. All analyses were performed in R (version ≥ 4.2). The repository is organized into two main components: the chamber and Bayesian analysis pipeline (root folder), and the tower flux gap-filling and uncertainty analysis (folder golden/).
    1. Chamber and Bayesian Workflow
    This section integrates chamber measurements, water-table data, and modeled fluxes for both conserved and degraded peatland plots. The scripts allow data preparation, prediction of half-hourly fluxes, Bayesian partitioning of net ecosystem exchange (NEE) into gross primary production (GPP) and ecosystem respiration (ER), and generation of publication-quality figures.
    Main steps:
    Data preparation – Cleaning and merging chamber and tower data (flux_chamber3.r, flux_wt_guatavita_jc.r, waterlevel.r).
    Prediction dataset construction – Builds model input datasets (flux predict.R, flux predict2.R).
    Bayesian flux partitioning – Separates NEE into GPP and ER using hierarchical Bayesian models (bayesian models.r, bayesianflux.r). This step must be run separately for each station (ST1 and ST2) by modifying the station code inside the scripts.
    Trace gas analyses – Quantifies N₂O and DOC fluxes (N2Oflux.r, DOC_flux.r).
    Visualization and summaries – Produces the cumulative and seasonal flux figures and summary tables (final plot.r).
    Primary outputs:
    Modelled CO₂ and CH₄ fluxes (*_Model_EC_long.csv, _pred_30min_.csv)
    Seasonal and cumulative carbon balance summaries (Final_Cumulative_CO2_CH4_CO2eq_2023_2024_bySeason_Method_Station.csv, Summary_CO2_CH4_CO2eq_byMethod_Station_Season_Year.csv)
    Mean and confidence-interval tables for each gas (PerGas_CO2_CH4_with_CO2eq_Mg_ha_mean95CI.csv, Totals_CO2eq_across_gases_Mg_ha_mean95CI.csv)
    Publication figures (figure.png, figure_transparent.png, figure.svg)
    2. Tower Flux (Eddy-Covariance) Workflow
    The folder golden/ contains the workflow used for tower-based fluxes, including gap-filling, uncertainty analysis, and manuscript-quality visualization. These scripts use the REddyProc R package and standard meteorological variables.
    Scripts:
    REddyProc_Guatavita_Station1_Gold.R – Gap-filling for Station 1
    REddyProc_Guatavita_Station2_Gold.R – Gap-filling for Station 2
    Guatavita_gapfilling_uncertainty.R – Quantifies gap-filling uncertainty
    Guatavita_plot_manuscript.R – Generates final tower flux figures
    Each station's eddy-covariance data were processed independently following standard u-star filtering and uncertainty propagation routines.
    Data Files
    Input data include chamber fluxes (co2flux.csv, ch4flux.csv, db_gutavita_N2O_all.csv), water-table and hydrological measurements (WaterTable.csv, wtd_martos_21_25.csv), DOC transport (DOC transport.csv), and auxiliary meteorological variables (tower_var.csv). Intermediate model results are stored in .rds files, and cumulative or seasonal summaries are provided in .csv and .xlsx formats.
    Reproducibility Notes
    All scripts assume relative paths from the project root. To reproduce the complete analyses:
    1. Install required R packages (tidyverse, ggplot2, rjags, coda, REddyProc, among others).
    2. Run the chamber workflow in the order listed above.
    3. Repeat the Bayesian modeling step for both stations.
    4. Execute the tower scripts in the golden/ folder for gap-filling and visualization.
    Large intermediate .rds files are retained for reproducibility and should not be deleted unless re-running the models from scratch.
    Citation and Contact
    Principal Investigator: Juan C. Benavides, Pontificia Universidad Javeriana, Bogotá, Colombia
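
    As a hedged sketch of the tower gap-filling step (not the golden/ scripts themselves), the code below uses REddyProc; the input file, column names, and site coordinates are assumptions, and the u-star filtering and uncertainty propagation of the published workflow are omitted.

      library(REddyProc)

      # Hypothetical half-hourly input with columns Year, DoY, Hour, NEE, Rg, Tair, VPD, Ustar
      raw <- fLoadTXTIntoDataframe("ST1_30min_fluxes.txt")
      raw <- fConvertTimeToPosix(raw, "YDH", Year = "Year", Day = "DoY", Hour = "Hour")

      EProc <- sEddyProc$new("ST1", raw, c("NEE", "Rg", "Tair", "VPD", "Ustar"))
      EProc$sSetLocationInfo(LatDeg = 4.9, LongDeg = -73.8, TimeZoneHour = -5)  # assumed coordinates

      # Marginal Distribution Sampling (MDS) gap-filling of NEE and of the drivers
      # needed for flux partitioning
      EProc$sMDSGapFill("NEE",  FillAll = TRUE)
      EProc$sMDSGapFill("Tair", FillAll = FALSE)
      EProc$sMDSGapFill("Rg",   FillAll = FALSE)

      EProc$sMRFluxPartition()   # nighttime-based partitioning into GPP and Reco

      filled <- EProc$sExportResults()
      write.csv(cbind(raw, filled), "ST1_gapfilled_30min.csv", row.names = FALSE)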

  14. Clustering of samples and variables with mixed-type data

    • plos.figshare.com
    tiff
    Updated Jun 1, 2023
    Cite
    Manuela Hummel; Dominic Edelmann; Annette Kopp-Schneider (2023). Clustering of samples and variables with mixed-type data [Dataset]. http://doi.org/10.1371/journal.pone.0188274
    Explore at:
    tiff
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Manuela Hummel; Dominic Edelmann; Annette Kopp-Schneider
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of data measured on different scales is a relevant challenge. Biomedical studies often focus on high-throughput datasets of, e.g., quantitative measurements. However, the need for integration of other features possibly measured on different scales, e.g. clinical or cytogenetic factors, becomes increasingly important. The analysis results (e.g. a selection of relevant genes) are then visualized, while adding further information, like clinical factors, on top. However, a more integrative approach is desirable, where all available data are analyzed jointly, and where also in the visualization different data sources are combined in a more natural way. Here we specifically target integrative visualization and present a heatmap-style graphic display. To this end, we develop and explore methods for clustering mixed-type data, with special focus on clustering variables. Clustering of variables does not receive as much attention in the literature as does clustering of samples. We extend the variables clustering methodology by two new approaches, one based on the combination of different association measures and the other on distance correlation. With simulation studies we evaluate and compare different clustering strategies. Applying specific methods for mixed-type data proves to be comparable and in many cases beneficial as compared to standard approaches applied to corresponding quantitative or binarized data. Our two novel approaches for mixed-type variables show similar or better performance than the existing methods ClustOfVar and bias-corrected mutual information. Further, in contrast to ClustOfVar, our methods provide dissimilarity matrices, which is an advantage, especially for the purpose of visualization. Real data examples aim to give an impression of various kinds of potential applications for the integrative heatmap and other graphical displays based on dissimilarity matrices. We demonstrate that the presented integrative heatmap provides more information than common data displays about the relationship among variables and samples. The described clustering and visualization methods are implemented in our R package CluMix available from https://cran.r-project.org/web/packages/CluMix.
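
    As a generic, hedged illustration of clustering mixed-type data from dissimilarity matrices (this is not the CluMix API; the toy data frame and the crude association measure for variables are illustrative assumptions):

      library(cluster)

      df <- data.frame(
        age    = c(34, 51, 29, 62, 45, 58),
        marker = c(1.2, 3.4, 0.8, 4.1, 2.2, 3.9),
        stage  = factor(c("I", "III", "I", "IV", "II", "III")),
        mutant = factor(c("no", "yes", "no", "yes", "no", "yes"))
      )

      # Samples: Gower dissimilarity handles numeric and categorical variables together
      d_samples <- daisy(df, metric = "gower")
      plot(hclust(d_samples, method = "ward.D2"), main = "Samples")

      # Variables: a simple dissimilarity from pairwise associations on numerically coded data
      # (1 - |Spearman correlation|); CluMix provides more principled mixed-type measures.
      num <- data.matrix(df)
      d_vars <- as.dist(1 - abs(cor(num, method = "spearman")))
      plot(hclust(d_vars, method = "average"), main = "Variables")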

  15. Spatial Multimodal Analysis (SMA) - Spatial Transcriptomics

    • figshare.scilifelab.se
    • demo.researchdata.se
    • +1more
    json
    Updated Jan 15, 2025
    Cite
    Marco Vicari; Reza Mirzazadeh; Anna Nilsson; Patrik Bjärterot; Ludvig Larsson; Hower Lee; Mats Nilsson; Julia Foyer; Markus Ekvall; Paulo Czarnewski; Xiaoqun Zhang; Per Svenningsson; Per Andrén; Lukas Käll; Joakim Lundeberg (2025). Spatial Multimodal Analysis (SMA) - Spatial Transcriptomics [Dataset]. http://doi.org/10.17044/scilifelab.22778920.v1
    Explore at:
    json
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    KTH Royal Institute of Technology, Science for Life Laboratory
    Authors
    Marco Vicari; Reza Mirzazadeh; Anna Nilsson; Patrik Bjärterot; Ludvig Larsson; Hower Lee; Mats Nilsson; Julia Foyer; Markus Ekvall; Paulo Czarnewski; Xiaoqun Zhang; Per Svenningsson; Per Andrén; Lukas Käll; Joakim Lundeberg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains Spatial Transcriptomics (ST) data matching with Matrix Assisted Laser Desorption/Ionization - Mass Spectrometry Imaging (MALDI-MSI). This data is complementary to data contained in the same project. Files with the same identifiers in the two datasets originated from the very same tissue section and can be combined into a multimodal ST-MSI object. For more information about the dataset please see our manuscript posted on BioRxiv (doi: https://doi.org/10.1101/2023.01.26.525195). This dataset includes ST data from 19 tissue sections, including human post-mortem and mouse samples. The spatial transcriptomics data were generated using the Visium protocol (10x Genomics). The murine tissue sections come from three different mice unilaterally injected with 6-OHDA, a neurotoxin that, when injected in the brain, can selectively destroy dopaminergic neurons. We used this mouse model to show the applicability of the technology that we developed, named Spatial Multimodal Analysis (SMA). Using our technology on these mouse brain tissue sections we were able to detect both dopamine with MALDI-MSI and the corresponding gene expression with ST. This dataset also includes one human post-mortem striatum sample that was placed on one Visium slide across the four capture areas. This sample was analyzed with a different ST protocol named RRST (Mirzazadeh, R., Andrusivova, Z., Larsson, L. et al. Spatially resolved transcriptomic profiling of degraded and challenging fresh frozen samples. Nat Commun 14, 509 (2023). https://doi.org/10.1038/s41467-023-36071-5), in which probes capturing the whole transcriptome are first hybridized in the tissue section and then spatially detected. Each tissue section in the dataset has been given a unique identifier composed of the Visium array ID and capture area ID of the Visium slide that the tissue section was placed on. This unique identifier is included in the file names of all the files relating to the same tissue section, including the MALDI-MSI files published in the other dataset included in this project. In this dataset you will find the following files for each tissue section:
    - raw files: the read one fastq files (containing the pattern *R1*fastq.gz in the file name), the read two fastq files (containing the pattern *R2*fastq.gz in the file name) and the raw microscope images (containing the pattern Spot.jpg in the file name). These are the only files needed to run the Space Ranger pipeline, which is freely available to any user (please see the 10x Genomics website for information on how to install and run Space Ranger);
    - processed data files: we provide processed data files of three types: a) Space Ranger outputs that were used to produce the figures in our publication; b) manual annotation tables in csv format produced using Loupe Browser 6 (csv tables with file names ending in _RegionLoupe.csv, _filter.csv, _dopamine.csv, _lesion.csv, _region.csv); c) json files that we used as input for Space Ranger in the cases where the automatic tissue detection included in the pipeline failed to recognize the tissue or the fiducials. Using these processed files the user can reproduce the figures of our publication without having to restart from the raw data files.
    The MALDI-MSI analyses preceding ST were performed with different matrices on different tissue sections. We used 1) 9-aminoacridine (9-AA) for detection of metabolites in negative ionization mode, 2) 2,5-dihydroxybenzoic acid (DHB) for detection of metabolites in positive ionization mode, and 3) 4-(anthracen-9-yl)-2-fluoro-1-ethylpyridin-1-ium iodide (FMP-10), which charge-tags molecules with phenolic hydroxyls and/or primary amines, including neurotransmitters. The information about which matrix was sprayed on each tissue section, together with other information about the samples, is included in the metadata table. We also used three types of control samples:
    - standard Visium: samples processed with standard Visium (i.e. no matrix spraying, no MALDI-MSI, protocol as recommended by 10x Genomics with no exceptions);
    - internal controls (iCTRL): samples not sprayed with any matrix nor processed with MALDI-MSI, but located on the same Visium slide where other samples were processed with MALDI-MSI;
    - FMP-10-iCTRL: a sample sprayed with FMP-10 and then processed as an iCTRL.
    This and other information is provided in the metadata table.
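    For readers who want to go from the raw or processed files described above to an analysis-ready object in R, the sketch below shows one common (not author-prescribed) route: loading a Space Ranger output folder with Seurat's Load10X_Spatial. The folder name, built from a made-up Visium array ID and capture area, is an assumption for illustration; combining the result with the matching MALDI-MSI data would be an additional step not shown here.

        # minimal sketch, assuming Space Ranger has already been run on one tissue section;
        # the output folder name (Visium array ID + capture area) below is hypothetical
        library(Seurat)

        st <- Load10X_Spatial(
          data.dir = "spaceranger_out/V10T03-322_A1/outs",  # hypothetical identifier
          filename = "filtered_feature_bc_matrix.h5"
        )

        # basic normalization before any downstream joint ST-MSI analysis
        st <- SCTransform(st, assay = "Spatial")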

  16. d

    Health and Retirement Study (HRS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Damico, Anthony (2023). Health and Retirement Study (HRS) [Dataset]. http://doi.org/10.7910/DVN/ELEKOY
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the health and retirement study (hrs) with r. the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research, if you apply for an interviewer job with them, i hope you like werther's original. figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle. but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you. the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.
    this new github repository contains five scripts:
    • 1992 - 2010 download HRS microdata.R: loop through every year and every file, download, then unzip everything in one big party
    • import longitudinal RAND contributed files.R: create a SQLite database (.db) on the local disk, then load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)
    • longitudinal RAND - analysis examples.R: connect to the sql database created by the 'import longitudinal RAND contributed files' program, create two database-backed complex sample survey objects using a taylor-series linearization design, and perform a mountain of analysis examples with wave weights from two different points in the panel
    • import example HRS file.R: load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html), parse through the IF block at the bottom of the sas importation script, blank out a number of variables, and save the file as an R data file (.rda) for fast loading later
    • replicate 2002 regression.R: connect to the sql database created by the 'import longitudinal RAND contributed files' program, create a database-backed complex sample survey object using a taylor-series linearization design, and exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document
    click here to view these five scripts. for more detail about the health and retirement study (hrs), visit: michigan's hrs homepage, rand's hrs homepage, the hrs wikipedia page, and a running list of publications using hrs. notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself. confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
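    for a feel of what the "database-backed complex sample survey object" step above looks like, here is a minimal sketch using the survey package in r. the database path, table name, and the stratum, cluster, and weight variable names (raestrat, raehsamp, r10wtresp, r10shlt) are illustrative assumptions, not copied from the scripts themselves.

        # minimal sketch, assuming the RAND HRS longitudinal file has already been loaded
        # into a local SQLite database; table and variable names below are assumptions
        library(survey)   # complex sample survey analysis
        library(RSQLite)  # SQLite backend used by the database-backed design

        hrs.design <-
          svydesign(
            ids     = ~raehsamp,     # sampling error computation unit (cluster)
            strata  = ~raestrat,     # sampling error stratum
            weights = ~r10wtresp,    # respondent-level weight for wave 10 (2010)
            nest    = TRUE,
            dbtype  = "SQLite",
            dbname  = "hrs_rand.db", # path to the SQLite database on disk
            data    = "rand_hrs"     # name of the table inside the database
          )

        # weighted mean of self-reported health at wave 10, with its standard error
        svymean(~r10shlt, hrs.design, na.rm = TRUE)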

  17. Data from: Medication use associated with exposure to manganese in two Ohio...

    • s.cnmilf.com
    • datasets.ai
    • +1more
    Updated Nov 12, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Medication use associated with exposure to manganese in two Ohio towns [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/medication-use-associated-with-exposure-to-manganese-in-two-ohio-towns
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agencyhttp://www.epa.gov/
    Area covered
    Ohio
    Description

    A cross-sectional design was used where 86 residents of East Liverpool, Ohio, 100 residents from Marietta, Ohio and 90 residents from Mount Vernon, Ohio were recruited and participated in the study. The Marietta/Mount Vernon data collection took place in August, 2009 as this was the original study location. Marietta was an air manganese (air-Mn) exposed community and Mt. Vernon was a comparison community believed to have little or no air-Mn exposure. After receiving additional funding and approvals, East Liverpool was added and data collection occurred in November, 2011 using identical study protocols to the Marietta/Mount Vernon study with the exception of additional specimen collections of hair and toenails (only collected in East Liverpool). All participants underwent a neuropsychological battery of tests of mood, motor and cognitive function. A comprehensive health questionnaire was administered inquiring about sociodemographics, symptoms, diagnosed illnesses, medication use, health habits, work history, and dietary consumption (used to compute dietary intake of Mn and Fe). Additionally, the study included data acquisition on air monitoring and modeling, biomarkers, and health. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Because this data set includes protected health information, public access is not available. Format: csv files. This dataset is associated with the following publication: Bowler, R., S. Adams, C. Wright, Y. Kim, A. Booty, M. Colledge, V. Gocheva, and D. Lobdell. Medication Use Associated with Exposure to Manganese in Two Ohio Towns. INTERNATIONAL JOURNAL OF ENVIRONMENTAL HEALTH RESEARCH. Carfax Publishing Limited, Basingstoke, UK, 26(5): 483-96, (2016).

  18. ds000255_R1.0.0

    • openneuro.org
    Updated Jul 16, 2018
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yoichi Miyawaki; Hajime Uchida; Okito Yamashita; Masa-aki Sato; Yusuke Morito; Hiroki C. Tanabe; Norihiro Sadato; Yukiyasu Kamitani (2018). ds000255_R1.0.0 [Dataset]. https://openneuro.org/datasets/ds000255/versions/00001
    Explore at:
    Dataset updated
    Jul 16, 2018
    Dataset provided by
    OpenNeurohttps://openneuro.org/
    Authors
    Yoichi Miyawaki; Hajime Uchida; Okito Yamashita; Masa-aki Sato; Yusuke Morito; Hiroki C. Tanabe; Norihiro Sadato; Yukiyasu Kamitani
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Visual image reconstruction

    Original paper: Miyawaki Y, Uchida H, Yamashita O, Sato M, Morito Y, Tanabe HC, Sadato N & Kamitani Y (2008) Visual Image Reconstruction from Human Brain Activity using a Combination of Multiscale Local Image Decoders. Neuron 60:915-929.

    Overview

    This is the fMRI data from Miyawaki et al. (2008) "Visual image reconstruction from human brain activity using a combination of multiscale local image decoders". Neuron 60:915-29. In this study, we collected fMRI activity from subjects viewing images, and constructed decoders predicting local image contrast at multiple spatial scales. The combined decoders based on a linear model successfully reconstructed presented stimuli from fMRI activity.

    Task

    The experiment consisted of human subjects viewing contrast-based images of 12 × 12 flickering patches. There were two types of image viewing tasks: (1) random image viewing and (2) figure image (geometric shape or alphabet letter) viewing. For image presentation, a block design was used with rest periods between the presentation of each image. For random image patch presentation, images were presented for 6 s, followed by 6 s rest. For figure image presentation, images were presented for 12 s, followed by 12 s rest. The data from random image viewing runs were used to train the decoding models, and the trained models were evaluated with the data from figure image viewing runs.

    Dataset

    This dataset contains two subjects ('sub-01' and 'sub-02'). The subjects performed two sessions of fMRI experiments ('ses-01' and 'ses-02'). Each session is composed of several EPI runs (TR, 2000 ms; TE, 30 ms; flip angle, 80°; voxel size, 3 × 3 × 3 mm; FOV, 192 × 192 mm; number of slices, 30; slice gap, 0 mm) and inplane T2-weighted imaging (TR, 6000 ms; TE, 57 ms; flip angle, 90°; voxel size, 0.75 × 0.75 × 3.0 mm; FOV, 192 × 192 mm). The EPI images covered the entire occipital lobe. The dataset also includes a T1-weighted anatomical reference image for each subject (TR, 2250 ms; TE, 2.98 ms for sub-01 and 3.06 ms for sub-02; TI, 900 ms; flip angle, 9°; voxel size, 1.0 × 1.0 × 1.0 mm; FOV, 256 × 256 mm). The T1w images were obtained in sessions different from the fMRI experiment sessions and stored in 'ses-anat' directories. The T1w images were defaced by pydeface (https://pypi.python.org/pypi/pydeface). All DICOM files were converted to NIfTI-1 files by mri_convert in FreeSurfer. In addition, the dataset contains mask images of manually defined ROIs for each subject in the sourcedata directory (see README in sourcedata for more details).

    During fMRI runs, the subject viewed contrast-based images of 12 × 12 flickering image patches. Two types of runs ('viewRandom' and 'viewFigure') were included in the experiment. In 'viewRandom' runs, random images were presented as visual stimuli. Each 'viewRandom' run consisted of 22 stimulus presentation trials and lasted for 298 s (149 volumes). The two subjects performed 20 'viewRandom' runs. In 'viewFigure' runs, either a geometric shape pattern (square, small frame, large frame, plus, X) or an alphabet letter pattern (n, e, u, r, o) was presented in each trial. In addition, data collected while the subjects viewed thin and large alphabet letter patterns (n, e, u, r, o) are included in the dataset (these are not included in the results of the original study). Each 'viewFigure' run consisted of 10 stimulus presentation trials and lasted for 268 s (134 volumes). Subjects 'sub-01' and 'sub-02' performed 12 and 10 'viewFigure' runs, respectively.

    To help subjects suppress eye blinks and firmly fixate the eyes, the color of the fixation spot changed from white to red 2 s before each stimulus block started. To ensure alertness, subjects were instructed to detect the color change of the fixation (red to green, 100 ms) that occurred after a random interval of 3–5 s from the beginning of each stimulus block. Performance of the subjects was monitored online during the experiments, but was not recorded and is omitted from the dataset.

    Task event files

    The value of trial_type in the task event files (*_events.tsv) indicates the type of each trial (block) as below.

    • rest: Rest trial (no visual stimulus).
    • stimulus_random: Random pattern.
    • stimulus_shape: Geometric shape pattern (square, small frame, large frame, plus, X).
    • stimulus_alphabet: Alphabet pattern (n, e, u, r, o).
    • stimulus_alphabet_thin: Thin alphabet pattern (n, e, u, r, o).
    • stimulus_alphabet_long: Long alphabet pattern (n, e, u, r, o).

    Note that the results from thin and long alphabet patterns are not included in the original paper although the data were obtained in the same sessions.

    The additional column stimulus_pattern contains the pattern of stimuli (12 × 12) presented in each stimulus trial. It is vectorized in row-major order. Each element in the vector corresponds to a patch (1.15° × 1.15°) in a stimulus pattern; 1 and 0 represent a flickering checkerboard and a gray area, respectively. For example, a stimulus_pattern value of

    000000000000000000000000000000000000000111111000000111111000000110011000000110011000000110011000000110011000000000000000000000000000000000000000
    

    represents the following stimulus.

    000000000000
    000000000000
    000000000000
    000111111000
    000111111000
    000110011000
    000110011000
    000110011000
    000110011000
    000000000000
    000000000000
    000000000000
    

    The column holds 'null' for rest trials.
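    Because stimulus_pattern is stored as a row-major 144-character string, the sketch below shows one way to recover the 12 × 12 patch layout in R; the events file name is hypothetical and should be replaced with an actual *_events.tsv path from the dataset.

        # minimal sketch: reshape the row-major stimulus_pattern string into a 12 x 12 matrix;
        # the events file name below is hypothetical
        events <- read.delim("sub-01_ses-01_task-figure_run-01_events.tsv",
                             colClasses = "character")

        # keep stimulus trials only (rest trials hold 'null' in stimulus_pattern)
        stim <- events[events$trial_type != "rest", ]

        pattern <- matrix(as.integer(strsplit(stim$stimulus_pattern[1], "")[[1]]),
                          nrow = 12, ncol = 12, byrow = TRUE)  # row-major order
        print(pattern)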

    Comments added by Openfmri Curators

    ===========================================

    General Comments

    Defacing

    Pydeface was used on all anatomical images to ensure de-identification of subjects. The code can be found at https://github.com/poldracklab/pydeface

    Quality Control

    MRIQC was run on the dataset. Results are located in derivatives/mriqc. Learn more about it here: https://mriqc.readthedocs.io/en/stable/

    Where to discuss the dataset

    1) www.openfmri.org/dataset/ds******/ See the comments section at the bottom of the dataset page. 2) www.neurostars.org Please tag any discussion topics with the tags openfmri and dsXXXXXX. 3) Send an email to submissions@openfmri.org. Please include the accession number in your email.

    Known Issues

    - Behavioral performance data does not accompany this dataset, as it was not submitted by the uploader.

  19. u

    Growth and Yield Data for the Bushland, Texas, Winter Wheat Datasets

    • agdatacommons.nal.usda.gov
    • catalog.data.gov
    xlsx
    Updated Nov 21, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Steven R. Evett; Gary W. Marek; Karen S. Copeland; Terry A. Sr. Howell; Paul D. Colaizzi; David K. Brauer; Brice B. Ruthardt (2025). Growth and Yield Data for the Bushland, Texas, Winter Wheat Datasets [Dataset]. http://doi.org/10.15482/USDA.ADC/1527918
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Steven R. Evett; Gary W. Marek; Karen S. Copeland; Terry A. Sr. Howell; Paul D. Colaizzi; David K. Brauer; Brice B. Ruthardt
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Bushland, Texas
    Description

    This dataset consists of growth and yield data for each season when winter wheat (Triticum aestivum L.) was grown for grain at the USDA-ARS Conservation and Production Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU) research weather station, Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). In each season, winter wheat was grown for grain on two large, precision weighing lysimeters, each in the center of a 4.44 ha square field. The square fields are themselves arranged in a larger square with the fields in four adjacent quadrants of the larger square. Fields and lysimeters within each field are thus designated northeast (NE), southeast (SE), northwest (NW), and southwest (SW). Irrigation was by linear move sprinkler system. Irrigation protocols described as full were managed to replenish soil water used by the crop on a weekly or more frequent basis as determined by soil profile water content readings made with a neutron probe to 2.4-m depth in the field. Irrigation protocols described as deficit typically involved irrigations to establish the crop early in the season, followed by reduced or absent irrigations later in the season (typically in the later winter and spring). The growth and yield data include plant population density, height (except in 1989-1990), plant row width, leaf area index, growth stage, total above-ground biomass, leaf and stem biomass, head mass (when present), kernel number, and final yield. Data are from replicate samples in the field and non-destructive (except for final harvest) measurements on the weighing lysimeters. In most cases yield data are available from both manual sampling on replicate plots in each field and from machine harvest. These datasets originate from research aimed at determining crop water use (ET), crop coefficients for use in ET-based irrigation scheduling based on a reference ET, crop growth, yield, harvest index, and crop water productivity as affected by irrigation method, timing, amount (full or some degree of deficit), agronomic practices, cultivar, and weather. Prior publications have focused on winter wheat ET, crop coefficients, and crop water productivity. Crop coefficients have been used by ET networks. The data have utility for testing simulation models of crop ET, growth, and yield and have been used by the Agricultural Model Intercomparison and Improvement Project (AgMIP) and by many others for testing and calibrating models of ET that use satellite and/or weather data.
    Resources in this dataset:
    Resource Title: 1989-1990 Bushland, TX, west winter wheat growth and yield data. File Name: 1989-1990_West_Wheat_Growth_and_Yield.xlsx. Resource Description: This dataset consists of growth and yield data for the 1989-1990 winter wheat (Triticum aestivum L.) season at the USDA-ARS Conservation and Production Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU) research weather station, Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). Winter wheat was grown on two large, precision weighing lysimeters, each in the center of a 4.44 ha square field. The two square fields were themselves arranged with one directly north of and contiguous with the other. Fields and lysimeters within each field were designated northwest (NW) and southwest (SW). Irrigation was by linear move sprinkler system. Irrigations described as full were managed to replenish soil water used by the crop on a weekly or more frequent basis as determined by soil profile water content readings made with a neutron probe to 2.4-m depth in the field. Irrigation described as deficit typically involved irrigation to establish the crop in the autumn followed by reduced or no irrigation in the late winter or spring. The growth and yield data include plant height (except in 1989-1990), leaf area index, growth stage, total above-ground biomass, leaf and stem biomass, head biomass, and final yield. Data are from replicate samples in the field and non-destructive (except for final harvest) measurements on the weighing lysimeters. In most cases yield data are available from both manual sampling on replicate plots in each field and from machine harvest. There is a single spreadsheet for the west (NW and SW) lysimeters and fields. The spreadsheets contain tabs for data and corresponding tabs for data dictionaries. Typically, there are separate data tabs and corresponding dictionaries for plant growth during the season, crop growth stage, plant population, manual harvest from replicate plots in each field and from lysimeter surfaces, and machine (combine) harvest. An Introduction tab explains the tab names and contents, lists the authors, explains conventions, and lists some relevant references.
    Resource Title: 1991-1992 Bushland, TX, east winter wheat growth and yield data. File Name: 1991-1992_East_Wheat_Growth_and_Yield.xlsx. Resource Description: This dataset consists of growth and yield data for the 1991-1992 winter wheat (Triticum aestivum L.) season at the USDA-ARS Conservation and Production Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU) research weather station, Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). Winter wheat was grown on two large, precision weighing lysimeters, each in the center of a 4.44 ha square field. The two square fields were themselves arranged with one directly north of and contiguous with the other. Fields and lysimeters within each field were designated northeast (NE) and southeast (SE). Irrigation was by linear move sprinkler system. Irrigations described as full were managed to replenish soil water used by the crop on a weekly or more frequent basis as determined by soil profile water content readings made with a neutron probe to 2.4-m depth in the field. Irrigation described as deficit typically involved irrigation to establish the crop in the autumn followed by reduced or no irrigation in the late winter or spring. The growth and yield data include plant height, leaf area index, growth stage, total above-ground biomass, leaf and stem biomass, head biomass, and final yield. Data are from replicate samples in the field and non-destructive (except for final harvest) measurements on the weighing lysimeters. In most cases yield data are available from both manual sampling on replicate plots in each field and from machine harvest. There is a single spreadsheet for the east (NE and SE) lysimeters and fields. The spreadsheets contain tabs for data and corresponding tabs for data dictionaries. Typically, there are separate data tabs and corresponding dictionaries for plant growth during the season, crop growth stage, plant population, manual harvest from replicate plots in each field and from lysimeter surfaces, and machine (combine) harvest. An Introduction tab explains the tab names and contents, lists the authors, explains conventions, and lists some relevant references.
    Resource Title: 1992-1993 Bushland, TX, west winter wheat growth and yield data. File Name: 1992-1993_W_Wheat_Growth_and_Yield.xlsx. Resource Description: This dataset consists of growth and yield data for the 1992-1993 winter wheat (Triticum aestivum L.) season at the USDA-ARS Conservation and Production Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU) research weather station, Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). Winter wheat was grown on two large, precision weighing lysimeters, each in the center of a 4.44 ha square field. The two square fields were themselves arranged with one directly north of and contiguous with the other. Fields and lysimeters within each field were designated northwest (NW) and southwest (SW). Irrigation was by linear move sprinkler system. Irrigations described as full were managed to replenish soil water used by the crop on a weekly or more frequent basis as determined by soil profile water content readings made with a neutron probe to 2.4-m depth in the field. Irrigation described as deficit typically involved irrigation to establish the crop in the autumn followed by reduced or no irrigation in the late winter or spring. The growth and yield data include plant height, leaf area index, growth stage, total above-ground biomass, leaf and stem biomass, head biomass, and final yield. Data are from replicate samples in the field and non-destructive (except for final harvest) measurements on the weighing lysimeters. In most cases yield data are available from both manual sampling on replicate plots in each field and from machine harvest. There is a single spreadsheet for the west (NW and SW) lysimeters and fields. The spreadsheets contain tabs for data and corresponding tabs for data dictionaries. Typically, there are separate data tabs and corresponding dictionaries for plant growth during the season, crop growth stage, plant population, manual harvest from replicate plots in each field and from lysimeter surfaces, and machine (combine) harvest. An Introduction tab explains the tab names and contents, lists the authors, explains conventions, and lists some relevant references.
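    Because each workbook bundles data tabs with matching data-dictionary tabs, a quick way to explore one file in R is sketched below. The file name is taken from the resource description above, while the sheet name passed to read_excel is an assumption; inspect the excel_sheets() listing for the actual tab names.

        # minimal sketch: list the tabs in one growth-and-yield workbook and read one of them;
        # the sheet name below is an assumption, so check the listing first
        library(readxl)

        path <- "1989-1990_West_Wheat_Growth_and_Yield.xlsx"
        excel_sheets(path)                                   # data tabs and dictionary tabs

        growth <- read_excel(path, sheet = "Plant Growth")   # hypothetical tab name
        str(growth)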

  20. PFAS and multimorbidity among a random sample of patients from the...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Oct 28, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. EPA Office of Research and Development (ORD) (2022). PFAS and multimorbidity among a random sample of patients from the University of North Carolina Healthcare System [Dataset]. https://catalog.data.gov/dataset/pfas-and-multimorbidity-among-a-random-sample-of-patients-from-the-university-of-north-car
    Explore at:
    Dataset updated
    Oct 28, 2022
    Dataset provided by
    United States Environmental Protection Agencyhttp://www.epa.gov/
    Description

    This dataset contains electronic health records used to study associations between PFAS occurrence and multimorbidity in a random sample of UNC Healthcare system patients. The dataset contains the medical record number to uniquely identify each individual as well as information on PFAS occurrence at the zip code level, the zip code of residence for each individual, chronic disease diagnoses, patient demographics, and neighborhood socioeconomic information from the 2010 US Census. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Because this data has PII from electronic health records, the data can only be accessed with an approved IRB application. Project analytic code is available at L:/PRIV/EPHD_CRB/Cavin/CARES/Project Analytic Code/Cavin Ward/PFAS Chronic Disease and Multimorbidity. Format: This data is formatted as an R dataframe and an associated comma-delimited flat text file. The data has the medical record number to uniquely identify each individual (which also serves as the primary key for the dataset), as well as information on the occurrence of PFAS contamination at the zip code level, socioeconomic data at the census tract level from the 2010 US Census, demographics, and the presence of chronic disease as well as multimorbidity (the presence of two or more chronic diseases). This dataset is associated with the following publication: Ward-Caviness, C., J. Moyer, A. Weaver, R. Devlin, and D. Diazsanchez. Associations between PFAS occurrence and multimorbidity as observed in an electronic health record cohort. Environmental Epidemiology. Wolters Kluwer, Alphen aan den Rijn, NETHERLANDS, 6(4): p e217, (2022).
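    As a purely illustrative sketch of how records organized this way could be combined (the restricted files themselves are not public), the R code below joins a hypothetical patient-level extract to zip-code-level PFAS occurrence and flags multimorbidity; every file and column name here is an assumption.

        # minimal sketch with hypothetical file and column names: attach zip-level PFAS
        # occurrence to patient records keyed by medical record number (mrn)
        library(dplyr)

        patients <- read.csv("patients.csv")     # assumed columns: mrn, zip, n_chronic, ...
        pfas_zip <- read.csv("pfas_by_zip.csv")  # assumed columns: zip, pfas_detected

        cohort <- patients %>%
          left_join(pfas_zip, by = "zip") %>%
          mutate(multimorbid = n_chronic >= 2)   # two or more chronic diseases

        table(cohort$pfas_detected, cohort$multimorbid)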
