Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary Data
Results of protein-drug molecular docking
The performance of different models on the common independent proteins.
Demographics of patients (SGH cohort).
Membrane proteins play critical roles in the interactions between tumor cells and the extracellular matrix (ECM) during metastasis. In this study, we performed quantitative proteomic analysis of membrane proteins from two human giant-cell lung carcinoma cell lines, one with low (95C) and one with high (95D) metastatic potential. By combining these results with microRNA analysis, we identified a multi-omics regulation module.
Statistics for the 62 optimal features selected by random forest.
RVD = rheumatic valvular disease; DVD = degenerative valvular disease; MW = protein molecular weight; pI = isoelectric point. *Candidate proteins for validation; #UniProt Knowledgebase, http://expasy.org/uniprot.
Proteins identified by LC-MS/MS or MALDI-TOF-MS/MS. (DOCX)
STR has been based at Karolinska Institutet since 1959, first at the Institution of Hygiene and thereafter at the Department of Medical Epidemiology and Biostatistics (MEB). STR was originally created primarily to study the importance of environmental factors for the development of cardiovascular/respiratory diseases and cancer, but has since evolved into a resource for all epidemiological and genetic aspects of ill health. Research based on STR is financed externally through grants that the users apply for individually. In this way STR forms the basis for a large body of research: during the past decade, over 50 articles have been published annually, several of them in high-impact journals. Over the same period, STR has been transformed from primarily an epidemiological resource into a biobank of samples (DNA, blood and serum) from a large number of twins. Genome-wide genotyping of close to 30 000 participants has been undertaken, and the plan is for all DNA samples to be genotyped on a genome-wide platform within the coming few years. Serum from 12 600 twins has so far been used for measurements of classical blood biomarkers. The generated genotypes and biomarker measurements effectively build up the value of STR as a molecular epidemiological resource.
Purpose:
The goal of the Swedish Twin Registry (STR) is to provide a longitudinal research infrastructure in the form of a population-based twin cohort of adequate size and content to enable powerful epidemiological and molecular medical studies. The study designs used include classical epidemiological investigations of risk factors for disease and death (including within-twin-pair designs), genetic association studies, heritability studies (both twin-model based and molecular based), epigenetics, proteomics, and other types of "-omics" approaches. STR is open to Swedish researchers and to international researchers who have a Swedish collaborator.
The performance of the RF model by 10-fold cross validation test.
The performance of the RF models based on the six residue sets.
(a) Category 1: genes are both highly ranked and involved in neurodevelopment. Category 2: genes are exclusive to neurodevelopment. Category 3: genes are exclusively highly ranked (see details in text). (b) We performed simulations by four methods: 1) based on the count of SNPs, 2) based on the minimum p-value, 3) based on the number of SNPs with p
Comparison of the Adequacy Index and Log Likelihood ratios of diagnostic models.
All variables displayed as mean ± SEM (n = 40 in each group); RVD = rheumatic valvular disease; DVD = degenerative valvular disease; M = male; F = female; GLU = glucose; TBIL = total bilirubin; GPT = glutamic-pyruvic transaminase; TCHOL = total cholesterol; TG = triglyceride; HDL-C = high-density lipoprotein cholesterol; LDL-C = low-density lipoprotein cholesterol; UA = uric acid; CREA = creatinine. P > 0.05 in all comparisons between any two of these groups (unpaired t-test for all continuous numbers).
The performance of the RF models based on the six feature groups.
Proteins identified with LC-MS/MS, MALDI-TOF-MS/MS. ID is the same as in figure 2 (Weight-plot PLS-DA).
Comparison of our method with Effective T3, BPBAac and BEAN.
Background:
Type III secretion systems (T3SSs) are central to pathogenesis and specifically deliver their secreted substrates (type III secreted proteins, T3SPs) into host cells. Since T3SPs play a crucial role in pathogen-host interactions, identifying them is essential to our understanding of the pathogenic mechanisms of T3SSs. This study reports a novel and effective method for identifying the distinctive residues whose conservation differs from that in other SPs, for use in T3SP prediction. Moreover, the importance of several sequence features was evaluated, and a promising prediction model was constructed.
Results:
Based on the conservation profiles constructed by a position-specific scoring matrix (PSSM), 52 distinctive residues were identified. To our knowledge, this is the first attempt to identify the distinctive residues of T3SPs. The 52 distinctive residues include all of the first 30 amino acid residues, which is consistent with previous studies reporting that the secretion signal generally occurs within the first 30 residue positions. However, the remaining 22 positions, which span residues 30–100, were also shown by our method to contain important signal information for T3SP secretion, because the translocation of many effectors also depends on the chaperone-binding residues that follow the secretion signal. For further feature optimisation and compression, permutation importance analysis was conducted to select 62 optimal sequence features. A prediction model across 16 species was developed using random forest to classify T3SPs and non-T3 SPs, with an area under the receiver operating characteristic curve of 0.93 in the 10-fold cross validation and an accuracy of 94.29% for the test set. Moreover, when evaluated on a common independent dataset, our method outperforms all the others published to date. Finally, novel, experimentally confirmed T3 effectors were used to further demonstrate the model’s correct application.
The model and all data used in this paper are freely available at http://cic.scu.edu.cn/bioinformatics/T3SPs.zip.
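As a rough illustration of the permutation importance analysis mentioned in this abstract, the sketch below shuffles one feature column at a time and records how much the accuracy drops. It is a minimal, standard-library sketch and not the authors' pipeline: the study used a random forest on sequence features, whereas the predictor here (`toy_predict`), the synthetic data, and all function names are hypothetical stand-ins chosen for illustration.

```python
import random
from typing import Callable, List

Row = List[float]


def accuracy(predict: Callable[[Row], int], X: List[Row], y: List[int]) -> float:
    """Fraction of rows whose predicted label equals the true label."""
    correct = sum(1 for row, label in zip(X, y) if predict(row) == label)
    return correct / len(y)


def permutation_importance(
    predict: Callable[[Row], int],
    X: List[Row],
    y: List[int],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean accuracy drop per feature after shuffling that feature's column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other feature untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for a trained model: only feature 0 matters.
def toy_predict(row: Row) -> int:
    return 1 if row[0] > 0.5 else 0


# Synthetic data (assumption): feature 0 determines the label, feature 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_predict(row) for row in X]

scores = permutation_importance(toy_predict, X, y)
```

In this toy setting the informative feature receives a clearly positive score, while the ignored feature scores exactly zero because shuffling it cannot change any prediction. Features could then be ranked by score and the top ones retained, which is the general idea behind selecting an optimal subset.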
The conservation differences analyzed by SAM for N-terminal 100 residues.
Integrating evidence from multiple domains is useful in prioritizing disease candidate genes for subsequent testing. We ranked all known human genes (n = 3819) under linkage peaks in the Irish Study of High-Density Schizophrenia Families using three different evidence domains: 1) a meta-analysis of microarray gene expression results using the Stanley Brain collection, 2) a schizophrenia protein-protein interaction network, and 3) a systematic literature search. Each gene was assigned a domain-specific p-value and ranked after evaluating the evidence within each domain. For comparison to this ranking process, a large-scale candidate gene hypothesis was also tested by including genes with Gene Ontology terms related to neurodevelopment. Subsequently, genotypes of 3725 SNPs in 167 genes from a custom Illumina iSelect array were used to evaluate the top-ranked vs. hypothesis-selected genes. Seventy-three genes were both highly ranked and involved in neurodevelopment (category 1), while 42 and 52 genes were exclusive to neurodevelopment (category 2) or highly ranked (category 3), respectively. The most significant associations were observed in genes PRKG1, PRKCE, and CNTN4, but no individual SNPs were significant after correction for multiple testing. Comparison of the approaches showed an excess of significant tests using the hypothesis-driven neurodevelopment category. Random selection of similarly sized genes from two independent genome-wide association studies (GWAS) of schizophrenia showed the excess was unlikely to have arisen by chance. In a further meta-analysis of three GWAS datasets, four candidate SNPs reached nominal significance. Although gene ranking using integrated sources of prior information did not enrich for significant results in the current experiment, gene selection using an a priori hypothesis (neurodevelopment) was superior to random selection. As such, further development of gene ranking strategies using more carefully selected sources of information is warranted.
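To give a sense of scale for the multiple-testing correction mentioned in this abstract, the sketch below computes a Bonferroni threshold for 3725 SNPs and checks that the three gene categories add up to the 167 genotyped genes. The abstract does not state which correction method was used, so Bonferroni and the significance level of 0.05 are assumptions made purely for illustration, as are the function and variable names.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Counts taken from the abstract.
n_snps = 3725
categories = {"category 1": 73, "category 2": 42, "category 3": 52}

# Assumption: a family-wise significance level of 0.05 (not stated in the abstract).
alpha = 0.05

threshold = bonferroni_threshold(alpha, n_snps)  # roughly 1.34e-05
total_genes = sum(categories.values())           # 167, matching the genotyped gene count

print(f"Bonferroni threshold for {n_snps} SNPs: {threshold:.3g}")
print(f"Genes across categories: {total_genes}")
```

Under these assumptions a single SNP would need a p-value below about 1.34e-05 to remain significant, which helps explain how strong nominal associations can fail to survive correction.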