A Neo4j graph database containing annotation for the M. tuberculosis H37Rv genome, created as part of the COMBAT TB project at the South African National Bioinformatics Institute (SANBI).
Tuberculosis (TB) Chest X-ray Database
A team of researchers from Qatar University (Doha, Qatar) and the University of Dhaka (Bangladesh), together with collaborators from Malaysia and medical doctors from Hamad Medical Corporation and Bangladesh, has created a database of chest X-ray images of tuberculosis (TB) positive cases along with normal images. The current release contains 700 publicly accessible TB images and 3500 normal images; a further 2800 TB images can be downloaded from the NIAID TB portal [3] after signing a data-sharing agreement.
Note: The research team classified TB and normal chest X-ray images with an accuracy of 98.3%. This work is published in IEEE Access. Please give credit to us when using the dataset, code, and trained models.
Credit should go to the following: Tawsifur Rahman, Amith Khandakar, Muhammad A. Kadir, Khandaker R. Islam, Khandaker F. Islam, Zaid B. Mahbub, Mohamed Arselene Ayari, Muhammad E. H. Chowdhury. (2020) "Reliable Tuberculosis Detection using Chest X-ray with Deep Learning, Segmentation and Visualization". IEEE Access, Vol. 8, pp. 191586-191601. DOI: 10.1109/ACCESS.2020.3031384.
To view the images, please check the image folders; the reference for each image is provided in metadata.csv.
Research team members and their affiliations:
Muhammad E. H. Chowdhury, PhD (mchowdhury@qu.edu.qa), Department of Electrical Engineering, Qatar University, Doha-2713, Qatar
Tawsifur Rahman (tawsifurrahman.1426@gmail.com), Department of Electrical Engineering, Qatar University, Doha-2713, Qatar
Amith Khandakar (amitk@qu.edu.qa), Department of Electrical Engineering, Qatar University, Doha-2713, Qatar
Rashid Mazhar, MD, Thoracic Surgery, Hamad General Hospital, Doha-3050, Qatar
Muhammad Abdul Kadir, PhD, Department of Biomedical Physics & Technology, University of Dhaka, Dhaka-1000, Bangladesh
Zaid Bin Mahbub, PhD, Department of Mathematics and Physics, North South University, Dhaka-1229, Bangladesh
Khandakar R. Islam, MD, Department of Orthodontics, Bangabandhu Sheikh Mujib Medical University, Dhaka-1000, Bangladesh
Contribution - This dataset contains CXR images of normal subjects (3500) and patients with TB (700 publicly accessible TB images; a further 2800 TB images can be downloaded from the NIAID TB portal [3] after signing an agreement). The TB database was collected from the following sources:
1. NLM dataset: The National Library of Medicine (NLM) in the U.S. [1] has made two lung X-ray datasets publicly available: the Montgomery and Shenzhen datasets.
2. Belarus dataset: The Belarus set [2] was collected for a drug resistance study initiated by the National Institute of Allergy and Infectious Diseases, Ministry of Health, Republic of Belarus.
3. NIAID TB dataset: The NIAID TB portal program dataset [3] contains about 3000 TB positive CXR images from about 3087 cases. Note: Due to the data-sharing restriction, we have to direct potential users to the NIAID website, where you can sign a data-sharing agreement and download the DICOM images. Weblink: https://tbportals.niaid.nih.gov/download-data
4. RSNA CXR dataset: The RSNA pneumonia detection challenge dataset [4] comprises about 30,000 chest X-ray images, of which 10,000 are normal and the others are abnormal and lung opacity images.
This database has been used in the paper titled “Reliable Tuberculosis Detection using Chest X-ray with Deep Learning, Segmentation and Visualization” published in IEEE Access in 2020.
Objective - Researchers can use this database to produce useful and impactful scholarly work on TB, which can help in tackling this issue.
Citation - Please cite this database if you are using it for any scientific purpose: Tawsifur Rahman, Amith Khandakar, Muhammad A. Kadir, Khandaker R. Islam, Khandaker F. Islam, Zaid B. Mahbub, Mohamed Arselene Ayari, Muhammad E. H. Chowdhury. (2020) "Reliable Tuberculosis Detection using Chest X-ray with Deep Learning, Segmentation and Visualization". IEEE Access, Vol. 8, pp. 191586-191601. DOI: 10.1109/ACCESS.2020.3031384.
References:
[1] S. Jaeger, S. Candemir, S. Antani, Y.-X. J. Wáng, P.-X. Lu, and G. Thoma, "Two public chest X-ray datasets for computer-aided screening of pulmonary diseases," Quantitative Imaging in Medicine and Surgery, vol. 4, no. 6, p. 475, 2014.
[2] B. P. Health. (2020). BELARUS TUBERCULOSIS PORTAL [Online]. Available: http://tuberculosis.by/. [Accessed on 09-June-2020]
[3] NIAID TB portal program dataset [Online]. Available: https://tbportals.niaid.nih.gov/download-data.
[4] Kaggle. RSNA Pneumonia Detection Challenge [Online]. Available: https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data. [Accessed on 09-June-2020]
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Gene/SNP associated with resistance.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This is an M. tuberculosis reference annotation database in Neo4j format. To view it you need a Neo4j (v2.3) instance. A script (run_db.sh) starts a Neo4j instance for you based on this data, using Docker. Run it (bash run_db.sh) and connect to http://localhost:7474.
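After run_db.sh is launched, the container may take a few seconds before the web interface answers. The following is a minimal sketch, not part of the dataset, of how one might wait for http://localhost:7474 to become reachable from Python. The function name wait_for_neo4j and its parameters are hypothetical; only the URL comes from the description above.

```python
import time
import urllib.error
import urllib.request


def wait_for_neo4j(url="http://localhost:7474", timeout_s=60.0, interval_s=2.0,
                   opener=urllib.request.urlopen):
    """Poll `url` until the server answers or `timeout_s` elapses.

    Returns True as soon as any HTTP response is received, False on timeout.
    `opener` is injectable so the function can be tried without a server.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=5):
                return True  # a response was received, so the server is up
        except urllib.error.HTTPError:
            # The server answered with an error status (for example an
            # authentication challenge), which still means it is running.
            return True
        except OSError:
            pass  # connection refused or timed out; retry until the deadline
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_for_neo4j() with the defaults returns True once the instance is reachable, or False if nothing answers within 60 seconds.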
The database was created by the COMBAT TB project (http://christoffels.sanbi.ac.za/index.php/projects/combat-tb) at the South African National Bioinformatics Institute (SANBI).
Authors: Thoba Lose, Peter van Heusden, Ziphozakhe Mashologu, Alan Christoffels.
The COMBAT TB project is funded by the South African Medical Research Council (MRC) and was supported by the South African Research Chairs Initiative of the Department of Science and Technology and National Research Foundation of South Africa.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is the supporting data file for the manuscript entitled "Higher rate of tuberculosis in second generation migrants compared to native residents in a metropolitan setting in Western Europe" (Marx et al., PLoS ONE). The dataset includes anonymized, routinely collected notification data (variables labeled as "nd") for 314 individuals and anonymized survey data (i.e. data obtained through interviews; variables labeled as "sd") for a subset of 154 individuals. The data are published open-access, in accordance with the PLoS ONE data policy (2014).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and code for the publication
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The completed dataset for the World Bank's financing of TB control initiatives from 1986-2017 (coded_questionnaire.csv). This is accompanied by the variable description guide (Glossary and Variable Description Guide.pdf).
Co-financiers_database.csv is the data used in the analysis of co-financing of World Bank tuberculosis projects.
IHME_DAH_DATABASE 24-10-17.csv is the data used in the analysis of Development Assistance for Health and Tuberculosis.
World Bank and TB.R is the source code used to encode and analyse all of the above data. Data analysis was performed in R version 3.4.3.
Extended Data.pdf is the supplementary data for publication.
Mycobacterium tuberculosis genetic mutation data obtained via whole genome sequencing using Illumina. Illumina raw reads were processed with a Stampy- and Platypus-based pipeline for variant calling. Mutations found are listed, assuming >40% purity and meeting Platypus Pass criteria. Mutations not found are not listed. All isolates met quality criteria to confirm coverage of regions of interest. Also included is a separate file with the same ids linking the genotypic data above with in vitro culture-based drug susceptibility testing for 10 drugs. Further description of the data and methods is available here: https://doi.org/10.1101/275628 (2019)
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The database is Mycobacterium-specific. To build it, we used genome_updater.sh (v0.6.3; https://github.com/pirovc/genome_updater):
genome_updater.sh -d "refseq" -g "bacteria" -T "g_Mycobacterium" -f "genomic.fna.gz" -M "gtdb" -A 1 -m
In addition, we added 17 high-quality M. tuberculosis genomes to this collection (from https://doi.org/10.1186/s13059-021-02474-0 - see mtb_gramtools_lineages.csv for the list of accessions).
The mtb.ids file lists the sequence identifiers for those sequences in the database which are M. tuberculosis.
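As an illustration of how the mtb.ids list might be used, the sketch below marks which records in a FASTA stream belong to M. tuberculosis. It rests on two assumptions that the description does not state: that mtb.ids holds one sequence identifier per line, and that the identifier is the first whitespace-delimited token of each FASTA header. The function names are hypothetical.

```python
def load_ids(lines):
    """Return the set of non-empty identifiers, one per input line (assumed layout)."""
    return {line.strip() for line in lines if line.strip()}


def _sequence_id(header):
    """First whitespace-delimited token of a FASTA header, or '' if the header is empty."""
    parts = header.split(None, 1)
    return parts[0] if parts else ""


def split_fasta_by_ids(fasta_lines, ids):
    """Yield (header, sequence, is_listed) for each FASTA record.

    `is_listed` is True when the record's identifier appears in `ids`.
    """
    header, chunks = None, []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks), _sequence_id(header) in ids
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks), _sequence_id(header) in ids
```

In practice one would pass open file handles, for example load_ids(open("mtb.ids")), and iterate over the decompressed FASTA file line by line, so that large genomes are not read into memory at once.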
https://spdx.org/licenses/CC0-1.0.html
For clinical assay validations, well-characterized samples are essential for assessing methodology sensitivity and specificity. To support the community in the development of clinical next-generation sequencing assays for Mycobacterium tuberculosis, we released a comprehensive dataset of 50 whole genome sequences from characterized strains, complete with drug susceptibility and mutation profiles.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Multidrug-resistant tuberculosis (MDR-TB) is a type of tuberculosis (TB) that is resistant to at least the two main anti-TB drugs, rifampin and isoniazid, so the infection is more difficult to eliminate. Good knowledge and behavior of caregivers and patients can affect the success of treatment, because such patients tend not to be late in taking treatment. In this data note we provide the details of a research database of 228 MDR-TB caregivers and patients who underwent treatment from January 2020 to December 2021 in a preferred hospital in West Java, Indonesia. The purposes of this publication are to describe the dataset for external researchers who may be interested in making use of it, and to detail the methods used to obtain it. The dataset was collected to determine the level of knowledge and behavior of MDR-TB caregivers and patients regarding the disease, using a validated questionnaire on knowledge and behavior that was distributed to respondents online and offline.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Mycobacterium tuberculosis genetic mutation data obtained via either targeted sequencing or whole genome sequencing using Illumina. Illumina raw reads were processed with a Stampy- and Platypus-based pipeline for variant calling. Targeted sequence data (and related phenotype data, see below) were previously published within Dataverse under https://doi.org/10.7910/DVN/AQ5LH5 and https://doi.org/10.7910/DVN/GYHIB2. Mutations found have a status of 1, assuming >40% purity and meeting Platypus Pass criteria. Mutations not found are not listed. Mutations missing have a status of NA. Also included is a separate file with the same ids linking the genotypic data above with in vitro culture-based drug susceptibility testing for 10 drugs. Further description of the data and methods is available here: https://doi.org/10.1101/275628
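The status convention described above (1 for found, NA for missing, unlisted for not found) can be turned into a per-isolate matrix. The sketch below shows one way to do that, assuming the genotype file can be read as (isolate id, mutation, status) triples. That record layout, the function name, and the mutation labels in the usage note are illustrative assumptions, not the dataset's documented schema.

```python
def build_genotype_matrix(records):
    """Expand long-format calls into {isolate_id: {mutation: value}}.

    `records` is an iterable of (isolate_id, mutation, status) where status
    is the string "1" (mutation found) or "NA" (call missing). A mutation
    that is not listed for an isolate is treated as not found (0).
    """
    listed = {}
    mutations = set()
    for isolate_id, mutation, status in records:
        value = None if status == "NA" else int(status)
        listed.setdefault(isolate_id, {})[mutation] = value
        mutations.add(mutation)
    ordered = sorted(mutations)
    return {
        isolate_id: {m: calls.get(m, 0) for m in ordered}
        for isolate_id, calls in listed.items()
    }
```

For example, the records ("s1", "rpoB_S450L", "1"), ("s1", "katG_S315T", "NA") and ("s2", "katG_S315T", "1") give s1 a value of 1 for rpoB_S450L and None for katG_S315T, while s2 gets 1 for katG_S315T and 0 for rpoB_S450L. Isolates with no listed mutations do not appear in the input at all, so they would need to be added from the phenotype file's id list.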
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data set on childhood tuberculosis
https://spdx.org/licenses/CC0-1.0.html
Background: Mycobacterium tuberculosis isolates consist of several different lineages, yet epidemiological analyses are usually performed relative to a single reference genome, M. tuberculosis H37Rv, which might introduce bias. These analyses are essentially based on genome sequence information of M. tuberculosis and could in theory be performed in silico, using whole genome sequence (WGS) data available in databases and obtained by next-generation sequencers (NGSs). As an approach to establishing higher-resolution methods for such analyses, whole genome sequences of M. tuberculosis complex (MTBC) strains available in databases were aligned to construct a virtual reference genome sequence, called the consensus sequence (CS), and its feasibility for in silico epidemiological analyses was evaluated. Results: The CS was successfully constructed and used to perform phylogenetic analysis, evaluation of read mapping efficacy (which is crucial for detecting single nucleotide polymorphisms, SNPs), and various virtual MTBC typing methods, including spoligotyping, VNTR, long sequence polymorphism and Beijing typing. SNPs detected based on the CS, in comparison with H37Rv, were used in concatemer-based phylogenetic analysis to determine their reliability relative to a phylogenetic tree based on whole genome alignment as the gold standard. Statistical comparison of the phylogenetic trees based on the CS with those based on H37Rv indicated that the former always gave better results than the latter. SNP detection and concatenation with the CS was advantageous because the frequency of crucial SNPs distinguishing among strain lineages was higher than with H37Rv. The number of SNPs detected was lower with the CS than with the H37Rv sequence, resulting in a significant reduction in computational time. The performance of each virtual typing method was satisfactory and agreed with published results where those were available.
Conclusions: These results indicated that virtual CS constructed from genome sequence data is an ideal approach as a reference for MTBC studies.
Paper2_PlosOne_Dyrad
Tuberculosis is a disease that affects many people in developing countries. While treatment is possible, it requires an accurate diagnosis first. In these countries, X-ray machines are in many cases available (through low-cost projects and donations), but the radiological expertise needed to assess the images accurately is often missing. An algorithm that could perform this task quickly and cheaply could drastically improve the ability to diagnose and ultimately treat the disease.
In more developed countries, X-ray radiography is often used for screening new arrivals and determining eligibility for a work permit. Manually examining images is time-consuming, and an algorithm could increase efficiency, improve performance and ultimately reduce the cost of this screening.
This dataset contains over 500 X-ray scans with clinical labels collected by radiologists.
The two datasets were published together in an analysis here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4256233/. The datasets come from Shenzhen and Montgomery respectively.
The standard digital image database for tuberculosis was created by the National Library of Medicine, Maryland, USA in collaboration with Shenzhen No.3 People’s Hospital, Guangdong Medical College, Shenzhen, China. The chest X-rays are from out-patient clinics and were captured as part of the daily routine using Philips DR Digital Diagnose systems. Number of X-rays:
It is requested that publications resulting from the use of this data attribute the source (National Library of Medicine, National Institutes of Health, Bethesda, MD, USA and Shenzhen No.3 People’s Hospital, Guangdong Medical College, Shenzhen, China) and cite the following publications:
X-ray images in this data set have been acquired from the tuberculosis control program of the Department of Health and Human Services of Montgomery County, MD, USA. This set contains 138 posterior-anterior x-rays, of which 80 x-rays are normal and 58 x-rays are abnormal with manifestations of tuberculosis. All images are de-identified and available in DICOM format. The set covers a wide range of abnormalities, including effusions and miliary patterns. The data set includes radiology readings available as a text file.
The Mycobacterium tuberculosis (Mtb) antigen-specific cellular response is promising for detection of Mtb infection, but not efficient for diagnosis of TB. We first identified 16 TB disease-specific protein markers measured in the culture supernatant of Mtb-stimulated whole blood using a 640-protein human array, the highest-throughput antibody-based protein array available at the time of the study. Potential TB-related proteins were then analyzed across three different patient cohorts comprising healthy controls, LTBI, non-TB pneumonia, and TB patients to evaluate how the biomarkers performed in diagnosing TB in a real clinical setting. The data ultimately reveal an eight-protein biosignature of TB. We prospectively enrolled three cohorts in our study: 160 subjects to screen protein biomarkers of tuberculosis, 368 subjects to establish and test the predictive model, and 102 subjects for biomarker validation. Whole blood cultures were stimulated with pooled Mtb peptides or mitogen, and 640 proteins within the culture supernatant were then analyzed simultaneously using an antibody-based array. Sixteen candidate biomarkers of tuberculosis were identified and developed into a custom multiplexed antibody array for biomarker validation.
Granulomas are the pathological hallmark of tuberculosis (TB). However, their function and mechanisms of formation remain poorly understood. To understand the role of granulomas in TB, we analyzed the proteomes of granulomas from TB patients in an unbiased fashion. Using laser capture microdissection and mass spectrometry, we generated detailed molecular maps of human granulomas. We found that the centers of granulomas possess a pro-inflammatory environment characterized by anti-microbial peptides, ROS and pro-inflammatory eicosanoids. Conversely, the tissue surrounding the caseum possesses a comparatively anti-inflammatory signature.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Single crystal X-ray diffraction data for MtBioA related to PDBID: xxxx
Data collected at Diamond Light Source, UK
Beamline I03
X-rays, CT Images and Genomic Sequences representing cases of tuberculosis.