86 datasets found
  1. Lung-Cancer-Risk-Dataset

    • kaggle.com
    Updated Aug 23, 2025
    Cite
    Mikey-TraceGod (2025). Lung-Cancer-Risk-Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/12844025
    Dataset updated
    Aug 23, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mikey-TraceGod
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Lung Cancer Risk Dataset

    Overview

    This dataset contains 50,000 patient profiles designed for lung cancer risk analysis and machine learning applications. The dataset is clean, preprocessed, and ready for immediate use in classification tasks, statistical analysis, and data visualization.

    • Rows: 50,000
    • Columns: 11
    • File: preprocessed_lung_cancer_dataset.csv
    • License: CC0: Public Domain

    Dataset Description

    The dataset includes patient profiles with features based on established lung cancer risk factors such as smoking history, environmental exposures, and chronic lung conditions. All data is synthetic and designed to reflect realistic risk factor distributions while maintaining patient privacy.

    Features

    | Column | Type | Description | Values/Range |
    |--------|------|-------------|--------------|
    | patient_id | Integer | Unique patient identifier | 100000-149999 |
    | age | Integer | Patient age in years | 18-100 |
    | gender | String | Patient gender | 'Male', 'Female' |
    | pack_years | Float | Smoking exposure (years × packs per day) | 0-100 |
    | radon_exposure | String | Residential radon exposure level | 'Low', 'Medium', 'High' |
    | asbestos_exposure | String | Occupational asbestos exposure history | 'Yes', 'No' |
    | secondhand_smoke_exposure | String | Passive smoking exposure | 'Yes', 'No' |
    | copd_diagnosis | String | Chronic obstructive pulmonary disease diagnosis | 'Yes', 'No' |
    | alcohol_consumption | String | Alcohol consumption pattern | 'None', 'Moderate', 'Heavy' |
    | family_history | String | Family history of lung cancer | 'Yes', 'No' |
    | lung_cancer | String | Target variable: Lung cancer diagnosis | 'Yes', 'No' |

    Data Quality

    • Complete: No missing values or duplicates
    • Clean: All values within realistic ranges
    • Balanced Features: Realistic distribution of risk factors
    • Target Distribution: Approximately 25% positive cases, reflecting real-world lung cancer prevalence
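
The quality claims above can be checked against the documented schema before any modelling. The following is a minimal sketch, assuming the CSV uses exactly the column names and value sets listed in the feature table. The helper name `find_violations` and the two sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io

# Allowed values and ranges, copied from the feature table in the description.
CATEGORICAL = {
    "gender": {"Male", "Female"},
    "radon_exposure": {"Low", "Medium", "High"},
    "asbestos_exposure": {"Yes", "No"},
    "secondhand_smoke_exposure": {"Yes", "No"},
    "copd_diagnosis": {"Yes", "No"},
    "alcohol_consumption": {"None", "Moderate", "Heavy"},
    "family_history": {"Yes", "No"},
    "lung_cancer": {"Yes", "No"},
}
NUMERIC = {
    "patient_id": (100000, 149999),
    "age": (18, 100),
    "pack_years": (0.0, 100.0),
}


def find_violations(rows):
    """Return (data_row_number, column, value) for every out-of-schema cell."""
    problems = []
    for i, row in enumerate(rows, start=1):
        for col, allowed in CATEGORICAL.items():
            if row.get(col) not in allowed:
                problems.append((i, col, row.get(col)))
        for col, (low, high) in NUMERIC.items():
            try:
                value = float(row[col])
            except (KeyError, TypeError, ValueError):
                problems.append((i, col, row.get(col)))
                continue
            if not low <= value <= high:
                problems.append((i, col, row[col]))
    return problems


# Two made-up rows for illustration only; the second breaks two rules.
SAMPLE_CSV = """patient_id,age,gender,pack_years,radon_exposure,asbestos_exposure,secondhand_smoke_exposure,copd_diagnosis,alcohol_consumption,family_history,lung_cancer
100000,54,Male,32.5,Low,No,Yes,No,Moderate,No,No
100001,17,Female,0,High,No,No,No,None,Yes,Maybe
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for problem in find_violations(rows):
    print(problem)
```

To run the same check on the real file, replace the in-memory sample with `open("preprocessed_lung_cancer_dataset.csv", newline="")`. The standard-library reader keeps the literal string 'None' in `alcohol_consumption` as text.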

    Use Cases

    • Binary classification modeling
    • Risk factor correlation analysis
    • Data visualization and exploratory analysis
    • Machine learning pipeline development
    • Statistical hypothesis testing
  2. The associations of sitting time and physical activity on total and...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    pdf
    Updated Jun 1, 2023
    Cite
    Vegar Rangul; Erik R. Sund; Paul Jarle Mork; Oluf Dimitri Røe; Adrian Bauman (2023). The associations of sitting time and physical activity on total and site-specific cancer incidence: Results from the HUNT study, Norway [Dataset]. http://doi.org/10.1371/journal.pone.0206015
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Vegar Rangul; Erik R. Sund; Paul Jarle Mork; Oluf Dimitri Røe; Adrian Bauman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Norway
    Description

    Background: Sedentary behavior is thought to pose different risks to those attributable to physical inactivity. However, few studies have examined the association between physical activity and sitting time with cancer incidence within the same population.

    Methods: We followed 38,154 healthy Norwegian adults in the Nord-Trøndelag Health Study (HUNT) for cancer incidence from 1995–97 to 2014. Cox proportional hazards regression was used to estimate risk of site-specific and total cancer incidence by baseline sitting time and physical activity.

    Results: During the 16-year follow-up, 4,196 (11%) persons were diagnosed with cancer. We found no evidence that people who had prolonged sitting per day or had low levels of physical activity had an increased risk of total cancer incidence, compared to those who had low sitting time and were physically active. In the multivariate model, sitting ≥8 h/day was associated with 22% (95% CI, 1.05–1.42) higher risk of prostate cancer compared to sitting 16.6 MET-h/week). The joint effects of physical activity and sitting time indicated that prolonged sitting time increased the risk of CRC independent of physical activity in men.

    Conclusions: Our findings suggest that prolonged sitting and low physical activity are positively associated with colorectal, prostate and lung cancer among men. Sitting time and physical activity were not associated with cancer incidence among women. The findings emphasize the importance of reducing sitting time and increasing physical activity.

  3. Incidence of lung cancer in Europe in 2022, by country and gender

    • statista.com
    Updated Sep 16, 2025
    Cite
    Statista (2025). Incidence of lung cancer in Europe in 2022, by country and gender [Dataset]. https://www.statista.com/statistics/1418818/incidence-of-lung-cancer-in-europe/
    Dataset updated
    Sep 16, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2022
    Area covered
    Europe, EU
    Description

    In 2022, the incidence of lung cancer among men in Europe was highest in Hungary at ***** per 100,000, while Sweden had the lowest incidence. The incidence of lung cancer recorded among women in Denmark was over ** per 100,000 population. Across the European Union overall, the rate of lung cancer diagnoses was **** per 100,000 among men and **** per 100,000 among women.

    Smoking and lung cancer risk

    The connection between smoking and the increased risk of health problems is well established. As of 2021, Hungary had one of the highest daily smoking rates in Europe, with over a quarter of adults smoking daily in the Central European country. The only other countries with a higher share of smoking adults were Bulgaria and Turkey. A positive development though, is the share of adults smoking every day has decreased in almost every European country since 2011.

    The rise of vaping

    Originally marketed as a device to help smokers quit, e-cigarettes or vapes have seen increased popularity among people who never smoked cigarettes, especially young people. The use of vapes among young people was reported to be highest in Estonia, Czechia, and Ireland. The dangers of vaping have not been examined over the long term. In the EU there have been attempts to make ‘vapes’ less accessible and appealing for young people, which would include such things as banning flavors and stopping the sale of disposable e-cigarettes.

  4. Data from: Identification of cancer chemotherapy regimens and patient...

    • datasetcatalog.nlm.nih.gov
    • tandf.figshare.com
    Updated Mar 8, 2023
    Cite
    Mendelsohn, Aaron B.; Lockhart, Catherine M.; McDermott, Cara L.; DeFor, Terese A.; Pawloski, Pamala A.; Jamal-Allial, Aziza; Benitez, Gabriela Vazquez; Marshall, James; Yee, Gary; Djibo, Djeneba Audrey; Li, Minghui Sam; McBride, Ali (2023). Identification of cancer chemotherapy regimens and patient cohorts in administrative claims: challenges, opportunities, and a proposed algorithm [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000990575
    Dataset updated
    Mar 8, 2023
    Authors
    Mendelsohn, Aaron B.; Lockhart, Catherine M.; McDermott, Cara L.; DeFor, Terese A.; Pawloski, Pamala A.; Jamal-Allial, Aziza; Benitez, Gabriela Vazquez; Marshall, James; Yee, Gary; Djibo, Djeneba Audrey; Li, Minghui Sam; McBride, Ali
    Description

    Real-world evidence is a valuable source of information in healthcare. This study describes the challenges and successes during algorithm development to identify cancer cohorts and multi-agent chemotherapy regimens from claims data to perform a comparative effectiveness analysis of granulocyte colony stimulating factor (G-CSF) use. Using the Biologics and Biosimilars Collective Intelligence Consortium’s Distributed Research Network, we iteratively developed and tested a de novo algorithm to accurately identify patients by cancer diagnosis, then extract chemotherapy and G-CSF administrations for a retrospective study of prophylactic G-CSF. After identifying patients with cancer and subsequent chemotherapy exposures, we observed only 12% of patients with cancer received chemotherapy, which is fewer than expected based on prior analyses. Therefore, we reversed the initial inclusion criteria to identify chemotherapy receipt, then prior cancer diagnosis, which increased the number of patients from 2,814 to 3,645, or 68% of patients receiving chemotherapy had diagnoses of interest. Additionally, we excluded patients with cancer diagnoses that differed from those of interest in the 183 days before the index date of G-CSF receipt, including early-stage cancers without G-CSF or chemotherapy exposure. By removing this criterion, we retained 77 patients who were previously excluded. Finally, we incorporated a 5-day window to identify all chemotherapy drugs administered (excluding oral prednisone and methotrexate, as these medications may be used for other non-malignant conditions) as patients may fill oral prescriptions days to weeks prior to infusion. This increased the number of patients with chemotherapy exposures of interest to 6,010. The final cohort of included patients, based on G-CSF exposure, increased from 420 from the initial algorithm to 886 using the final algorithm. 
Medications used for multiple indications, sensitivity and specificity of administrative codes, and relative timing of medication exposure must all be evaluated to identify patient cohorts receiving chemotherapy from claims data.

  5. Cancer Mortality in People Treated with Antidepressants before Cancer...

    • plos.figshare.com
    ai
    Updated Jun 1, 2023
    Cite
    Yuelian Sun; Peter Vedsted; Morten Fenger-Grøn; Chun Sen Wu; Bodil Hammer Bech; Jørn Olsen; Michael Eriksen Benros; Mogens Vestergaard (2023). Cancer Mortality in People Treated with Antidepressants before Cancer Diagnosis: A Population Based Cohort Study [Dataset]. http://doi.org/10.1371/journal.pone.0138134
    Available download formats: ai
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Yuelian Sun; Peter Vedsted; Morten Fenger-Grøn; Chun Sen Wu; Bodil Hammer Bech; Jørn Olsen; Michael Eriksen Benros; Mogens Vestergaard
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Depression is common after a cancer diagnosis and is associated with an increased mortality, but it is unclear whether depression occurring before the cancer diagnosis affects cancer mortality. We aimed to study cancer mortality of people treated with antidepressants before cancer diagnosis.

    Methods and Findings: We conducted a population based cohort study of all adults diagnosed with cancer between January 2003 and December 2010 in Denmark (N = 201,662). We obtained information on cancer from the Danish Cancer Registry, on the day of death from the Danish Civil Registry, and on redeemed antidepressants from the Danish National Prescription Registry. Current users of antidepressants were defined as those who redeemed the latest prescription of antidepressant 0–4 months before cancer diagnosis (irrespective of earlier prescriptions), and former users as those who redeemed the latest prescription five or more months before cancer diagnosis. We estimated an all-cause one-year mortality rate ratio (MRR) and a conditional five-year MRR for patients who survived the first year after cancer diagnosis and confidence interval (CI) using a Cox proportional hazards regression model. Overall, 33,111 (16.4%) patients redeemed at least one antidepressant prescription in the three years before cancer diagnosis of whom 21,851 (10.8%) were current users at the time of cancer diagnosis. Current antidepressant users had a 32% higher one-year mortality (MRR = 1.32, 95% CI: 1.29–1.35) and a 22% higher conditional five-year mortality (MRR = 1.22, 95% CI: 1.17–1.26) if patients survived the first year after the cancer diagnosis than patients not redeeming antidepressants. The one-year mortality was particularly high for patients who initiated antidepressant treatment within four months before cancer diagnosis (MRR = 1.54, 95% CI: 1.47–1.61). Former users had no increased cancer mortality.

    Conclusions: Initiation of antidepressive treatment prior to cancer diagnosis is common and is associated with an increased mortality.
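
The exposure definition in the abstract (latest prescription 0–4 months before diagnosis is a current user, five or more months is a former user) can be sketched as a small classifier. This is only an illustration. The abstract does not say how months were counted, so the whole-calendar-month convention below is an assumption, and the function names are hypothetical. The three-year lookback mentioned in the abstract is not modelled.

```python
from datetime import date


def months_between(earlier, later):
    """Whole months elapsed from `earlier` to `later` (assumed convention)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months


def classify_user(last_redeemed, diagnosis):
    """Classify a patient by the date of the latest redeemed prescription."""
    if last_redeemed is None or last_redeemed > diagnosis:
        return "non-user"
    if months_between(last_redeemed, diagnosis) <= 4:
        return "current"  # latest prescription 0-4 months before diagnosis
    return "former"       # latest prescription five or more months before


print(classify_user(date(2005, 1, 10), date(2005, 3, 1)))
print(classify_user(date(2004, 1, 1), date(2005, 3, 1)))
print(classify_user(None, date(2005, 3, 1)))
```

Only the latest prescription is passed in, which mirrors the "irrespective of earlier prescriptions" wording in the definition.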

  6. Data from: Data belonging to 'Smoking intensity and bladder cancer...

    • lifesciences.datastations.nl
    tsv, zip
    Updated Feb 12, 2018
    Cite
    L.A.L.M. Kiemeney (2018). Data belonging to 'Smoking intensity and bladder cancer aggressiveness at diagnosis' [Dataset]. http://doi.org/10.17026/DANS-2A6-ATE2
    Available download formats: zip (22047), tsv (80480), tsv (82205)
    Dataset updated
    Feb 12, 2018
    Dataset provided by
    DANS Data Station Life Sciences
    Authors
    L.A.L.M. Kiemeney
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set is part of the Nijmegen Bladder Cancer Study, one of the largest series of bladder cancer in the world (see https://icbc.cancer.gov/). The data were used to investigate the relationship between smoking and bladder cancer aggressiveness at diagnosis. The results will be published as Barbosa A.L.A. et al., Smoking intensity and bladder cancer aggressiveness at diagnosis. Plos One (submitted).

    The Nijmegen Bladder Cancer Study (NBCS) has been described in more detail in (http://www.ncbi.nlm.nih.gov/pubmed/25023787). Briefly, BC patients diagnosed between 1995-2011 under the age of 75 years in the mid-eastern part of the Netherlands were identified through the Netherlands Cancer Registry (NCR) held by the Netherlands Comprehensive Cancer Organization (IKNL) and contacted via their treating physicians. Patients who consented to participate in the study were asked to fill out a lifestyle questionnaire, including questions on education, occupation, medical history, physical activity, and complete history of smoking. Furthermore, blood samples were collected by Thrombosis Service centers, which hold offices in all the communities in the region. The study was approved by the institutional review board of the Radboud university medical center, Nijmegen, The Netherlands (CMO Arnhem-Nijmegen). A total of 1859 BC patients were included in the study.

    Smoking assessment: Information on smoking history was obtained via the lifestyle questionnaire. Patients were asked for their smoking status at recruitment, age at smoking initiation and cessation, number of cigarettes, pipes and cigars smoked per day and duration of smoking in years. The timing of smoking cessation with respect to the diagnosis was calculated as age at diagnosis minus age at cessation. Smoking status at diagnosis was classified as never smoker, former smoker (quit >1 year before diagnosis), or current smoker (continuing cigarette smoker or quit ≤ 1 year before diagnosis). Ever smokers were defined as the combination of former and current smokers. In the current smokers group, only the smoking period in years before the diagnosis was considered. Smoking amount was evaluated as cigarettes per day. Cumulative smoking exposure (in pack-years) was calculated by multiplying the cigarette smoking duration and packages per day (20 cigarettes representing one package). Pipe and/or cigar smoking (5.9% of all patients) was ignored in the main analyses, assuming that the majority of Dutch pipe and cigar smokers do not inhale the smoke.

    Outcome assessment: Detailed clinical data concerning age at diagnosis, tumor stage, tumor grade, tumor number (single or multiple), tumor size (<3cm and ≥ 3cm), presence of concomitant CIS, and histological type were collected through a medical file survey. Tumor stage and grade were recorded according to the final conclusion in the pathology report. Tumors with WHO 1973 differentiation grade 1 or 2, WHO/ISUP 2004 low grade, or Malmström (Modified Bergkvist) grade 1 or 2a were considered low-grade tumors. We classified tumors with WHO 1973 differentiation grade 3, WHO/ISUP 2004 high grade, or Malmström (Modified Bergkvist) grade 2b or 3 as high-grade. Tumor aggressiveness was classified according to the risk of progression as follows: low-risk NMIBC (low-grade Ta tumors), high-risk NMIBC (all stage T1 tumors, all high-grade tumors, or CIS) and MIBC (stage ≥ T2 or any stage with ≥N1 and/or M1).

    Statistical analysis: Patient and tumor characteristics were compared between the smoking status categories using chi-square, Fisher exact, and one-way analysis of variance (ANOVA) tests where appropriate. The distribution of continuous smoking variables was compared between the categories of tumor multiplicity and tumor aggressiveness and tested for statistical significance using the non-parametric Kruskal-Wallis test. Multinomial logistic regression was used to analyze the relation between smoking intensity and aggressiveness of the tumor with adjustment for gender and age at diagnosis. Low-risk NMIBC was considered as the reference group. We repeated similar analyses for tumor multiplicity as the dependent variable using solitary tumors as the reference group. The association of each smoking intensity variable (smoking amount, smoking duration and cumulative smoking exposure), age at smoking initiation, and time since smoking cessation was assessed separately in ever, former and current smokers. Statistical analysis was performed using IBM SPSS Statistics for Windows 20 (IBM Corp., Armonk, NY, USA) with a p value < 0.05 indicating statistical significance.

    This dataset contains the statistical datafile (SPSS) used for the data analyses, saved as a .sav and a .por.
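
The smoking definitions in this description reduce to two small rules: pack-years as packs per day times years smoked (20 cigarettes per pack), and status at diagnosis from the time since cessation. The sketch below illustrates them. The function and parameter names are hypothetical and are not taken from the study's SPSS file.

```python
CIGARETTES_PER_PACK = 20  # stated in the description: 20 cigarettes per package


def pack_years(cigarettes_per_day, years_smoked):
    """Cumulative exposure: packs smoked per day multiplied by years smoked."""
    return (cigarettes_per_day / CIGARETTES_PER_PACK) * years_smoked


def smoking_status(ever_smoked, age_at_diagnosis, age_at_cessation=None):
    """Never, former (quit >1 year before diagnosis), or current smoker."""
    if not ever_smoked:
        return "never"
    if age_at_cessation is None:
        return "current"  # still smoking at diagnosis
    years_since_quit = age_at_diagnosis - age_at_cessation
    return "former" if years_since_quit > 1 else "current"


print(pack_years(10, 20))              # half a pack a day for 20 years
print(smoking_status(True, 60, 50))    # quit 10 years before diagnosis
print(smoking_status(True, 60, 59))    # quit 1 year before diagnosis
```

Ages are used rather than dates because the description computes time since cessation as age at diagnosis minus age at cessation.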

  7. Cancer Registration: National Cancer Patient Experience Survey Wave 1 by...

    • data.europa.eu
    • ckan.publishing.service.gov.uk
    excel xlsx
    Updated Oct 11, 2021
    Cite
    Public Health England (2021). Cancer Registration: National Cancer Patient Experience Survey Wave 1 by patient characteristics and route to diagnosis [Dataset]. https://data.europa.eu/data/datasets/ncpes-wave-1-by-patient-characteristics-and-route-to-diagnosis
    Available download formats: excel xlsx
    Dataset updated
    Oct 11, 2021
    Dataset authored and provided by
    Public Health England (https://www.gov.uk/government/organisations/public-health-england)
    License

    http://reference.data.gov.uk/id/open-government-licence

    Description

    The English Cancer Patient Experience Survey (CPES) is commissioned by NHS England and administered on their behalf by an external survey provider organisation (Quality Health). The survey provides insights into the care experienced by cancer patients across England who were treated as day cases or inpatients. Data from CPES has been linked to cancer registration records recorded by the National Cancer Registration and Analysis Service (the cancer registry in England). Individual responses to Wave 1 of CPES are recorded, alongside characteristics of the patient who completed the survey.

    Wave 1 of the National Cancer Patient Experience Survey is limited to patients discharged from cancer care between 01/01/2010 – 31/03/2010.

    Data within the file:
    • PATIENT_PSEUDO_ID (Project specific Pseudonymised Patient ID)
    • GENDER (coded Male, Female)
    • QUINTILE2010 (Deprivation quintile [1-5], describing the Income Deprivation Domain where 1= least deprived and 5= most deprived)
    • FINAL_ROUTE (One of eight Routes to Diagnosis- methodology for the assignment of each route is described in Elliss-Brookes L, McPhail S, Greenslade M, Shelton J, Hiom S, Richards M (2012) Routes to diagnosis for cancer – determining the patient journey using multiple routine data sets. British Journal of Cancer 107: 1220–1226.)
    • AGE (aggregated in 4 categories: <55, 55-64, 65-74, 75+)
    • STAGE (stage of the cancer coded as I, II, III, IV, missing)
    • CANCER_SITE (Cancer sites coded in accordance with ICD 10: C00-C14, C15, C16, C18, C19-C20, C25, C33-C34, C43, C49, C50, C54, C56, C61, C64, C67, C73, C82, C83, C85, C90, C91-C95, D05 and ‘all other ICD-10 codes’)

    Specific disclosure controls applied:
    • Gender omitted from the data specification in the following cancer sites: Female only for C50, D05 and C73; Male only for C49
    • Self-reported ethnicity (from the CPES surveys) aggregated into white British / non-white British / not specified.
    • Self-reported ethnicity omitted for C49, C64, C73 (replaced as “missing”).

  8. Mortality rate from cancer in Russia 2023, by federal subject

    • statista.com
    Updated Nov 29, 2025
    Cite
    Statista (2025). Mortality rate from cancer in Russia 2023, by federal subject [Dataset]. https://www.statista.com/statistics/1168769/death-rate-by-cancer-by-federal-subject-russia/
    Explore at:
    Dataset updated
    Nov 29, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2023
    Area covered
    Russia
    Description

    In 2023, around *** deaths per 100,000 population in Russia were attributed to malignant neoplasms. The highest mortality rate due to that reason across the country was recorded in the Kurgan Oblast, measuring at over *** deaths per 100,000 inhabitants. The Ingushetia Republic had the lowest mortality rate from cancer, at approximately ** deaths per 100,000 population.

    Cancer mortality in Russia

    Cancer is the second-leading cause of mortality in Russia, being only superseded by circulatory system diseases which were responsible for *** deaths per 100 thousand population in 2022. However, the number of deaths from cancer has been steadily decreasing year-on-year. In 2021, approximately *** thousand Russians deceased due to a malignant tumor. That marked a four-percent decrease from the previous year. Furthermore, the five-year cancer survival rate reached an all-time maximum. As of 2021, nearly six in ten patients in Russia continued to be registered with an oncological establishment for five years or more after receiving their diagnosis.

    Growth in cancer risk factors in Russia

    Some well-known risk factors for cancer include sun exposure, tobacco and alcohol use, a poor diet, and being overweight. Despite the merits of a healthy lifestyle being widely recognized, the share of healthy lifestyle followers in Russia has been following a downward trend over the past years. In particular, the rates of heavy smokers have increased. In 2022, a fifth of Russians consumed one pack of cigarettes a day or more, a three-percent growth from 2020.

  9. Identifying Diseases Treatments in Healthcare Data

    • kaggle.com
    zip
    Updated Mar 5, 2025
    Cite
    Sagar Maru (2025). Identifying Diseases Treatments in Healthcare Data [Dataset]. https://www.kaggle.com/datasets/marusagar/identifying-diseases-treatments-in-healthcare-data
    Available download formats: zip (166655 bytes)
    Dataset updated
    Mar 5, 2025
    Authors
    Sagar Maru
    Description

    Identifying Entities (Diseases, Treatments) in Healthcare Data

    Finding diseases and treatments in medical text—because even AI needs a medical degree to understand doctor’s notes! 🩺🤖

    📊 Understanding the Dataset

    In the contemporary healthcare ecosystem, substantial amounts of unstructured text are generated every day through electronic health records (EHRs), doctors' notes, prescriptions, and medical literature. The ability to extract meaningful insights from this data is critical for improving patient care, advancing clinical research, and optimizing healthcare services. This dataset contains text-based medical records in which diseases and their corresponding treatments are embedded within unstructured sentences.

    The dataset consists of labeled text samples, divided into:
    • Train Sentences: These sentences contain clinical information, including patient diagnoses and the treatments administered.
    • Train Labels: The corresponding annotations for the train sentences, marking diseases and treatments as named entities.
    • Test Sentences: Similar to the train sentences but used to evaluate model performance.
    • Test Labels: The ground-truth labels for the test sentences.

    A sample from the dataset may look as follows:

    🔍 Example from Dataset:

    Train Sentences:

    _ "The patient was a 62 -year -old man with squamous epithelium, who was previously treated with success with a combination of radiation therapy and chemotherapy."

    Train Labels:

    • Disease: 🦠 lung cancer
    • Treatment: 💉 Radiation therapy, chemotherapy

    This dataset requires the use of Named Entity Recognition (NER) to extract diseases and map them to their related treatments 💊, turning unstructured medical data into a structured form for analytical purposes.

    ⚙️ Dataset Properties

    1. Unstructured medical text: The dataset contains free-form medical notes in which diseases and treatments are mentioned explicitly. Extracting this information without a clear mapping is a challenge.
    2. Multiple entity types: The dataset contains different named entities such as diseases, treatments, symptoms, and possibly medications.
    3. Contextual dependency: Many treatments apply to many diseases, and proper mapping depends on context. For example, "radiotherapy" is used for different cancers, which makes contextual understanding important.
    4. Imbalanced data distribution: Some diseases and treatments may appear more often than others; balancing model performance requires techniques such as oversampling, undersampling, or transfer learning.
    5. Domain-specific language: The text is rich in medical terminology, which requires specialized preprocessing using domain-specific NLP techniques and medical ontologies such as UMLS or SNOMED CT.

    🚧 Challenges Working with Dataset

    • Complex medical vocabulary: Medical texts often use jargon, which requires specialized NLP models trained on clinical corpora.

    • Implicit Relationships: Unlike structured datasets, disease-treatment relationships are inferred from context rather than explicitly stated.

    • Synonyms and Abbreviations: Diseases and treatments can be referred to by different names (e.g., ‘myocardial infarction’ vs. ‘heart attack’). Handling such variations is vital.

    • Noise in Data: Unstructured records may also contain irrelevant information, typographical errors, and inconsistencies that affect extraction accuracy.

    🛠️ Approach to Extracting Insights from the Dataset

    To extract diseases and their respective treatments from this dataset, we follow a structured NLP pipeline:

    1. Data Preprocessing 🧹

    • Text Cleaning: Remove unnecessary characters, numbers, and stopwords while preserving clinical terms.
    • Tokenization: Split sentences into words for further processing.
    • Medical Term Standardization: Use domain-specific libraries like SciSpacy to standardize synonyms and abbreviations.

    2. Named Entity Recognition (NER) Model Development 🤖

    • Annotation: Ensure accurate labeling of diseases and treatments in the dataset.
    • Model Selection: Train a deep-learning-based model like BioBERT or a rule-based model using spaCy.
    • Training: Use annotated data to train a custom NER model that classifies words as disease or treatment entities.
    • Evaluation: Measure precision, recall, and F1-score to evaluate model performance.
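
The evaluation step above can be illustrated with a small entity-level scorer. This is a sketch under stated assumptions: each entity is represented as a hypothetical `(sentence_id, text, label)` tuple, and a prediction counts as correct only on an exact match. The dataset description does not specify its own scoring scheme or label format.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level scores over sets of (sentence_id, text, label) tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)  # exact matches only
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Gold labels follow the example sentence in the description; the
# predictions are made up to show one miss and one false positive.
gold = {
    (0, "lung cancer", "Disease"),
    (0, "radiation therapy", "Treatment"),
    (0, "chemotherapy", "Treatment"),
}
predicted = {
    (0, "lung cancer", "Disease"),
    (0, "chemotherapy", "Treatment"),
    (0, "squamous epithelium", "Disease"),
}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Scoring whole entities rather than individual tokens penalizes partial spans, which is usually the stricter and more informative choice when the end goal is a disease-to-treatment mapping.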

    3. Mapping Diseases to Treatments 🔄

    • Contextual Relationship Extraction: Identify which treatment corresponds to which disease using dependency parsing and relation extraction.
    • Dictionary or Tabular Output: Store extracted mappings in a structured format.

    Example Output:

    | 🦠 Disease | 💉 Treatments | |----------|--------------------...

  10. Long-term inpatient disease burden in the Adult Life after Childhood Cancer...

    • plos.figshare.com
    docx
    Updated Jun 1, 2023
    Cite
    Sofie de Fine Licht; Kathrine Rugbjerg; Thorgerdur Gudmundsdottir; Trine G. Bonnesen; Peter Haubjerg Asdahl; Anna Sällfors Holmqvist; Laura Madanat-Harjuoja; Laufey Tryggvadottir; Finn Wesenberg; Henrik Hasle; Jeanette F. Winther; Jørgen H. Olsen (2023). Long-term inpatient disease burden in the Adult Life after Childhood Cancer in Scandinavia (ALiCCS) study: A cohort study of 21,297 childhood cancer survivors [Dataset]. http://doi.org/10.1371/journal.pmed.1002296
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Sofie de Fine Licht; Kathrine Rugbjerg; Thorgerdur Gudmundsdottir; Trine G. Bonnesen; Peter Haubjerg Asdahl; Anna Sällfors Holmqvist; Laura Madanat-Harjuoja; Laufey Tryggvadottir; Finn Wesenberg; Henrik Hasle; Jeanette F. Winther; Jørgen H. Olsen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Survivors of childhood cancer are at increased risk for a wide range of late effects. However, no large population-based studies have included the whole range of somatic diagnoses including subgroup diagnoses and all main types of childhood cancers. Therefore, we aimed to provide the most detailed overview of the long-term risk of hospitalisation in survivors of childhood cancer.

    Methods and findings: From the national cancer registers of Denmark, Finland, Iceland, and Sweden, we identified 21,297 5-year survivors of childhood cancer diagnosed with cancer before the age of 20 years in the periods 1943–2008 in Denmark, 1971–2008 in Finland, 1955–2008 in Iceland, and 1958–2008 in Sweden. We randomly selected 152,231 population comparison individuals matched by age, sex, year, and country (or municipality in Sweden) from the national population registers. Using a cohort design, study participants were followed in the national hospital registers in Denmark, 1977–2010; Finland, 1975–2012; Iceland, 1999–2008; and Sweden, 1968–2009. Disease-specific hospitalisation rates in survivors and comparison individuals were used to calculate survivors’ standardised hospitalisation rate ratios (RRs), absolute excess risks (AERs), and standardised bed day ratios (SBDRs) based on length of stay in hospital. We adjusted for sex, age, and year by indirect standardisation. During 336,554 person-years of follow-up (mean: 16 years; range: 0–42 years), childhood cancer survivors experienced 21,325 first hospitalisations for diseases in one or more of 120 disease categories (cancer recurrence not included), when 10,999 were expected, yielding an overall RR of 1.94 (95% confidence interval [95% CI] 1.91–1.97). The AER was 3,068 (2,980–3,156) per 100,000 person-years, meaning that for each additional year of follow-up, an average of 3 of 100 survivors were hospitalised for a new excess disease beyond the background rates.

    Approximately 50% of the excess hospitalisations were for diseases of the nervous system (19.1% of all excess hospitalisations), endocrine system (11.1%), digestive organs (10.5%), and respiratory system (10.0%). Survivors of all types of childhood cancer were at increased, persistent risk for subsequent hospitalisation, the highest risks being those of survivors of neuroblastoma (RR: 2.6 [2.4–2.8]; n = 876), hepatic tumours (RR: 2.5 [2.0–3.1]; n = 92), central nervous system tumours (RR: 2.4 [2.3–2.5]; n = 6,175), and Hodgkin lymphoma (RR: 2.4 [2.3–2.5]; n = 2,027). Survivors spent on average five times as many days in hospital as comparison individuals (SBDR: 4.96 [4.94–4.98]; n = 422,218). The analyses of bed days in hospital included new primary cancers and recurrences. Of the total 422,218 days survivors spent in hospital, 47% (197,596 bed days) were for new primary cancers and recurrences. Our study is likely to underestimate the absolute overall disease burden experienced by survivors, as less severe late effects are missed if they are treated sufficiently in the outpatient setting or in the primary health care system.

    Conclusions: Childhood cancer survivors were at increased long-term risk for diseases requiring inpatient treatment even decades after their initial cancer. Health care providers who do not work in the area of late effects, especially those in primary health care, should be aware of this highly challenged group of patients in order to avoid or postpone hospitalisations by prevention, early detection, and appropriate treatments.
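
    The overall rate ratio and absolute excess risk quoted in this abstract can be reproduced from the observed and expected counts it reports. The sketch below assumes the usual definitions (RR = observed / expected; AER = (observed - expected) / person-years, scaled to 100,000); the function names are illustrative and are not taken from the study.

```python
def rate_ratio(observed: float, expected: float) -> float:
    """Standardised rate ratio: observed events divided by expected events."""
    return observed / expected


def absolute_excess_risk(observed: float, expected: float,
                         person_years: float, per: int = 100_000) -> float:
    """Excess events per `per` person-years of follow-up."""
    return (observed - expected) / person_years * per


# Figures quoted in the abstract above.
observed_hospitalisations = 21_325
expected_hospitalisations = 10_999
person_years = 336_554

rr = rate_ratio(observed_hospitalisations, expected_hospitalisations)
aer = absolute_excess_risk(observed_hospitalisations,
                           expected_hospitalisations, person_years)

print(f"RR  = {rr:.2f}")                           # RR  = 1.94
print(f"AER = {aer:.0f} per 100,000 person-years")  # AER = 3068 per 100,000 person-years
```

    The confidence intervals are not reproduced here because the abstract does not state how they were derived.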

  11. National Cancer Patient Experience Survey, 2013-2014

    • datacatalogue.cessda.eu
    Updated Nov 28, 2024
    + more versions
    Cite
    Department of Health (2024). National Cancer Patient Experience Survey, 2013-2014 [Dataset]. http://doi.org/10.5255/UKDA-SN-7562-1
    Explore at:
    Dataset updated
    Nov 28, 2024
    Dataset authored and provided by
    Department of Health
    Time period covered
    Jan 1, 2014 - Jun 1, 2014
    Area covered
    England
    Variables measured
    Individuals, National
    Measurement technique
    Postal survey. Three communications were despatched to patients: an initial survey and two reminders (to non-responders only).
    Description

    Abstract copyright UK Data Service and data collection copyright owner.

    The National Cancer Patient Experience Surveys (NCPES) began in 2010, after the 2007 'Cancer Reform Strategy' set out a commitment to establish a new survey programme. The NCPES is intended to be a vehicle enabling and supporting quality improvement in the NHS and has been used by national bodies, NHS Hospitals, specialist cancer teams, and national and condition specific charities to improve services for patients. It is designed to monitor national progress on cancer care and to help gather vital information on the Transforming Inpatient Care Programme, the National Cancer Survivorship Initiative and the National Cancer Equality Initiative. An Advisory Group was set up for the NCPES with the National Cancer Director, professionals, voluntary sector representatives, academics and patient survey experts. The Group agreed on the following guiding principles and objectives:
    • a standard national survey tool was to be used
    • surveys would be conducted at Trust level and identify cancer groups
    • the survey would cover all cancers and include the whole care pathway
    • the survey should use the word 'cancer' unlike the 2000 and 2004 surveys
    • the survey focus would be on patients (rather than carers)
    • the data would be used for benchmarking performance across Trusts and by cancer groups where numbers allow
    • the data would be used to inform national and local policy
    • the data would be made publicly available whilst observing patient data protection requirements and maintaining confidentiality.

    The NCPES has been replicated in Wales (see SN 7510), Northern Ireland, the Isle of Man, parts of Australia, and the Middle East. Further information can be found on the Quality Health Limited National Cancer Patient Experience Survey webpage and the NHS England Cancer Patient Experience Survey webpage.

    2010-2015 surveys temporarily withdrawn
    The data for the 2010-2014 surveys were temporarily withdrawn at the request of the depositor in October 2015. The 2015 data (SN 8163 and the Special Licence version, SN 8164) were temporarily withdrawn at the request of the depositor in February 2020.


    The 2013-2014 survey included all adult patients who were treated for cancer between 1 September and 30 November 2013 in NHS Trusts across England. Patients with all cancers were included, defined by their ICD10 code (cancer diagnosis code). The survey covered both inpatients and day case patients.


    Main Topics:
    The data cover different stages of the patients' 'cancer journey', from diagnosis to outpatient treatment:
    • initial GP visits before diagnosis (how many appointments, time period)
    • diagnostic tests (understanding of these)
    • how patients were told about the cancer diagnosis (understanding, sensitivity, written information)
    • decisions on treatment (understanding, side effects explained, involvement in decision making, written information)
    • whether patients were given a named key worker (Cancer Nurse Specialist provision and experience of them)
    • support measures patients were informed about (information on support groups, financial help, free prescriptions)
    • hospital doctors (understanding, confidence and trust in them, knowledge of patient case)
    • ward nurses (understanding, confidence, availability)
    • overall hospital care and treatment (information provision, privacy, knowledge of case, pain control, dignity and respect)
    • information provided before going home (written information and understanding, information on care at home and health or social services provision)
    • day patient experience (radiotherapy, chemotherapy, side effects, pain control, emotional support, appointment delay, time with doctor, doctor notes and case understanding)
    • wider care experience (hospital and community staff working together, information transfer)
    • demographic data
    • information provided by the participating Trusts such as date of discharge, diagnosis etc.
    Standard Measures: Positive scoring methodology was used to create individual question scores. The National Report used analysis of IMD deciles based on patients' postcodes provided as part of the dataset by individual NHS...

  12. The PANORAMA Challenge: Public Training and Development Dataset (1)

    • zenodo.org
    bin, zip
    Updated Apr 22, 2024
    + more versions
    Cite
    Natália Alves; Megan Schuurmans; Derya Yakar; Pierpaolo Vendittelli; Geert Litjens; John Hermans; Henkjan Huisman (2024). The PANORAMA Challenge: Public Training and Development Dataset (1) [Dataset]. http://doi.org/10.5281/zenodo.10998332
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Apr 22, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Natália Alves; Megan Schuurmans; Derya Yakar; Pierpaolo Vendittelli; Geert Litjens; John Hermans; Henkjan Huisman
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Time period covered
    Apr 22, 2024
    Description

    This dataset represents the PANORAMA: Public Training and Development Dataset. It contains 2238 anonymized contrast-enhanced CT (CECT) scans from 2238 patients acquired at two centers (Radboud University Medical Center, University Medical Center Groningen) based in The Netherlands. Additionally, it contains 194 cases from the Medical Segmentation Decathlon dataset and 80 cases from National Institutes of Health. For all updates/fixes regarding this dataset, please join the challenge and check out our dedicated forum post on this topic. The corresponding labels of the PANORAMA dataset can be found here.

    The PANORAMA challenge is an all-new grand challenge that aims to validate the diagnostic performance of artificial intelligence and radiologists at pancreatic ductal adenocarcinoma (PDAC) detection/diagnosis in CECT, with histopathology and follow-up (≥ 3 years) as the reference standard, in a retrospective setting in the hidden testing dataset. The study hypothesizes that state-of-the-art AI algorithms are non-inferior to radiologists reading CECT.

    Key aspects of the PANORAMA study design have been established in conjunction with an international scientific advisory board of 13 experts in AI and pancreas radiology, as well as a patient representative, to unify and standardize present-day guidelines and to ensure meaningful validation of pancreas AI towards clinical translation (Reinke et al., 2021).

    This PANORAMA dataset contains: batch 1 out of 4

  13. Data from: Supplementary Material for: Hospital Volume and Mortality...

    • datasetcatalog.nlm.nih.gov
    • karger.figshare.com
    Updated Nov 8, 2018
    Cite
    Jo, T.; Michihata, N.; Matsui, H.; Hiraishi, Y.; Nagase, T.; Sakamoto, Y.; Yamauchi, Y.; Urushiyama, H.; Hasegawa, W.; Fushimi, K.; Yasunaga, H. (2018). Supplementary Material for: Hospital Volume and Mortality following Diagnostic Bronchoscopy in Lung Cancer Patients: Data from a National Inpatient Database in Japan [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000647088
    Explore at:
    Dataset updated
    Nov 8, 2018
    Authors
    Jo, T.; Michihata, N.; Matsui, H.; Hiraishi, Y.; Nagase, T.; Sakamoto, Y.; Yamauchi, Y.; Urushiyama, H.; Hasegawa, W.; Fushimi, K.; Yasunaga, H.
    Description

    Background: Recent advances in bronchoscopy utilizing endobronchial ultrasound (EBUS) as well as lung cancer therapy may have driven physicians to perform diagnostic bronchoscopy (DB) for high-risk patients. Objectives: The aim of this study was to clarify the relationship between hospital volume (HV) and outcomes of DB. Methods: We collected data on inpatients with lung cancer who underwent DB from July 2010 to March 31, 2014. The annual HV of DB was classified as “very low” (≤50 cases/year), “low” (51–100 cases/year), “high” (101–300 cases/year), or “very high” (> 300 cases/year). The primary outcome was all-cause 7-day mortality after DB. Multivariable logistic regression fitted with a generalized estimation equation was performed to evaluate the association between HV and all-cause 7-day mortality after DB, adjusted for patient background factors. Results: We identified a total of 77,755 eligible patients in 954 hospitals. All-cause 7-day mortality was 0.5%. Compared with the low-volume group, 7-day mortality was significantly lower in the high-volume group (odds ratio [OR] = 0.69, 95% confidence interval [CI]: 0.52–0.92, p = 0.010), and a similar trend was shown in the very-high-volume group (OR = 0.67; 95% CI: 0.43–1.05, p = 0.080). Radial EBUS with the guide sheath method and EBUS-guided transbronchial needle aspiration showed a significantly lower 7-day mortality. Conclusions: All-cause 7-day mortality was inversely associated with HV. The risk of DB in patients with lung cancer should be recognized, and the exploitation of EBUS may help reduce mortality after DB.

  14. Data from: Health assistance path of women between diagnosis and treatment...

    • datasetcatalog.nlm.nih.gov
    • scielo.figshare.com
    Updated Mar 23, 2021
    Cite
    de Carvalho, Priscila Guedes; Rodrigues, Nádia Cristina Pinheiro; O´Dwer, Gisele (2021). Health assistance path of women between diagnosis and treatment initiation for cervix cancer [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000826310
    Explore at:
    Dataset updated
    Mar 23, 2021
    Authors
    de Carvalho, Priscila Guedes; Rodrigues, Nádia Cristina Pinheiro; O´Dwer, Gisele
    Description

    Abstract: This study aims to analyze the health assistance pathway of women living in Rio de Janeiro city diagnosed with cervix cancer who were referred for treatment in a referral oncology unit. In the first stage of the study, we evaluated the time elapsed between cancer diagnosis and treatment initiation for women enrolled in 2014, taking as reference the time limit of 60 days established by the Brazilian Federal Law 12,372/2012 for treatment initiation at the Unified Health System (SUS). In the second stage, we analyzed the narratives of five women regarding their paths through health services from diagnosis up to the first therapeutic intervention, taking into account the aspects of comprehensive health care. It was observed that 88% of the treatments started after the 60-day legal period and that 65.5% of the women received a diagnosis in an advanced stage of the disease. The mean time to treatment initiation was 115.4 days. The main problems identified in the path analysis concern the availability of services and the integration of actions throughout the different levels of health care, as well as the lack of information on the disease and the purpose of PAP smears.

  15. COVID-19 case rate per 100,000 population and percent test positivity in the...

    • data.ct.gov
    • catalog.data.gov
    csv, xlsx, xml
    Updated Jun 23, 2022
    Cite
    Department of Public Health (2022). COVID-19 case rate per 100,000 population and percent test positivity in the last 14 days by town - ARCHIVE [Dataset]. https://data.ct.gov/widgets/hree-nys2
    Explore at:
    Available download formats: csv, xml, xlsx
    Dataset updated
    Jun 23, 2022
    Dataset authored and provided by
    Department of Public Health
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

    The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

    The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .

    The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .

    The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

    This dataset includes a count and rate per 100,000 population for COVID-19 cases, a count of COVID-19 molecular diagnostic tests, and a percent positivity rate for tests among people living in community settings for the previous two-week period. Dates are based on date of specimen collection (cases and positivity).

    A person is considered a new case only upon their first COVID-19 testing result because a case is defined as an instance or bout of illness. If they are tested again subsequently and are still positive, it still counts toward the test positivity metric but they are not considered another case.

    Percent positivity is calculated as the number of positive tests among community residents conducted during the 14 days divided by the total number of positive and negative tests among community residents during the same period. If someone was tested more than once during that 14 day period, then those multiple test results (regardless of whether they were positive or negative) are included in the calculation.
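
    The percent positivity definition above amounts to a simple ratio over the 14-day window. The following is a minimal sketch of that calculation; the function name and the test counts are hypothetical and are not drawn from the dataset.

```python
def percent_positivity(positive_tests: int, negative_tests: int) -> float:
    """Positive tests as a percentage of all (positive + negative) tests.

    Every test result in the window counts, including repeat tests
    for the same person, as the description above specifies.
    """
    total_tests = positive_tests + negative_tests
    if total_tests == 0:
        raise ValueError("no tests reported in the period")
    return 100.0 * positive_tests / total_tests


# Hypothetical 14-day counts for one town (not real data).
town_positivity = percent_positivity(positive_tests=45, negative_tests=855)
print(f"{town_positivity:.1f}%")  # 5.0%
```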

    These case and test counts do not include cases or tests among people residing in congregate settings, such as nursing homes, assisted living facilities, or correctional facilities.

    These data are updated weekly and reflect the previous two full Sunday-Saturday (MMWR) weeks (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf).

    DPH note about change from 7-day to 14-day metrics: Prior to 10/15/2020, these metrics were calculated using a 7-day average rather than a 14-day average. The 7-day metrics are no longer being updated as of 10/15/2020 but the archived dataset can be accessed here: https://data.ct.gov/Health-and-Human-Services/COVID-19-case-rate-per-100-000-population-and-perc/s22x-83rd

    As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.

    With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).

    Additional notes: As of 11/5/2020, CT DPH has added antigen testing for SARS-CoV-2 to reported test counts in this dataset. The tests included in this dataset include both molecular and antigen tests. Molecular tests reported include polymerase chain reaction (PCR) and nucleic acid amplification (NAAT) tests.

    The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used.

    Data suppression is applied when the rate is <5 cases per 100,000 or if there are <5 cases within the town. Information on why data suppression rules are applied can be found online here: https://www.cdc.gov/cancer/uscs/technical_notes/stat_methods/suppression.htm
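
    As an illustration of the suppression rule just described, the sketch below computes a case rate per 100,000 population and returns no value when either threshold is not met. It assumes a plain count-over-population rate for the reporting period; whether the published figures are further averaged per day is not stated here. The function names and inputs are hypothetical.

```python
from typing import Optional


def case_rate_per_100k(cases: int, population: int) -> float:
    """Cases per 100,000 population over the reporting period."""
    return cases / population * 100_000


def suppressed_case_rate(cases: int, population: int,
                         threshold: int = 5) -> Optional[float]:
    """Return the rate, or None when the suppression rule applies.

    Suppression applies when there are fewer than `threshold` cases
    or when the rate is below `threshold` per 100,000.
    """
    rate = case_rate_per_100k(cases, population)
    if cases < threshold or rate < threshold:
        return None
    return rate


# Hypothetical towns (not real data).
print(suppressed_case_rate(cases=3, population=10_000))   # suppressed: fewer than 5 cases
print(suppressed_case_rate(cases=12, population=20_000))  # reported: about 60 per 100,000
```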

  16. COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE

    • splitgraph.com
    • data.ct.gov
    • +2more
    Updated Aug 2, 2023
    Cite
    Department of Public Health (2023). COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE [Dataset]. https://www.splitgraph.com/ct-gov/covid19-cases-and-deaths-by-raceethnicity-archive-7rne-efic/
    Explore at:
    Available download formats: application/openapi+json, json, application/vnd.splitgraph.image
    Dataset updated
    Aug 2, 2023
    Dataset authored and provided by
    Department of Public Health
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or the Department of Public Health (DPH) are included in the COVID-19 update.

    The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age adjustment is important in Connecticut as the median age among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age, which results in higher age-adjusted rates.

    The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used.

    Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf
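
    Direct standardization, as referenced above, weights each age group's rate by that group's share of a standard population. The sketch below shows the arithmetic with three made-up age groups rather than the 19 groups used by DPH; all counts and the function name are hypothetical.

```python
from typing import Sequence


def age_adjusted_rate(events: Sequence[float],
                      populations: Sequence[float],
                      standard_population: Sequence[float],
                      per: int = 100_000) -> float:
    """Directly standardized rate per `per` people.

    Each age-specific rate (events / population) is weighted by the
    share of the standard population that falls in that age group.
    """
    if not (len(events) == len(populations) == len(standard_population)):
        raise ValueError("all inputs must have one entry per age group")
    total_standard = sum(standard_population)
    weighted = sum(
        (e / p) * (s / total_standard)
        for e, p, s in zip(events, populations, standard_population)
    )
    return weighted * per


# Hypothetical three-group example (not real data).
deaths = [2, 10, 40]
group_population = [50_000, 40_000, 10_000]
standard = [600_000, 300_000, 100_000]

crude = sum(deaths) / sum(group_population) * 100_000
adjusted = age_adjusted_rate(deaths, group_population, standard)
print(f"crude: {crude:.1f}, age-adjusted: {adjusted:.1f}")  # crude: 52.0, age-adjusted: 49.9
```

    In this made-up example the study population and the standard population have different age distributions, so the adjusted rate differs from the crude rate even though the age-specific rates are unchanged.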

    Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age adjusted rates calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic.

    Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. Cause of death was determined by a death certifier (e.g., physician, APRN, medical

    Splitgraph serves as an HTTP API that lets you run SQL queries directly on this data to power Web applications.

    See the Splitgraph documentation for more information.

  17. Health risk assessment of inhaled oil spill emissions with and without...

    • search.dataone.org
    • data.griidc.org
    Updated Feb 5, 2025
    Cite
    Koehler, Kirsten (2025). Health risk assessment of inhaled oil spill emissions with and without adding dispersant (due to volatile organic compounds) [Dataset]. http://doi.org/10.7266/N7RX99PN
    Explore at:
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    GRIIDC
    Authors
    Koehler, Kirsten
    Description

    We performed laboratory measurements to record the concentrations of different volatile organic compounds (VOCs) emitted from a crude oil slick before and after premixing with dispersant. We input these concentrations into a health risk assessment model to estimate the cancer risk and hazard quotients based on USEPA-designated measures and reference concentrations. We targeted the health risk assessment of cleanup workers or residents nearby. Based on the results, the cancer risk of exposure to toluene and benzene fell from 74 and 57 excess lifetime cancer cases per million, for one hour per day of exposure continuing for 3 months, to 66 (11% lower) and 37 (35% lower) excess lifetime cancer cases per million. Dispersant addition was effective in reducing emissions of the lighter VOCs (up to 30% lower emission rate). However, hazard quotients of the non-carcinogenic VOCs, even after dispersant addition, were 2 to 3 orders of magnitude greater than 1, meaning that there are serious concerns about exposure to these VOCs.

  18. The global Her2 Antibodies Market size will be USD 9351.4 million in 2025.

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Jun 20, 2025
    Cite
    Cognitive Market Research (2025). The global Her2 Antibodies Market size will be USD 9351.4 million in 2025. [Dataset]. https://www.cognitivemarketresearch.com/her2-antibodies-market-report
    Explore at:
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jun 20, 2025
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global Her2 Antibodies Market size will be USD 9351.4 million in 2025. It will expand at a compound annual growth rate (CAGR) of 5.50% from 2025 to 2033.

    North America held the largest market share, at more than 37% of global revenue, with a market size of USD 3460.02 million in 2025, and will grow at a compound annual growth rate (CAGR) of 14.0% from 2025 to 2033.
    Europe accounted for a market share of over 29% of global revenue, with a market size of USD 2711.91 million.
    APAC held a market share of around 24% of global revenue, with a market size of USD 2244.34 million in 2025, and will grow at a CAGR of 7.5% from 2025 to 2033.
    South America had a market share of more than 3.8% of global revenue, with a market size of USD 355.35 million in 2025, and will grow at a CAGR of 4.5% from 2025 to 2033.
    The Middle East had a market share of around 4% of global revenue, with an estimated market size of USD 374.06 million in 2025, and will grow at a CAGR of 4.8% from 2025 to 2033.
    Africa had a market share of around 2.2% of global revenue, with an estimated market size of USD 205.73 million in 2025, and will grow at a CAGR of 5.2% from 2025 to 2033.
    Pertuzumab is the fastest-growing segment of the Her2 Antibodies Market.
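
    To make the growth and share figures above concrete, the sketch below applies the standard compound-growth formula, value × (1 + CAGR)^years, to the quoted 2025 global size and checks one regional share. The 2033 value is only what the quoted rate implies arithmetically; it is not a figure stated by the report, and the function name is illustrative.

```python
def project_with_cagr(start_value: float, cagr: float, years: int) -> float:
    """Compound a starting value forward at a constant annual growth rate."""
    return start_value * (1.0 + cagr) ** years


global_2025 = 9351.4          # USD million, as quoted above
global_cagr = 0.055           # 5.50% per year, as quoted above
years = 2033 - 2025

# Implied 2033 size under constant compounding (an arithmetic illustration only).
implied_2033 = project_with_cagr(global_2025, global_cagr, years)

# Regional share check: North America's quoted size over the global size.
north_america_share = 3460.02 / global_2025

print(f"implied 2033 size: USD {implied_2033:.1f} million")
print(f"North America share: {north_america_share:.1%}")
```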
    

    Market Dynamics of Her2 Antibodies Market

    Key Drivers for Her2 Antibodies Market

    Government Initiatives To Improve Breast Cancer Care And Treatment Boost HER2 Antibodies Market

    Government initiatives to improve breast cancer care and treatment are expected to drive future growth in the HER2 antibody market. These initiatives include policies, funding, and programs to improve the accessibility, affordability, and quality of care for people with breast cancer. These initiatives advance patient care and outcomes by increasing access to HER2 antibodies, a critical treatment for HER2-positive breast cancer, via subsidized programs and research funding. For instance, in February 2023, the World Health Organization (WHO) released a new Global Breast Cancer Initiative Framework, which outlines a strategy for saving 2.5 million lives from breast cancer by 2040. As a result, government efforts to improve breast cancer care are driving the HER2 antibody market.

    https://www.who.int/news/item/03-02-2023-who-launches-new-roadmap-on-breast-cancer

    Increasing Incidence Of Breast Cancer Cases Fuels HER2 Antibodies Market

    The rising global incidence of breast cancer cases is expected to drive growth in the HER2 antibodies market over the forecast period. For instance, in January 2022, the American Cancer Society predicted that there would be 1.9 million new cancer diagnoses and 609,360 cancer-related deaths in the United States, equating to approximately 1,670 deaths per day. Breast cancer is one of the four most common types of cancer worldwide, accounting for a sizable proportion of new cancer cases. As a result, the rise in global breast cancer incidence rates is expected to drive up demand for HER2 antibodies in the coming years.

    https://www.cancer.org/research/cancer-facts-statistics/all-cancer-facts-figures/cancer-facts-figures-2022.html#:~:text=Estimated%20numbers%20of%20new%20cancer,factors%2C%20early%20detection%2C%20and%20treatment

    Restraint Factor for the Her2 Antibodies Market

    High Cost Of HER2 Antibody Therapies Limits Market Growth

    The high cost of HER2 antibody treatments significantly impedes market expansion by limiting patient access to these life-changing therapies. Most patients with HER2-positive breast cancer may be discouraged by the cost of these therapies, forcing them to postpone or discontinue treatment. This is exacerbated in areas with limited healthcare coverage or inadequate insurance coverage for such advanced therapies. As a result, the premium price can create disparities in treatment access, affecting overall patient outcomes and limiting the market's growth potential. To increase patient access and market growth, HER2 antibody therapies must be made more affordable.

    Introduction of the Her2 Antibodies Market

    HER2 antibodies are used in the treatment of HER2-positive breast cancer. HER2 is part of the human epidermal growth factor family. Overexpression of the HER2 oncogene causes the development and progression of some types...

  19. Lifestyle choices of individuals with cancer in the United Kingdom 2014-2018...

    • statista.com
    Updated Nov 26, 2025
    Cite
    Statista (2025). Lifestyle choices of individuals with cancer in the United Kingdom 2014-2018 [Dataset]. https://www.statista.com/statistics/418807/lifestyle-of-individuals-with-cancer-in-the-united-kingdom/
    Dataset updated
    Nov 26, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United Kingdom
    Description

    This statistic depicts the lifestyle trends of adults diagnosed with cancer in the United Kingdom in 2014 and 2018. In 2018, 53 percent of adults with cancer did vigorous exercise of 20 minutes or more on at least one day per month. In the same year, 82 percent of adults who had been diagnosed with cancer drank alcohol.

  20. CBS News/New York Times Women's Health Poll, February 1997

    • icpsr.umich.edu
    ascii, sas, spss +1
    Updated Jan 31, 2007
    Cite
    Inter-university Consortium for Political and Social Research [distributor] (2007). CBS News/New York Times Women's Health Poll, February 1997 [Dataset]. http://doi.org/10.3886/ICPSR04487.v1
    Available download formats: ascii, sas, spss, stata
    Dataset updated
    Jan 31, 2007
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/4487/terms

    Time period covered
    Feb 1997
    Area covered
    United States
    Description

    This special topic poll, fielded February 18-19, 1997, is part of a continuing series of monthly surveys that solicit public opinion on the presidency and on a range of other political and social issues. The focus of this data collection was on women's health issues. Views were sought on whether government health agencies paid enough attention to women's health issues, and how well the federal government regulated the environmental practices of businesses and the safety of medical equipment and procedures. Respondents were asked to name the leading cause of death for women and whether they had ever heard of mammograms. Female respondents were polled on whether a doctor had ever discussed mammograms with them, whether they had ever had one, how accurate, safe, and painful they were, at which age women should begin getting mammograms, and whether the federal government should set guidelines for mammograms. Female respondents were also polled on the benefits of early detection of breast cancer and how often they conducted breast self-examinations. All respondents were polled on whether they had noticed the new television program ratings system, whether they had used the ratings to prohibit their children from watching certain television programs, and how many hours per day their children watched television. Additional topics addressed health insurance coverage, whether the respondent or a female relative was ever diagnosed with breast cancer, and whether respondents would like to take an "adventure" vacation. Demographic variables included sex, age, race, education level, household income, political party affiliation, political philosophy, type of residential area (e.g., urban or rural), and religious preference.

Cite
Mikey-TraceGod (2025). Lung-Cancer-Risk-Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/12844025

Lung-Cancer-Risk-Dataset

A Clean, Preprocessed Dataset with 50,000 Patient Profiles for Lung Cancer Risk

4 scholarly articles cite this dataset (View in Google Scholar)
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 23, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mikey-TraceGod
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Lung Cancer Risk Dataset

Overview

This dataset contains 50,000 patient profiles designed for lung cancer risk analysis and machine learning applications. The dataset is clean, preprocessed, and ready for immediate use in classification tasks, statistical analysis, and data visualization.

  • Rows: 50,000
  • Columns: 11
  • File: preprocessed_lung_cancer_dataset.csv
  • License: CC0: Public Domain

Dataset Description

The dataset includes patient profiles with features based on established lung cancer risk factors such as smoking history, environmental exposures, and chronic lung conditions. All data is synthetic and designed to reflect realistic risk factor distributions while maintaining patient privacy.

Features

Column                     Type     Description                                      Values/Range
patient_id                 Integer  Unique patient identifier                        100000-149999
age                        Integer  Patient age in years                             18-100
gender                     String   Patient gender                                   'Male', 'Female'
pack_years                 Float    Smoking exposure (years × packs per day)         0-100
radon_exposure             String   Residential radon exposure level                 'Low', 'Medium', 'High'
asbestos_exposure          String   Occupational asbestos exposure history           'Yes', 'No'
secondhand_smoke_exposure  String   Passive smoking exposure                         'Yes', 'No'
copd_diagnosis             String   Chronic obstructive pulmonary disease diagnosis  'Yes', 'No'
alcohol_consumption        String   Alcohol consumption pattern                      'None', 'Moderate', 'Heavy'
family_history             String   Family history of lung cancer                    'Yes', 'No'
lung_cancer                String   Target variable: Lung cancer diagnosis           'Yes', 'No'
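
The documented column ranges can be turned into a simple row check. The following is a minimal sketch, assuming rows arrive as dictionaries of strings (for example from Python's csv.DictReader); the names ALLOWED, NUMERIC_RANGES, and validate_row are hypothetical and not part of the dataset.

```python
# Sketch only: column names and allowed values are copied from the Features
# table; the constant and function names are hypothetical.
ALLOWED = {
    "gender": {"Male", "Female"},
    "radon_exposure": {"Low", "Medium", "High"},
    "asbestos_exposure": {"Yes", "No"},
    "secondhand_smoke_exposure": {"Yes", "No"},
    "copd_diagnosis": {"Yes", "No"},
    "alcohol_consumption": {"None", "Moderate", "Heavy"},
    "family_history": {"Yes", "No"},
    "lung_cancer": {"Yes", "No"},
}
NUMERIC_RANGES = {
    "patient_id": (100000, 149999),
    "age": (18, 100),
    "pack_years": (0.0, 100.0),
}


def validate_row(row):
    """Return a list of human-readable problems found in one row."""
    problems = []
    # Numeric columns must parse as numbers and fall inside the documented range.
    for column, (low, high) in NUMERIC_RANGES.items():
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            problems.append(f"{column}: missing or not numeric")
            continue
        if not low <= value <= high:
            problems.append(f"{column}: {value} outside {low}-{high}")
    # Categorical columns must use one of the documented labels.
    for column, allowed in ALLOWED.items():
        if row.get(column) not in allowed:
            problems.append(f"{column}: unexpected value {row.get(column)!r}")
    return problems
```

An empty list means the row matches the documented schema; any other result names the offending columns.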

Data Quality

  • Complete: No missing values or duplicates
  • Clean: All values within realistic ranges
  • Balanced Features: Realistic distribution of risk factors
  • Target Distribution: Approximately 25% positive cases, reflecting real-world lung cancer prevalence
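
These quality claims are easy to check after download. The following is a rough sketch, assuming the rows have been read into dictionaries of strings; quality_report and its parameter names are hypothetical.

```python
from collections import Counter


def quality_report(rows, id_column="patient_id", target="lung_cancer"):
    """Summarise missing cells, duplicate IDs, and the share of positive targets."""
    rows = list(rows)
    # A cell counts as missing when it is an empty string or None.
    missing = sum(1 for row in rows for value in row.values() if value in ("", None))
    # Every repeat of an ID beyond its first occurrence counts as one duplicate.
    id_counts = Counter(row[id_column] for row in rows)
    duplicates = sum(count - 1 for count in id_counts.values() if count > 1)
    positives = sum(1 for row in rows if row[target] == "Yes")
    positive_rate = positives / len(rows) if rows else 0.0
    return {
        "rows": len(rows),
        "missing_cells": missing,
        "duplicate_ids": duplicates,
        "positive_rate": positive_rate,
    }
```

If the description holds, a report over the full file should show 50,000 rows, zero missing cells, zero duplicate IDs, and a positive rate near 0.25.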

Use Cases

  • Binary classification modeling
  • Risk factor correlation analysis
  • Data visualization and exploratory analysis
  • Machine learning pipeline development
  • Statistical hypothesis testing
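
As one illustration of risk factor analysis on the 'Yes'/'No' columns, the sketch below computes an odds ratio for a binary exposure against the lung_cancer target. The function name odds_ratio is hypothetical, and the 0.5 continuity correction applied when a cell is empty is an assumption of this sketch, not something the dataset prescribes.

```python
def odds_ratio(rows, exposure, target="lung_cancer", positive="Yes"):
    """Odds ratio of a positive target for exposed versus unexposed rows."""
    a = b = c = d = 0  # a: exposed cases, b: exposed non-cases, c/d: unexposed
    for row in rows:
        exposed = row[exposure] == positive
        case = row[target] == positive
        if exposed and case:
            a += 1
        elif exposed:
            b += 1
        elif case:
            c += 1
        else:
            d += 1
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero,
    # which avoids division by zero on small or sparse samples.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)
```

A value above 1 suggests that the exposure is associated with higher odds of a positive label in the sample. Because the data is synthetic, such a result describes the generator rather than real patients.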