https://creativecommons.org/publicdomain/zero/1.0/
In the following maps, the U.S. states are divided into groups based on the rates at which people developed or died from cancer in 2013, the most recent year for which incidence data are available.
The rates are the numbers out of 100,000 people who developed or died from cancer each year.
Incidence Rates by State
The number of people who get cancer is called cancer incidence. In the United States, the rate of getting cancer varies from state to state.
*Rates are per 100,000 and are age-adjusted to the 2000 U.S. standard population.
‡Rates are not shown if the state did not meet USCS publication criteria or if the state did not submit data to CDC.
†Source: U.S. Cancer Statistics Working Group. United States Cancer Statistics: 1999–2013 Incidence and Mortality Web-based Report. Atlanta (GA): Department of Health and Human Services, Centers for Disease Control and Prevention, and National Cancer Institute; 2016. Available at: http://www.cdc.gov/uscs.
Death Rates by State
Rates of dying from cancer also vary from state to state.
*Rates are per 100,000 and are age-adjusted to the 2000 U.S. standard population.
†Source: U.S. Cancer Statistics Working Group. United States Cancer Statistics: 1999–2013 Incidence and Mortality Web-based Report. Atlanta (GA): Department of Health and Human Services, Centers for Disease Control and Prevention, and National Cancer Institute; 2016. Available at: http://www.cdc.gov/uscs.
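The footnotes above state that rates are per 100,000 and age-adjusted to the 2000 U.S. standard population. As a rough illustration of what direct age adjustment does, the sketch below weights age-specific rates by standard-population shares. The three age bands, the counts, and the weights are made-up numbers for illustration only (not USCS data and not the actual 2000 standard weights), and the function name is hypothetical.

```python
def age_adjusted_rate(cases, population, std_weights, per=100_000):
    """Direct standardization: weighted sum of age-specific rates.

    cases, population, std_weights each hold one entry per age band.
    """
    if not (len(cases) == len(population) == len(std_weights)):
        raise ValueError("all inputs must have one entry per age band")
    total_weight = sum(std_weights)
    rate = 0.0
    for c, p, w in zip(cases, population, std_weights):
        # age-specific rate times the band's share of the standard population
        rate += (c / p) * (w / total_weight)
    return rate * per


# Illustrative (made-up) values for three age bands: young, middle, old.
cases = [10, 100, 400]
population = [100_000, 100_000, 50_000]
std_weights = [0.5, 0.3, 0.2]

crude = sum(cases) / sum(population) * 100_000
adjusted = age_adjusted_rate(cases, population, std_weights)
print(round(crude, 1), round(adjusted, 1))  # prints: 204.0 195.0
```

In this toy example the adjusted rate is lower than the crude rate because the hypothetical standard population gives the oldest band less weight than it has in the toy population, which is why adjusted rates are the ones to compare across states.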
By Data Exercises [source]
This dataset is a collection of county-level cancer mortality and incidence rates in the United States from 2000 to 2014. It provides detail on cancer cases, deaths, and trends at a local level. The columns include County, FIPS, age-adjusted death rate, average death rate per year, recent trend (2) in death rates, recent 5-year trend (2) in death rates, and average annual count for each county. This dataset can be used to gain insight into the patterns and effects of cancer on communities and to help inform policy decisions related to mitigating risk factors or increasing preventive measures such as screenings. With this set of records from across the United States over 15 years, you can make better-informed decisions about patient care or policy development within your own community.
This dataset provides US county-level cancer mortality and incidence rates from 2000 to 2014. It includes the mortality and incidence rate for each county, as well as whether the county met the objective of 45.5 deaths per 100,000 people. It also provides information on recent trends in death rates and average annual counts of cases over the five-year period studied.
This dataset can be useful to researchers studying trends in cancer death rates across counties. Using this data, researchers can gain insight into how different counties are performing in providing treatment and prevention services for cancer patients, and whether preventive measures and healthcare access are reducing cancer mortality rates over time. This data can also be used to inform policymakers about counties that need more targeted prevention efforts or additional resources for providing better healthcare access within at-risk communities.
When using this dataset, pay close attention to qualitative columns such as “Recent Trend” or “Recent 5-Year Trend (2)”, which may reveal long-term changes that are not readily apparent from quantitative variables such as the age-adjusted death rate or average deaths per year over shorter periods such as one year or five years. Additionally, when comparing counties, note any differences in standard FIPS codes, which may indicate that data was collected by a different source with a different methodology than in other areas studied.
- Using this dataset, we can identify patterns in cancer mortality and incidence rates that are statistically significant to create treatment regimens or preventive measures specifically targeting those areas.
- This data can be useful for policymakers to target areas with elevated cancer mortality and incidence rates so they can allocate financial resources to these areas more efficiently.
- This dataset can be used to investigate which factors (such as pollution levels, access to medical care, genetic makeup) may influence the cancer mortality and incidence rates in different US counties.
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors
- You are free to:
    - Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
    - Adapt - remix, transform, and build upon the material for any purpose, even commercially.
- You must:
    - Give appropriate credit - Provide a link to the license, and indicate if changes were made.
    - ShareAlike - You must distribute your contributions under the same license as the original.
    - Keep intact - all notices that refer to this license, including copyright notices.
File: death .csv | Column name | Description | |:-------------------------------------------|:-------------------------------------------------------------------...
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for Lung Cancer
Dataset Summary
An effective cancer prediction system helps people learn their cancer risk at low cost and make appropriate decisions based on their risk status. The data was collected from the website of an online lung cancer prediction system.
Supported Tasks and Leaderboards
[More Information Needed]
Languages
[More Information Needed]
Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/virtual10/lungs_cancer.
Death rate has been age-adjusted to the 2000 U.S. standard population. Single-year data are only available for Los Angeles County overall, Service Planning Areas, Supervisorial Districts, City of Los Angeles overall, and City of Los Angeles Council Districts.
Lung cancer is a leading cause of cancer-related death in the US. People who smoke have the greatest risk of lung cancer, though lung cancer can also occur in people who have never smoked. Most cases are due to long-term tobacco smoking or exposure to secondhand tobacco smoke. Cities and communities can take an active role in curbing tobacco use and reducing lung cancer by adopting policies to regulate tobacco retail; reducing exposure to secondhand smoke in outdoor public spaces, such as parks, restaurants, or in multi-unit housing; and improving access to tobacco cessation programs and other preventive services.
For more information about the Community Health Profiles Data Initiative, please see the initiative homepage.
I was interested in investigating cancer incidence levels in the US by looking at how they vary by race or state. All the data is collected online from the Centers for Disease Control and Prevention, State Cancer Profiles, and the United States Census Bureau. This dataset can be used to answer questions on the correlation between poverty levels, insurance levels and cancer incidence levels. Further, one can find which cancers affect a certain race or a certain state more.
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
| Characteristic | Value (N = 26254) |
|---|---|
| Age (years) | Mean ± SD: 61.4 ± 5; Median (IQR): 60 (57-65); Range: 43-75 |
| Sex | Male: 15512 (59%); Female: 10742 (41%) |
| Race | White: 23969 (91.3%) |
| Ethnicity | Not Available |
Background: The aggressive and heterogeneous nature of lung cancer has thwarted efforts to reduce mortality from this cancer through the use of screening. The advent of low-dose helical computed tomography (CT) altered the landscape of lung-cancer screening, with studies indicating that low-dose CT detects many tumors at early stages. The National Lung Screening Trial (NLST) was conducted to determine whether screening with low-dose CT could reduce mortality from lung cancer.
Methods: From August 2002 through April 2004, we enrolled 53,454 persons at high risk for lung cancer at 33 U.S. medical centers. Participants were randomly assigned to undergo three annual screenings with either low-dose CT (26,722 participants) or single-view posteroanterior chest radiography (26,732). Data were collected on cases of lung cancer and deaths from lung cancer that occurred through December 31, 2009. This dataset includes the low-dose CT scans from 26,254 of these subjects, as well as digitized histopathology images from 451 subjects.
Results: The rate of adherence to screening was more than 90%. The rate of positive screening tests was 24.2% with low-dose CT and 6.9% with radiography over all three rounds. A total of 96.4% of the positive screening results in the low-dose CT group and 94.5% in the radiography group were false positive results. The incidence of lung cancer was 645 cases per 100,000 person-years (1060 cancers) in the low-dose CT group, as compared with 572 cases per 100,000 person-years (941 cancers) in the radiography group (rate ratio, 1.13; 95% confidence interval [CI], 1.03 to 1.23). There were 247 deaths from lung cancer per 100,000 person-years in the low-dose CT group and 309 deaths per 100,000 person-years in the radiography group, representing a relative reduction in mortality from lung cancer with low-dose CT screening of 20.0% (95% CI, 6.8 to 26.7; P=0.004). The rate of death from any cause was reduced in the low-dose CT group, as compared with the radiography group, by 6.7% (95% CI, 1.2 to 13.6; P=0.02).
Conclusions: Screening with the use of low-dose CT reduces mortality from lung cancer. (Funded by the National Cancer Institute; National Lung Screening Trial ClinicalTrials.gov number, NCT00047385).
Data Availability: A summary of the National Lung Screening Trial and its available datasets is provided on the Cancer Data Access System (CDAS). CDAS is maintained by Information Management System (IMS), contracted by the National Cancer Institute (NCI) as keepers and statistical analyzers of the NLST trial data. The full clinical data set from NLST is available through CDAS. Users of TCIA can download without restriction a publicly distributable subset of that clinical data, along with the CT and histopathology images collected during the trial. (These previously were restricted.)
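Two of the headline figures in the Results paragraph above can be sanity-checked from the rounded per-100,000 person-year rates it quotes. The sketch below is only that arithmetic, with hypothetical helper names; it is not the trial's analysis code. Because the inputs are rounded, the recomputed mortality reduction comes out near 20.1% rather than exactly the reported 20.0%, which presumably reflects the published figure being derived from unrounded data.

```python
def rate_ratio(rate_a, rate_b):
    """Ratio of two rates expressed in the same units."""
    return rate_a / rate_b


def relative_reduction(rate_treated, rate_control):
    """Fractional reduction of the treated rate relative to the control rate."""
    return 1 - rate_treated / rate_control


# Rates per 100,000 person-years, as quoted in the Results paragraph.
incidence_rr = rate_ratio(645, 572)                # low-dose CT vs radiography
mortality_reduction = relative_reduction(247, 309)  # lung-cancer deaths

print(round(incidence_rr, 2))               # prints: 1.13
print(round(mortality_reduction * 100, 1))  # prints: 20.1
```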
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Cancer Prevention Studies (CPS) aim to understand why and how certain people develop cancer while others remain cancer-free. In 1982, the CPS-II cohort was established and includes approximately 1.2 million men and women, aged at least 30 years, recruited by American Cancer Society (ACS) volunteers in all 50 states of the United States of America and Puerto Rico. Participants have been followed biannually for mortality. The CPS-II Nutrition Cohort was established as a subgroup of the larger CPS-II cohort, in which approximately 185,000 individuals have been followed biennially for cancer incidence, diet, and other exposures, since 1992. The CPS-II Lifelink Cohort/Biorepository was initiated in 1998, and collected blood samples from 40,000 participants and cheek cell samples from 70,000 participants in the CPS-II Nutrition Survey cohort.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study aimed to identify the trends in the incidence of thymic cancer, i.e., thymoma, thymic carcinoma, and thymic neuroendocrine tumor, in the United States. Data from the United States Cancer Statistics (USCS) database (2001–2015) and from the Surveillance, Epidemiology, and End Results (SEER) database (SEER 9 [1973–2015], SEER 13 [1992–2015], and SEER 18 [2000–2015]) were used in this study. All incidences were per 100,000 population at risk. The trends in incidence were described as annual percent change (APC) using the Joinpoint regression program. Data from the USCS (2001–2015) database showed an increase in thymic cancer diagnosis with an APC of 4.89% from 2001 to 2006, which is mainly attributed to the significant increase in the incidence of thymoma and thymic carcinoma, particularly in women. The incidence of thymic cancer did not increase from 2006 to 2015, which may be attributed to the increase in the diagnosis of thymic carcinoma from 2004 to 2015, with a concomitant decrease in thymoma from 2008 to 2015. The age-specific incidence of thymic cancer peaked at ages 70–74 years, at 1.06 per 100,000 population, and decreased in older age groups. The incidence of thymic cancer was higher in men than in women. Asian/Pacific Islanders had the highest incidence of thymoma, followed by black and then white people. The incidence of thymic carcinoma increased from 2004 to 2015, with a concomitant decrease in thymoma from 2008 to 2015. Asian/Pacific Islanders had a higher incidence of thymoma than other races.
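For readers unfamiliar with the annual percent change (APC) mentioned above: within a single trend segment it is conventionally obtained by fitting a straight line to the natural log of the rate against calendar year and converting the slope b to (e^b - 1) × 100. The sketch below shows only that conversion using ordinary least squares on synthetic rates that grow exactly 5% per year. It does not search for change points the way the Joinpoint program does, and the function name and data are hypothetical.

```python
import math


def annual_percent_change(years, rates):
    """APC from a least-squares fit of ln(rate) on year (single segment)."""
    n = len(years)
    if n < 2 or n != len(rates):
        raise ValueError("need at least two (year, rate) pairs of equal length")
    logs = [math.log(r) for r in rates]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    # convert the log-scale slope into a yearly percentage change
    return (math.exp(slope) - 1) * 100


# Synthetic rates per 100,000 that rise by exactly 5% each year.
years = list(range(2001, 2007))
rates = [0.2 * 1.05 ** (y - 2001) for y in years]
apc = annual_percent_change(years, rates)
print(round(apc, 2))  # prints: 5.0
```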
https://choosealicense.com/licenses/cc/
breastcanc-ultrasound-class
Background
Cancer is the second leading cause of death worldwide, according to IHME - Global Burden of Disease, with 10.7 million deaths in 2019.
Among the various types of cancer, breast cancer plays a major role: it ranks 4th among the deadliest tumors, with more than 700,000 deaths during 2019 (IHME - Global Burden of Disease).
Moreover, breast cancer has the highest share of number of cases/100 people worldwide… See the full description on the dataset page: https://huggingface.co/datasets/as-cle-bert/breastcancer-auto-segmentation.
WONDER online databases include county-level Compressed Mortality (death certificates) since 1979; county-level Multiple Cause of Death (death certificates) since 1999; county-level Natality (birth certificates) since 1995; county-level Linked Birth / Death records (linked birth-death certificates) since 1995; state & large metro-level United States Cancer Statistics mortality (death certificates) since 1999; state & large metro-level United States Cancer Statistics incidence (cancer registry cases) since 1999; state and metro-level Online Tuberculosis Information System (TB case reports) since 1993; state-level Sexually Transmitted Disease Morbidity (case reports) since 1984; state-level Vaccine Adverse Event Reporting system (adverse reaction case reports) since 1990; county-level population estimates since 1970. The WONDER web server also hosts the Data2010 system with state-level data for compliance with Healthy People 2010 goals since 1998; the National Notifiable Disease Surveillance System weekly provisional case reports since 1996; the 122 Cities Mortality Reporting System weekly death reports since 1996; the Prevention Guidelines database (book in electronic format) published 1998; the Scientific Data Archives (public use data sets and documentation); and links to other online data sources on the "Topics" page.
By Data Society [source]
This dataset contains key demographic, health status indicator, and leading cause of death data to help us understand current trends and health outcomes in communities across the United States. The data shows how different states, counties, and populations have changed over time. With it we can analyze levels of national health service use, such as vaccination or mammography rates; review leading causes of death to create public policy initiatives; and identify risk factors for specific conditions that may be associated with certain populations or regions. The files include State FIPS Code, County FIPS Code, CHSI County Name, CHSI State Name, CHSI State Abbreviation, and, for each of the following, a report count and expected case rate per 100K population: Influenza B (FluB), Hepatitis A (HepA), Hepatitis B (HepB), Measles (Meas), Pertussis (Pert), CRS, and Syphilis.
We also look at measures of preventive care services, each reported with separate lower and upper confidence limits: Pap smear screening among women aged 18-64; mammograms among women aged 40-64; colonoscopy/proctoscopy among men aged 50+; and pneumonia vaccination among people aged 65+. Additionally, there are trend-indicating variables such as measures of birth and death, which include the general fertility rate and the teen birth rate by mother's age group. Summary Measures cover mortality trends and life expectancy by sex and age categories. Vulnerable Populations gives insight into disability ratios and environmental issues due to poor-quality housing. Finally, Risk Factors cover specific conditions such as asthma diagnosis prevalence, cancer, diabetes, alcohol abuse, and smoking trends. Together, this information gives a good understanding of Healthy People 2020 target settings by demographic group and will aid in generating more evidence-backed policies.
What the Dataset Contains
This dataset contains public health information for each county in the United States, broken down into 9 indicator domains: Demographics, Leading Causes of Death, Summary Measures of Health, Measures of Birth and Death Rates, Relative Health Importance, Vulnerable Populations and Environmental Health Conditions, Preventive Services Use Data from BRFSS Survey System Data, Risk Factors and Access to Care/Health Insurance Coverage, and State Developed Types of Measurements such as CRS with Multiple Categories Identified for Each Type. The data includes indicators such as percentages or rates for influenza (FLU), hepatitis (HepA/B), measles (MEAS), pertussis (PERT), syphilis (Syphilis), cervical cancer (CI_Min_Pap_Smear - CI_Max_Pap_Smear), breast cancer (CI Min Mammogram - CI Max Mammogram), proctoscopy (CI Min Proctoscopy - CI Max Proctoscopy), pneumococcal vaccinations (CI Min Pneumo Vax - CI Max Pneumo Vax), and flu vaccinations (CI Min Flu Vac - CI Max Flu Vac). Additionally, it provides information on leading causes of death at both county and national levels, including age-adjusted mortality rates due to suicide among teens aged 15-19 years per 100,000 population. Furthermore, it provides summary measures such as the age-adjusted percentage of people who consider their physical health fair or poor; vulnerable-population indicators such as a relative importance score for disabled adults; and preventive service use indicators such as self-reported vaccination coverage against hepatitis B virus among men aged 40-64 years, etc.
Getting Started With The Dataset
To get started exploring this dataset, you first need to understand what each column in the table represents: State FIPS Code is a unique identifier used by various US government agencies to denote states. County FIPS code denotes counties wi...
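As a small illustration of how the two FIPS columns fit together: a state FIPS code is two digits and a county FIPS code is three digits, and concatenating them gives the familiar five-digit county identifier. The sketch below assumes the columns may arrive as integers with leading zeros lost (an assumption about this file, not something the description confirms); the helper name is hypothetical.

```python
def county_fips(state_code, county_code):
    """Build the 5-digit county FIPS string from state and county parts."""
    state = int(state_code)
    county = int(county_code)
    if not (0 < state <= 99) or not (0 < county <= 999):
        raise ValueError("state must be 1-99 and county must be 1-999")
    # zero-pad: 2 digits for the state, 3 digits for the county
    return f"{state:02d}{county:03d}"


print(county_fips(6, 37))  # prints: 06037 (Los Angeles County, California)
```

Keeping the result as a string rather than an integer preserves the leading zero, which matters when joining against other county-level tables.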
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: Sepsis-related mortality in middle-aged and older pancreatic cancer patients constitutes a significant public health issue. This study seeks to analyze trends in the age-adjusted mortality rate (AAMR) for sepsis-related fatalities among these patients in the United States from 1999 to 2023, employing data from the most recent CDC WONDER database. The temporal patterns revealed from this analysis are anticipated to guide subsequent research and public health initiatives.
Methods: The CDC WONDER database was used to look at how many middle-aged and older pancreatic cancer patients in the U.S. died from sepsis between 1999 and 2023. The study utilized AAMR to evaluate temporal mortality patterns among adults aged 45 and older, categorized by race, census region, urban/rural residency, and state, using the Joinpoint regression tool. We calculated the annual percent change (APC) and the average annual percent change (AAPC), and we supplied 95% confidence intervals.
Results: During the study period, the sepsis-related death rate among middle-aged and elderly pancreatic cancer patients exhibited a notable increase, with an AAPC of 2.89. Male patients consistently demonstrated a greater AAMR compared to females, with a notable increase recorded [AAPC = 2.73 (95% CI 1.61 to 3.87)]. Black or African American patients had the greatest AAMR, which also went up a lot [AAPC = 2.62 (95% CI 1.76 to 3.48)]. The mortality burden increased significantly with age, reaching its highest point in the 75–84 age range. A regional study found that the Midwest had the highest rise in AAMR [AAPC = 3.74 (95% CI 2.50 to 5.00)]. Urban people consistently exhibited a higher AAMR compared to rural communities, despite the most significant increase in AAMR occurring among rural populations [AAPC = 3.51 (95% CI 2.09 to 4.94)].
Conclusion: This study’s findings reveal substantial inequalities among gender, ethnicity, age, and geographic regions. These differences show how important it is to quickly implement targeted measures to lower mortality, especially among individuals at high risk.
https://creativecommons.org/publicdomain/zero/1.0/
Millions of people around the world suffer from cancer without any hope of treatment due to the extravagant treatment costs.
In this dataset you'll find the total money spent on treating different cancers.
This data comes from https://data.world/xprizeai-health/expenditures-for-cancer-care.
Introduction: The purpose of this study is to culturally adapt the Awareness and Beliefs about Cancer (ABC) measure for use in the Hispanic/Latino population living in the United States (US).
Methods: In accordance with Patient Reported Outcomes (PRO) Consortium guidelines for cross-cultural adaptation of measures for content and linguistic validity, we conducted: two forward-translations, reconciliation, two back-translations, revision and harmonization, six cognitive interviews, revision, external expert review, and finalization of the version. We used a mixed methods approach, conducting cognitive interviews with Hispanic/Latino community members while also convening an expert panel of six clinicians, health professionals, and community representatives and including them in the entire process. After cross-culturally adapting the ABC measure, we assessed the psychometric properties of the instrument using item response theory analysis. Item parameters, discrimination and category thresholds, and standard errors were calculated. For each of the adapted subdomains, we used item information curves to report the graphical profile of item effectiveness.
Results: Twenty-two Hispanic/Latino community members were enrolled in cognitive interviews, and Hispanics/Latinos fluent in Spanish completed the measure to assess its psychometric properties. Cognitive interviews revealed opportunities to improve items. Key changes from the original measure include the inclusion of gender inclusive language and an inquiry into e-cigarette use on items related to smoking habits. Psychometric property analyses revealed that the anticipated delay in seeking medical help, general cancer beliefs, and cancer screening beliefs and behaviors subdomains had some slope parameters that were < 1; this implies that those items were not able to adequately discriminate the latent trait and had poor performance.
Discussion: The adapted ABC measure for US Hispanics/Latinos meets content and linguistic validity standards, with construct validity confirmed for cancer symptom recognition and barriers to symptomatic presentation subdomains, but revisions are necessary for others, highlighting the need for ongoing refinement to ensure the cultural appropriateness of instruments.
Rank, number of deaths, percentage of deaths, and age-specific mortality rates for the leading causes of death, by age group and sex, 2000 to most recent year.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: Metastatic breast cancer causes the most breast cancer-related deaths around the world, especially in countries where breast cancer is detected late into its development. Genetic testing for cancer susceptibility started with the BRCA 1 and 2 genes. Still, recent research has shown that variations in other members of the DNA damage response (DDR) are also associated with elevated cancer risk, opening new opportunities for enhanced genetic testing strategies.
Methods: We sequenced BRCA1/2 and twelve other DDR genes from a Mexican-mestizo population of 40 metastatic breast cancer patients through semiconductor sequencing.
Results: Overall, we found 22 variants (9 of them reported for the first time) and a strikingly high proportion of variations in ARID1A. The presence of at least one variant in the ARID1A, BRCA1, BRCA2, or FANCA genes was associated with worse progression-free survival and overall survival in our patient cohort.
Discussion: Our results reflected the unique characteristics of the Mexican-mestizo population as the proportion of variants we found differed from that of other global populations. Based on these findings, we suggest routine screening for variants in ARID1A along with BRCA1/2 in breast cancer patients from the Mexican-mestizo population.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This is the Google Search interest data that powers the Visualisation Searching For Health. Google Trends data allows us to see what people are searching for at a very local level. This visualization tracks the top searches for common health issues in the United States, from Cancer to Diabetes, and compares them with the actual location of occurrences for those same health conditions to understand how search data reflects life for millions of Americans.
How does search interest for top health issues change over time? From 2004–2017, the data shows that search interest gradually increased over the past few years. Certain regions show a more significant increase in search interest than others. The increase in search activity is greatest in the Midwest and Northeast, while the changes are noticeably less dramatic in California, Texas, and Idaho. Are people generally becoming more aware of health conditions and health risks?
The search interest data was collected using the Google Trends API. The visualisation also brings in incidences of each condition so they can be compared. The health conditions were hand-selected from the Community Health Status Indicators (CHSI) which provides key indicators for local communities in the United States. The CHSI dataset includes more than 200 measures for each of the 3,141 United States counties. More information about the CHSI can be found on healthdata.gov.
Many striking similarities exist between searches and actual conditions—but the relationship between the Obesity and Diabetes maps stands out the most. “There are many risk factors for type 2 diabetes such as age, race, pregnancy, stress, certain medications, genetics or family history, high cholesterol and obesity. However, the single best predictor of type 2 diabetes is overweight or obesity. Almost 90% of people living with type 2 diabetes are overweight or have obesity. People who are overweight or have obesity have added pressure on their body's ability to use insulin to properly control blood sugar levels, and are therefore more likely to develop diabetes.” —Obesity Society via obesity.org
By Health [source]
This dataset looks at the leading causes of death in the United States from 1980 to 2009, broken down by sex, race, and Hispanic origin. This data sheds light on how mortality in the US has changed over time among these categories. Covering causes from heart disease to cancer to suicide, it can be used by health researchers and policymakers to gain a better understanding of disparities in healthcare and deaths across different groups. Whether studying questions related to public health or more targeted population issues such as gender differences in death rates, this dataset provides an important resource for anyone interested in examining mortality across demographic lines.
This dataset can be used to explore some of the leading causes of death in the United States from 1980 to 2009, broken down by sex, race, and Hispanic origin. This data can be used to better understand mortality trends and risk factors associated with different populations in America.
By using this dataset you can compare and contrast mortality rates across different gender, racial, and ethnic groups during this time period. You can also compare different causes of death within these demographic categories to see if there are any patterns over time or notable differences between groups.
You could even use this data to track changes across population groups as a whole or look at details for specific years or types of causes of death in particular groups. With this information, one may gain insight into health disparities across population segments in America, aiding advocates for social change and public policy shifts toward improved health outcomes for all Americans!
- Analyzing regional or state-level differences in mortality rates over time.
- Examining the behavioral or risk factors associated with each cause of death for different genders and populations.
- Examining the prevalence of each cause of death as a proportion of an overall population trend in different socio-economic categories such as race or income level.
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors
- You are free to:
    - Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
    - Adapt - remix, transform, and build upon the material for any purpose, even commercially.
- You must:
    - Give appropriate credit - Provide a link to the license, and indicate if changes were made.
    - ShareAlike - You must distribute your contributions under the same license as the original.
    - Keep intact - all notices that refer to this license, including copyright notices.
File: Selected_Trend_Table_from_Health_United_States_2011._Leading_causes_of_death_and_numbers_of_deaths_by_sex_race_and_Hispanic_origin_United_States_1980_and_2009.csv

| Column name | Description |
|:---|:---|
| Group | The group of people the cause of death applies to (e.g. men, women, whites, blacks, hispanics). (String) |
| Year | The year the cause of death was recorded. (Integer) |
| Cause of death | The cause of death. (String) |
| Flag | A flag indicating whether the cause of death is considered a leading cause. (Boolean) |
| Deaths | The number of deaths attributed to the cause of death. (Integer) |
If you use this dataset in your research, please credit the original authors and Health.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises 9105 individual critically ill patients across 5 United States medical centers, accessioned during 1989-1991 and 1992-1994. Each row is the record of a hospitalized patient who met the inclusion and exclusion criteria for one of nine disease categories: acute respiratory failure, chronic obstructive pulmonary disease, congestive heart failure, liver disease, coma, colon cancer, lung cancer, multiple organ system failure with malignancy, and multiple organ system failure with sepsis. The goal is to determine these patients' 2- and 6-month survival rates from physiologic, demographic, and disease severity information. It is an important problem because it addresses the growing national concern over patients' loss of control near the end of life. It enables earlier decisions and planning to reduce the frequency of a mechanical, painful, and prolonged dying process.
For what purpose was the dataset created?
To develop and validate a prognostic model that estimates survival over a 180-day period for seriously ill hospitalized adults (phase I of SUPPORT) and to compare this model's predictions with those of an existing prognostic system and with physicians' independent estimates (SUPPORT phase II).
Who funded the creation of the dataset?
Funded by the Robert Wood Johnson Foundation
What do the instances in this dataset represent?
The instances represent records of critically ill patients admitted to United States hospitals with advanced stages of serious illness.
Are there recommended data splits?
No recommendation; a standard train-test split could be used. A three-way holdout split (i.e., train-validation-test) can be used when doing model selection.
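A three-way holdout split could be sketched as below. The 60/20/20 proportions, the fixed seed, and the helper name `three_way_split` are assumptions for illustration only, since the dataset itself recommends no particular split.

```python
import random


def three_way_split(n_rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle row indices and cut them into train/validation/test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = int(n_rows * test_frac)
    n_val = int(n_rows * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


# 9105 is the number of patients stated in the dataset description.
train_idx, val_idx, test_idx = three_way_split(9105)
print(len(train_idx), len(val_idx), len(test_idx))
```

This is a purely random split. It does not stratify by disease category, medical center, or study phase, any of which may matter for a given analysis.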
Does the dataset contain data that might be considered sensitive in any way?
Yes. There is information about race, gender, income, and education level.
Was there any data preprocessing performed?
No. Because of the high percentage of missing values, some imputation values are recommended. According to the HBiostat Repository (https://hbiostat.org/data/repo/supportdesc, Professor Frank Harrell), the following default values have been found to be useful in imputing missing baseline physiologic data:

| Baseline Variable | Normal Fill-in Value |
|:-------------------------|:---------------------|
| Serum albumin (alb) | 3.5 |
| PaO2/FiO2 ratio (pafi) | 333.3 |
| Bilirubin (bili) | 1.01 |
| Creatinine (crea) | 1.01 |
| bun | 6.51 |
| White blood count (wblc) | 9 (thousands) |
| Urine output (urine) | 2502 |

There are 159 patients surviving 2 months for whom there were no patient or surrogate interviews. These patients have missing sfdm2.
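The fill-in values above could be applied along these lines. This is only a sketch: it assumes each patient record is a dictionary keyed by the short variable names given in parentheses, and that missing entries appear as `None`, an empty string, `"NA"`, or a float NaN. Both the encodings and the helper names `is_missing` and `impute_baseline` are assumptions, not part of the dataset.

```python
import math

# Fill-in values quoted from the HBiostat repository description.
NORMAL_FILL = {
    "alb": 3.5,
    "pafi": 333.3,
    "bili": 1.01,
    "crea": 1.01,
    "bun": 6.51,
    "wblc": 9.0,      # in thousands
    "urine": 2502.0,
}


def is_missing(value):
    """Treat None, '', 'NA' and float NaN as missing (assumed encodings)."""
    if isinstance(value, float) and math.isnan(value):
        return True
    return value is None or value in ("", "NA")


def impute_baseline(record):
    """Return a copy of the record with missing baseline values filled in."""
    filled = dict(record)
    for column, fill in NORMAL_FILL.items():
        if column in filled and is_missing(filled[column]):
            filled[column] = fill
    return filled


# Made-up record for illustration; only the missing fields are replaced.
patient = {"alb": "", "crea": 0.8, "bun": None, "pafi": float("nan")}
print(impute_baseline(patient))
```

Columns that are not in `NORMAL_FILL`, such as `sfdm2`, are left untouched, so the 159 patients with missing `sfdm2` still need separate handling.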
Additional Information
Data sources are medical records, personal interviews, and the National Death Index (NDI). For each patient administrative records data, clinical data and survey data were collected. The objective of the SUPPORT project was to improve decision-making in order to address the growing national concern over the loss of control that patients have near the end of life and to reduce the frequency of a mechanical, painful, and prolonged process of dying. SUPPORT comprised a two-year prospective observational study (Phase I) followed by a two-year controlled clinical trial (Phase II). Phase I of SUPPORT collected data from patients accessioned during 1989-1991 to characterize the care, treatment preferences, and patterns of decision-making among critically ill patients. It also served as a preliminary step for devising an intervention strategy for improving critically-ill patients' care and for the construction of statistical models for predicting patient prognosis and functional status. An intervention was implemented in Phase II of SUPPORT, which accessioned patients during 1992-1994. The Phase II intervention provided physicians with accurate predictive information on future functional ability, survival probability to six months, and patients' preferences for end-of-life care. Additionally, a skilled nurse was provided as part of the intervention to elicit patient preferences, provide prognoses, enhance understanding, enable palliative care, and facilitate advance planning. The intervention was expected to increase communication, resulting in earlier decisions to have orders against resuscitation, decrease time that patients spent in undesirable states (e.g., in the Intensive Care Unit, on a ventilator, and in a coma), increase physician understanding of patients' preferences for care, decrease patient pain, and decrease hospital resource use. 
Data collection in both phases of SUPPORT consisted of questionnaires administered to patients, their surrogates, and physicians, plus chart reviews for abstracting clinical, treatment, and decision information. Phase II also collected information regarding the implementation of the intervention, such as patient-specific logs maintained by nurses assigned to patients as part of the intervention. SUPPORT patients were fol...
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains comprehensive monitoring data of Per- and Polyfluoroalkyl Substances (PFAS) and other contaminants in U.S. public water systems, collected under the EPA's Unregulated Contaminant Monitoring Rule (UCMR) program from 2001 to 2024. The data represents a critical resource for understanding the prevalence and patterns of PFAS contamination in drinking water across different regions and time periods.
The dataset combines results from multiple UCMR monitoring cycles (UCMR 1-5) and includes over 4 million observations of various contaminants, with a particular focus on PFAS compounds. Each record represents a single analytical measurement at a public water system.
combined_ucmr_data.csv (4,082,839 rows × 24 columns)

Key Fields:
* PWSID: Public Water System Identification number (string)
* PWSName: Name of the Public Water System
* Size: Size category of the water system (L: >10,000, S: ≤10,000 people served)
* FacilityID: Unique identifier for the facility
* FacilityName: Name of the facility
* FacilityWaterType: Source water type
  - GW: Ground Water
  - SW: Surface Water
  - GU: Ground Water Under Direct Influence of Surface Water
  - MX: Mixed Water Types
* SamplePointID: Unique identifier for the sampling location
* SamplePointName: Description of the sampling location
* SamplePointType: Type of sampling point (e.g., EP: Entry Point to distribution system)
* CollectionDate: Date of sample collection
* Contaminant: Name of the contaminant analyzed
* MRL: Minimum Reporting Level in μg/L
* Units: Measurement units (typically μg/L)
* MethodID: EPA analytical method used
* AnalyticalResultsSign: < for less than MRL, = for detected values
* AnalyticalResultValue: Numerical result of the analysis
* SampleEventCode: Sampling event identifier (SE1, SE2, SE3, SE4)
* MonitoringRequirement: Type of monitoring (AM: Assessment Monitoring)
* Region: EPA Region number (1-10)
* State: Two-letter state code
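As a rough illustration of how the `AnalyticalResultsSign` field can be used, the sketch below computes, per contaminant, the fraction of results reported as detections (sign `=`) rather than below the MRL (sign `<`). The helper `detection_rates` is hypothetical, and the sample rows, including the PWSID values and contaminant strings, are made up for the example.

```python
import csv
import io
from collections import Counter


def detection_rates(rows):
    """Return {contaminant: fraction of results reported with sign '='}."""
    totals, detects = Counter(), Counter()
    for row in rows:
        name = row["Contaminant"]
        totals[name] += 1
        # '=' marks a detected value; '<' marks a result below the MRL.
        if (row.get("AnalyticalResultsSign") or "").strip() == "=":
            detects[name] += 1
    return {name: detects[name] / totals[name] for name in totals}


# Made-up rows using a subset of the documented columns (not real results).
sample = io.StringIO(
    "PWSID,Contaminant,AnalyticalResultsSign,AnalyticalResultValue\n"
    "X0000001,PFOA,=,0.005\n"
    "X0000001,PFOA,<,\n"
    "X0000002,PFOA,<,\n"
    "X0000003,PFOA,<,\n"
    "X0000002,PFOS,<,\n"
)
rates = detection_rates(csv.DictReader(sample))
print(rates)
```

Because the function streams rows one at a time, the same call should also work on the full `combined_ucmr_data.csv` through `csv.DictReader` without loading all four million rows into memory.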
This dataset is valuable for:
1. Environmental Science: Analyzing trends in PFAS contamination over time
2. Public Health Research: Identifying areas with elevated PFAS levels
3. Machine Learning:
   - Predicting future PFAS levels
   - Identifying patterns in contamination spread
   - Analyzing geographical and temporal trends
4. Policy Analysis: Informing water quality regulations and standards
Data sourced from EPA's UCMR program. When using this dataset, please cite:
- EPA UCMR Program (https://www.epa.gov/dwucmr)
- UCMR Data Files (2001-2024)
Special thanks to:
- EPA for making this data publicly available
- Public Water Systems for collecting and reporting the data
- Environmental laboratories for analyzing the samples