5 datasets found
  1. County Cancer Death Rates

    • kaggle.com
    Updated Dec 3, 2023
    Cite
    The Devastator (2023). County Cancer Death Rates [Dataset]. https://www.kaggle.com/datasets/thedevastator/county-cancer-death-rates
    Dataset updated
    Dec 3, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    County Cancer Death Rates

    County-level cancer death rates with related variables

    By Noah Rippner [source]

    About this dataset

    This dataset provides county-level cancer death and incidence rates, together with related variables. It includes age-adjusted death rates, average deaths per year, recent trends in cancer death rates, recent 5-year trends in death rates, and average annual counts of cancer deaths or incidence. The dataset also includes the Federal Information Processing Standards (FIPS) code for each county.

    Additionally, the dataset indicates whether each county met the objective of a targeted death rate of 45.5. The recent trend in cancer deaths or incidence is also captured for analysis purposes.

    The death.csv file contains county-level cancer death rates and related variables, while the incd.csv file contains county-level cancer incidence rates and additional relevant variables.

    A separate file, cancer_data_notes.csv, provides notes and explanations for the cancer data used in this dataset.

    This description also outlines a linear regression walkthrough in Python using this dataset. The walkthrough covers sourcing and importing the data, data preparation steps such as exploratory analysis, model selection, and key model diagnostics.

    The example is a first attempt at a multivariate Ordinary Least Squares (OLS) regression model that combines data from sources such as cancer.gov with US Census American Community Survey data. This baseline model gives a point of comparison for later, refined iterations.
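
For reference, the standard textbook form of a multivariate Ordinary Least Squares model is sketched below. The notation is generic and assumed here, not taken from the walkthrough: $y_i$ stands for the response in county $i$ (for example a death rate), $x_{ij}$ for predictor $j$, $\beta_j$ for the coefficients, and $\varepsilon_i$ for the error term. The choice of predictors is left open.

```latex
y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i,
\qquad i = 1, \dots, n

\hat{\beta}
  = \arg\min_{\beta} \sum_{i=1}^{n}
    \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^{2}
  = (X^{\top} X)^{-1} X^{\top} y
```

Here $X$ is the $n \times (p+1)$ design matrix whose first column is all ones, and the closed form holds when $X^{\top} X$ is invertible.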

    Important columns include County names and their corresponding FIPS (Federal Information Processing Standards) codes, a standardized coding system for identifying counties. The Met Objective of 45.5? (1) column denotes whether a county achieved the targeted death rate of 45.5.

    Overall, this dataset aims to offer insight into county-level cancer death and incidence rates across regions, giving policymakers, researchers, and healthcare professionals information for analysis and decision-making.

    How to use the dataset

    • Familiarize Yourself with the Columns:

      • County: The name of the county.
      • FIPS: The Federal Information Processing Standards code for the county.
      • Met Objective of 45.5? (1): Indicates whether the county met the objective of a death rate of 45.5 (Boolean).
      • Age-Adjusted Death Rate: The age-adjusted death rate for cancer in the county.
      • Average Deaths per Year: The average number of deaths per year due to cancer in the county.
      • Recent Trend (2): The recent trend in cancer death rates/incidence in the county.
      • Recent 5-Year Trend (2) in Death Rates: The recent 5-year trend in cancer death rates/incidence in the county.
      • Average Annual Count: The average annual count of cancer deaths/incidence in the county.
    • Determine Counties Meeting Objective: Use this dataset to identify counties that have or have not met the objective death rate of 45.5. Look for entries where Met Objective of 45.5? (1) is marked as True or False.

    • Analyze Age-Adjusted Death Rates: Study and compare age-adjusted death rates across different counties using Age-Adjusted Death Rate values provided as floats.

    • Explore Average Deaths per Year: Examine and compare average annual counts and trends regarding deaths caused by cancer, using Average Deaths per Year as a reference point.

    • Investigate Recent Trends: Assess recent trends related to cancer deaths or incidence by analyzing data under columns such as Recent Trend, Recent Trend (2), and Recent 5-Year Trend (2) in Death Rates. These columns provide information on how cancer death rates/incidence have changed over time.

    • Compare Counties: Utilize this dataset to compare counties based on their cancer death rates and related variables. Identify counties with lower or higher average annual counts, age-adjusted death rates, or recent trends to analyze and understand the factors contributing ...
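
The steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the author's walkthrough: the sample rows are invented, only the column names County, FIPS, and Age-Adjusted Death Rate come from the description, and the rule "a county meets the objective when its rate is at or below 45.5" is an assumption. Real files may contain non-numeric entries that would need handling before the `float` conversion.

```python
import csv
import io

# Hypothetical sample rows; the values are invented for illustration only.
SAMPLE_CSV = """County,FIPS,Age-Adjusted Death Rate
County A,00001,38.2
County B,00002,51.7
County C,00003,45.5
County D,00004,62.0
"""

OBJECTIVE_RATE = 45.5  # target death rate named in the dataset description


def load_rates(text):
    """Parse CSV text into a list of (county, fips, rate) tuples."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rate = float(record["Age-Adjusted Death Rate"])
        rows.append((record["County"], record["FIPS"], rate))
    return rows


def split_by_objective(rows, objective=OBJECTIVE_RATE):
    """Assumption: a county meets the objective when its rate is <= the target."""
    met = [r for r in rows if r[2] <= objective]
    not_met = [r for r in rows if r[2] > objective]
    return met, not_met


rows = load_rates(SAMPLE_CSV)
met, not_met = split_by_objective(rows)

# Compare counties by sorting on the age-adjusted death rate, lowest first.
ranked = sorted(rows, key=lambda r: r[2])
```

To run this against the real data, the in-memory string would be replaced by an open file handle for death.csv.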

  2. Data from: S1 Dataset -

    • plos.figshare.com
    zip
    Updated Jul 24, 2024
    Cite
    Fabrício Silveira; Wanessa Miranda; Rômulo Paes de Sousa (2024). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0305955.s006
    Available download formats: zip
    Dataset updated
    Jul 24, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Fabrício Silveira; Wanessa Miranda; Rômulo Paes de Sousa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This study examines the global evolution of 43 Sustainable Development Goals (SDG) indicators, spanning 7 major health themes across 185 countries, to evaluate the potential progress loss due to the COVID-19 pandemic. Both the cross-country and temporal variability of the dataset are employed to estimate an empirical model based on an extended version of the Preston curve, which links well-being to income levels and other key socioeconomic health determinants. The approach reveals significant global evolution trends operating in each SDG indicator assessed. We extrapolate the model yearly between 2020 and 2030 using the IMF’s pre-COVID-19 economic growth projections to show how each country in the dataset is expected to evolve in these health topics throughout the decade, assuming no other external shocks. The results of this baseline scenario are contrasted with a post-COVID-19 scenario, where most of the pandemic costs were already known. The study reveals that economic growth losses are, on average, estimated at 42% and 28% for low- and lower middle-income countries, and at 15% and 7% for high- and upper middle-income countries, respectively, according to the IMF’s projections. These disproportionate figures are shown to exacerbate the global health inequalities revealed by the curves. The expected progress loss in infectious diseases in low-income countries, for instance, averages 34%, against a mean of 6% in high-income countries. Infectious diseases is followed by injuries and violence; maternal and reproductive health; health systems coverage; and neonatal and infant health as the worst-performing themes. Low-income countries can expect an average progress loss of 16% across all health indicators assessed, whereas in high-income countries the estimated loss is as low as 3%. The disparity across countries is even more pronounced, with cases where the estimated progress loss is as much as nine times worse than the average loss of 8%.
    Conversely, countries with greater fiscal capacity are likely to fare much better under the circumstances, in many cases despite their worse death count. Overall, these findings support the critical importance of integrating the fight against inequalities into global development agendas.

  3. Water Quality Dataset

    • kaggle.com
    Updated Aug 21, 2021
    Cite
    ozgurdogan (2021). Water Quality Dataset [Dataset]. https://www.kaggle.com/datasets/ozgurdogan646/water-quality-dataset/suggestions?status=pending
    Dataset updated
    Aug 21, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    ozgurdogan
    Description

    Water Quality Dataset

    This dataset is based on the WaterBase water quality data shared by the EEA. Most of the original columns were dropped, and new features were created with the help of World Bank, OSM, Foursquare, and SEDAC data. After the country and city were derived from the available location information, socioeconomic features of that country were added. In addition, the distance to certain road types near those coordinates was added using OSM. Such information is thought to play an important role in water pollution.

    Features:

    • parameterWaterBodyCategory: Water body category code, as defined in the codelist. (Taken from EEA)
    • observedPropertyDeterminandCode: Unique code of the determinand monitored, as defined in the codelist. (Taken from EEA)
    • procedureAnalysedFraction: Specification of which fraction of the sample was analysed. (Taken from EEA)
    • procedureAnalysedMedia: Type of media monitored. (Taken from EEA)
    • resultUom: Unit of measure for the reported values. (Taken from EEA)
    • phenomenonTimeReferenceYear: Year during which the data were sampled. (Taken from EEA)
    • parameterSamplingPeriod: The period of the year during which the data used for the aggregation were sampled. (Taken from EEA)
    • resultMeanValue: Mean value of the data used for aggregation. (Taken from EEA)
    • waterBodyIdentifier: Unique international identifier of the water body in which the data were obtained. (Taken from EEA)
    • Country: Country info generated by using coordinates.
    • PopulationDensity: Population density of the country.
    • TerraMarineProtected_2016_2018: Mean of protected Terra Marine areas of the country between 2016-2018.
    • TouristMean_1990_2020: Mean tourist count of the country between 1990-2020.
    • VenueCount: Venue count near the given coordinates.
    • netMigration_2011_2018: Mean migration of the given country between 2011-2018.
    • literacyRate_2010_2018: Literacy rate of the country between 2010-2018.
    • combustibleRenewables_2009_2014: Combustible renewable count in the country between 2009-2014.
    • droughts_floods_temperature
    • gdp
    • composition_food_organic_waste_percent
    • composition_glass_percent
    • composition_metal_percent
    • composition_other_percent
    • composition_paper_cardboard_percent
    • composition_plastic_percent
    • composition_rubber_leather_percent
    • composition_wood_percent
    • composition_yard_garden_green_waste_percent
    • waste_treatment_recycling_percent

    Sources:

    • https://www.eea.europa.eu/data-and-maps/data/waterbase-water-quality-2
    • https://datacatalog.worldbank.org/dataset/what-waste-global-database

  4. Global Beer Characteristics Dataset

    • opendatabay.com
    Updated Jul 5, 2025
    Cite
    Datasimple (2025). Global Beer Characteristics Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/0f2eaebe-e8c4-4d69-a4a8-6854506c8ac3
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    Retail & Consumer Behavior
    Description

    This dataset provides a detailed collection of tasting profiles and consumer reviews for 3197 unique beers from 934 different breweries. It was created by integrating information from two existing datasets: "Beer Tasting Profiles Dataset" and "1.5 Million Beer Reviews". The primary purpose is to offer a unified resource containing consumer review scores for aroma, appearance, palate, taste, and overall quality, alongside detailed tasting profiles for various brews. This consolidated data allows for deeper analysis of beer characteristics and consumer preferences.

    Columns

    • Name: The name or label of the beer.
    • Style: The style of beer.
    • Brewery: The name of the brewery.
    • Beer Name (Full): The complete beer name, combining Brewery and Brew Name, serving as a unique identifier for each beer.
    • Description: Any available notes on the beer.
    • ABV: The alcohol content of the beer, expressed as a percentage by volume.
    • Min IBU: The minimum International Bitterness Unit value a beer of its style can possess.
    • Max IBU: The maximum International Bitterness Unit value a beer of its style can possess.
    • Astringency: A tasting profile feature describing mouthfeel, calculated from word counts in reviews.
    • Body: A tasting profile feature describing mouthfeel, calculated from word counts in reviews.
    • Alcohol: A tasting profile feature describing mouthfeel, calculated from word counts in reviews.
    • Bitter: A tasting profile feature describing taste, calculated from word counts in reviews.
    • Sweet: A tasting profile feature describing taste, calculated from word counts in reviews.
    • Sour: A tasting profile feature describing taste, calculated from word counts in reviews.
    • Salty: A tasting profile feature describing taste, calculated from word counts in reviews.
    • Fruits: A tasting profile feature describing flavour and aroma, calculated from word counts in reviews.
    • Hoppy: A tasting profile feature describing flavour and aroma, calculated from word counts in reviews.
    • Spices: A tasting profile feature describing flavour and aroma, calculated from word counts in reviews.
    • Malty: A tasting profile feature describing flavour and aroma, calculated from word counts in reviews.
    • review_aroma: The average rating score for the beer's aroma from consumer reviews.
    • review_appearance: The average rating score for the beer's appearance from consumer reviews.
    • review_palate: The average rating score for the beer's palate from consumer reviews.
    • review_taste: The average rating score for the beer's taste from consumer reviews.
    • review_overall: The average overall rating score from consumer reviews.
    • number_of_reviews: The total count of consumer reviews for the beer.

    The tasting profile features (Astringency through Malty) are derived from word counts found in up to 25 reviews for each beer, based on a predefined list of beer descriptors.
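
A rough sketch of how word-count features of this kind could be computed is shown below. The descriptor lists, the tokenisation, and the review snippets are all invented for illustration; the dataset's actual predefined descriptor list is not given in this description. Only the cap of 25 reviews per beer comes from the text above.

```python
import re
from collections import Counter

# Hypothetical descriptor lists; the real predefined list is not given here.
DESCRIPTORS = {
    "Bitter": {"bitter", "bitterness"},
    "Sweet": {"sweet", "sugary", "caramel"},
    "Hoppy": {"hoppy", "hops", "piney"},
}

MAX_REVIEWS = 25  # the description says up to 25 reviews are used per beer


def tasting_profile(reviews, descriptors=DESCRIPTORS, max_reviews=MAX_REVIEWS):
    """Count descriptor words per feature across at most max_reviews reviews."""
    counts = Counter()
    for review in reviews[:max_reviews]:
        # Lowercase the text and split it into alphabetic tokens.
        words = re.findall(r"[a-z]+", review.lower())
        for feature, vocab in descriptors.items():
            counts[feature] += sum(1 for w in words if w in vocab)
    return {feature: counts[feature] for feature in descriptors}


# Invented review snippets for illustration.
reviews = [
    "Very hoppy with a piney nose and a bitter finish.",
    "Sweet caramel malt up front, mild bitterness later.",
]
profile = tasting_profile(reviews)
```

Each feature ends up as an integer count, which is why beers with more descriptive reviews may show larger values across the board.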

    Distribution

    The primary dataset is provided in a CSV file named beer_profile_and_ratings.csv. Additional files, Brewery Name Fuzzy Match List.csv and Beer Name Fuzzy Match List.csv, list breweries and beers included from the source datasets. The dataset contains 3197 unique beers and 934 different breweries. It holds a quality rating of 5 out of 5 and is version 1.0.

    Usage

    This dataset is suitable for a variety of analytical and machine learning applications, including:

    • Analysing the properties that make a highly-rated beer.
    • Clustering and building a beer recommendation system based on similarities.
    • Classifying different beer styles based on tasting profile information.
    • Predicting a brew's alcohol content (ABV) using known characteristics.
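
As one possible reading of the similarity-based recommendation use case, the sketch below ranks beers by cosine similarity of their tasting-profile vectors. The beer names and numbers are invented, the choice of five features and of cosine similarity is an assumption, and a real system would likely scale the features first.

```python
import math

# Invented tasting-profile vectors in the order (Bitter, Sweet, Sour, Hoppy, Malty).
BEERS = {
    "Beer A": [40, 10, 2, 55, 12],
    "Beer B": [38, 12, 3, 50, 15],
    "Beer C": [5, 60, 4, 8, 70],
    "Beer D": [3, 20, 80, 5, 10],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(name, beers=BEERS, top_n=2):
    """Return the top_n other beers ranked by similarity to the named beer."""
    target = beers[name]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in beers.items()
        if other != name
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


recommendations = most_similar("Beer A")
```

Cosine similarity ignores overall magnitude, so two beers with the same flavour balance but different review volumes still score as close.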

    Coverage

    The dataset covers 3197 unique beers and 934 different breweries, with a global region scope. No specific time range or demographic information is available.

    License

    CC-BY

    Who Can Use It

    • Data scientists and machine learning engineers for developing prediction models, clustering algorithms, and recommendation systems.
    • Researchers interested in consumer behaviour, product characteristics, and sensory analysis within the beverage industry.
    • Beer enthusiasts and connoisseurs looking for detailed insights into beer tasting profiles and ratings.
    • Developers creating applications related to beer discovery or review.

    Dataset Name Suggestions

    • Beer Tasting Profiles and Ratings
    • Brewery and Beer Review Data
    • Global Beer Characteristics Dataset
    • Consumer Beer Insights

    Attributes

    Original Data Source: Beer Profile and Ratings Data Set

  5. Comparative Birth, Death, Marriage data

    • open.canada.ca
    csv, html
    Updated Jul 9, 2025
    Cite
    Government of Ontario (2025). Comparative Birth, Death, Marriage data [Dataset]. https://open.canada.ca/data/en/dataset/7d6d02f8-140f-467d-a7c1-861480cadb6d
    Available download formats: html, csv
    Dataset updated
    Jul 9, 2025
    Dataset provided by
    Government of Ontario
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Time period covered
    Jan 1, 1925 - Dec 31, 2024
    Description

    This dataset compares birth, death and marriage registrations completed by the Office of the Registrar General, beginning in 1925, to the most current published annual report (2022). Data released for 2024 is preliminary and may not match counts from other sources. The data represents counts in the reference calendar quarters, which are collated approximately 90 days after the end of the quarter. Previously released counts for 2024 are updated to reflect vital event registrations completed after the release of the initial report. Each subsequent quarterly report is the cumulative total of the preceding quarterly reports. ServiceOntario’s ability to provide timely information depends on receiving vital event registration information from a variety of sources. The preliminary data presented may not represent all the events that occurred in the reporting period. This is particularly true for events that occurred near the end of the reporting period as they may not have been received by ServiceOntario by the time the data is collated. Final counts for the reporting year will be released with the publication of the Office of the Registrar General Annual Report. The Vital Statistics Act requires that after the end of each calendar year, the Registrar General publish a report that includes the number of births, marriages, deaths, still-births, adoptions and changes of name registered during the calendar year preceding the one that has ended.
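
Because the description says each quarterly report is a cumulative total, counts for an individual quarter would have to be recovered by differencing. The sketch below assumes that "cumulative" means a running year-to-date total; the figures and the `per_quarter` helper are invented for illustration and are not part of the published data.

```python
# Invented cumulative year-to-date counts, one per quarterly report.
cumulative_births = {"Q1": 30000, "Q2": 62000, "Q3": 96000, "Q4": 128000}


def per_quarter(cumulative):
    """Convert cumulative year-to-date totals into counts for each quarter."""
    counts = {}
    previous = 0
    # Dicts preserve insertion order (Python 3.7+), so quarters are visited in sequence.
    for quarter, total in cumulative.items():
        counts[quarter] = total - previous
        previous = total
    return counts


quarterly_births = per_quarter(cumulative_births)
```

Since earlier counts are revised as late registrations arrive, differences taken between reports released at different times may not match the final per-quarter figures.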

County Cancer Death Rates: 6 scholarly articles cite this dataset (View in Google Scholar)