By Noah Rippner
This dataset provides comprehensive information on county-level cancer death and incidence rates, as well as various related variables. It includes data on age-adjusted death rates, average deaths per year, recent trends in cancer death rates, recent 5-year trends in death rates, and average annual counts of cancer deaths or incidence. The dataset also includes the federal information processing standards (FIPS) codes for each county.
Additionally, the dataset indicates whether each county met the objective of a targeted death rate of 45.5. The recent trend in cancer deaths or incidence is also captured for analysis purposes.
The purpose of the death.csv file within this dataset is to offer detailed information specifically concerning county-level cancer death rates and related variables. On the other hand, the incd.csv file contains data on county-level cancer incidence rates and additional relevant variables.
To provide more context and understanding about the included data points, there is a separate file named cancer_data_notes.csv. This file serves to provide informative notes and explanations regarding the various aspects of the cancer data used in this dataset.
Please note that this description outlines a linear regression walkthrough in Python that uses this dataset. It covers how to source and import the data properly before moving into data preparation steps such as exploratory analysis. The walkthrough also covers model selection and key model diagnostics.
It's essential to bear in mind that this example serves as an initial attempt at creating a multivariate Ordinary Least Squares regression model using these datasets from various sources like cancer.gov along with US Census American Community Survey data. This baseline model allows easy comparisons with future iterations intended for improvements or refinements.
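As a rough illustration of what a multivariate Ordinary Least Squares fit involves, the sketch below solves the normal equations in plain Python. The predictor names (`poverty_pct`, `median_income_k`) and the toy rows are invented for illustration and are not taken from death.csv, incd.csv, or the Census data; the walkthrough itself may well rely on a statistics library rather than hand-written code.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix so row operations touch both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Return [intercept, coef_1, ..., coef_k] minimising squared error."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data: (poverty_pct, median_income_k) -> death_rate.
# Targets are generated from death_rate = 30 + 2*poverty_pct - 0.1*median_income_k.
toy_rows = [(10.0, 60.0), (20.0, 40.0), (15.0, 55.0), (25.0, 35.0), (12.0, 70.0)]
toy_targets = [30 + 2 * p - 0.1 * inc for p, inc in toy_rows]

beta = fit_ols(toy_rows, toy_targets)
```

Because the toy targets are an exact linear function of the predictors, the fitted coefficients recover the generating values up to floating-point error. Real county data would leave residuals, which is where the model diagnostics mentioned above come in.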
Important columns in this Kaggle dataset include County names and their corresponding FIPS codes, the standardized county identifiers defined by the Federal Information Processing Standards (FIPS). The Met Objective of 45.5? (1) column denotes whether a specific county achieved the targeted death rate of 45.5.
Overall, this dataset aims to offer valuable insights into county-level cancer death and incidence rates across various regions, providing policymakers, researchers, and healthcare professionals with essential information for analysis and decision-making purposes.
Familiarize Yourself with the Columns:
- County: The name of the county.
- FIPS: The Federal Information Processing Standards code for the county.
- Met Objective of 45.5? (1): Indicates whether the county met the objective of a death rate of 45.5 (Boolean).
- Age-Adjusted Death Rate: The age-adjusted death rate for cancer in the county.
- Average Deaths per Year: The average number of deaths per year due to cancer in the county.
- Recent Trend (2): The recent trend in cancer death rates/incidence in the county.
- Recent 5-Year Trend (2) in Death Rates: The recent 5-year trend in cancer death rates/incidence in the county.
- Average Annual Count: The average annual count of cancer deaths/incidence in the county.
Determine Counties Meeting Objective: Use this dataset to identify counties that have or have not met the objective death rate of 45.5. Look for entries where Met Objective of 45.5? (1) is marked as True or False.
Analyze Age-Adjusted Death Rates: Study and compare age-adjusted death rates across different counties using Age-Adjusted Death Rate values provided as floats.
Explore Average Deaths per Year: Examine and compare average annual counts and trends regarding deaths caused by cancer, using Average Deaths per Year as a reference point.
Investigate Recent Trends: Assess recent trends related to cancer deaths or incidence by analyzing data under columns such as Recent Trend, Recent Trend (2), and Recent 5-Year Trend (2) in Death Rates. These columns provide information on how cancer death rates/incidence have changed over time.
Compare Counties: Utilize this dataset to compare counties based on their cancer death rates and related variables. Identify counties with lower or higher average annual counts, age-adjusted death rates, or recent trends to analyze and understand the factors contributing ...
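The steps above can be sketched with the standard library alone. In the example below, the inline CSV is a hypothetical stand-in for death.csv: the county names, FIPS codes, and rates are invented, and only the column headers follow the description. How the real file encodes the Met Objective of 45.5? (1) column, or missing rates, is not stated here, so the sketch derives the comparison from the rate itself and assumes that "meeting" the objective means a rate at or below 45.5.

```python
import csv
import io

# Hypothetical sample standing in for death.csv; county names and rates are
# invented, only the column headers follow the description above.
SAMPLE_CSV = """County,FIPS,Age-Adjusted Death Rate
Example County A,00001,38.2
Example County B,00003,52.7
Example County C,00005,45.5
Example County D,00007,*
"""

OBJECTIVE = 45.5  # target death rate named in the description


def load_rates(handle):
    """Yield (county, fips, rate) tuples, skipping non-numeric rates."""
    for row in csv.DictReader(handle):
        try:
            rate = float(row["Age-Adjusted Death Rate"])
        except ValueError:
            continue  # e.g. a placeholder for a suppressed or missing value
        yield row["County"], row["FIPS"], rate


records = list(load_rates(io.StringIO(SAMPLE_CSV)))
# Assumption: a county "meets" the objective when its rate is at or below it.
met = [r for r in records if r[2] <= OBJECTIVE]
not_met = [r for r in records if r[2] > OBJECTIVE]
ranked = sorted(records, key=lambda r: r[2])  # lowest death rate first
```

FIPS codes are kept as strings so that leading zeros survive, which matters when joining against other county-level sources.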
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study delves into the global evolution of 43 Sustainable Development Goals (SDG) indicators, spanning 7 major health themes across 185 countries, to evaluate the potential progress loss due to the COVID-19 pandemic. Both the cross-country and temporal variability of the dataset are employed to estimate an empirical model based on an extended version of the Preston curve, which links well-being to income levels and other key socioeconomic health determinants. The approach reveals significant global evolution trends operating in each SDG indicator assessed. We extrapolate the model yearly between 2020 and 2030 using the IMF’s pre-COVID-19 economic growth projections to show how each country in the dataset is expected to evolve in these health topics throughout the decade, assuming no other external shocks. The results of this baseline scenario are contrasted with a post-COVID-19 scenario, where most of the pandemic costs were already known. The study reveals that economic growth losses are, on average, estimated at 42% and 28% for low- and lower middle-income countries, and at 15% and 7% for high- and upper middle-income countries, respectively, according to the IMF’s projections. These disproportionate figures are shown to exacerbate the global health inequalities revealed by the curves. The expected progress loss in infectious diseases in low-income countries, for instance, averages 34%, against a mean of 6% in high-income countries. The theme of infectious diseases is followed by injuries and violence; maternal and reproductive health; health systems coverage; and neonatal and infant health as those with worse performance. Low-income countries can expect an average progress loss of 16% across all health indicators assessed, whereas in high-income countries the estimated loss is as low as 3%. The disparity across countries is even more pronounced, with cases where the estimated progress loss is as high as nine times worse than the average loss of 8%.
Conversely, countries with greater fiscal capacity are likely to fare much better under the circumstances, despite, in many cases, their worse death counts. Overall, these findings support the critical importance of integrating the fight against inequalities into the global development agendas.
The basis of this dataset is the WaterBase water quality data shared by the EEA (European Environment Agency). After most of the original columns were dropped, new features were created with the help of World Bank, OSM, Foursquare, and SEDAC data. After deriving the country and city from the available location information, socioeconomic features of that country were added. In addition, the distance to certain road types near those coordinates was added using OSM. Such information is thought to play an important role in water pollution.
Features:
- parameterWaterBodyCategory: Water body category code, as defined in the codelist. (Taken from EEA)
- observedPropertyDeterminandCode: Unique code of the determinand monitored, as defined in the codelist. (Taken from EEA)
- procedureAnalysedFraction: Specification of which fraction of the sample was analysed. (Taken from EEA)
- procedureAnalysedMedia: Type of media monitored. (Taken from EEA)
- resultUom: Unit of measure for the reported values. (Taken from EEA)
- phenomenonTimeReferenceYear: Year during which the data were sampled. (Taken from EEA)
- parameterSamplingPeriod: The period of the year during which the data used for the aggregation were sampled. (Taken from EEA)
- resultMeanValue: Mean value of the data used for aggregation. (Taken from EEA)
- waterBodyIdentifier: Unique international identifier of the water body in which the data were obtained. (Taken from EEA)
- Country: Country info generated by using coordinates.
- PopulationDensity: Population density of the country.
- TerraMarineProtected_2016_2018: Mean of the country's protected terrestrial and marine areas between 2016 and 2018.
- TouristMean_1990_2020: Mean tourist count of the country between 1990 and 2020.
- VenueCount: Venue count near the given coordinates.
- netMigration_2011_2018: Mean net migration of the country between 2011 and 2018.
- literacyRate_2010_2018: Literacy rate of the country between 2010 and 2018.
- combustibleRenewables_2009_2014: Combustible renewables count in the country between 2009 and 2014.
- droughts_floods_temperature
- gdp
- composition_food_organic_waste_percent
- composition_glass_percent
- composition_metal_percent
- composition_other_percent
- composition_paper_cardboard_percent
- composition_plastic_percent
- composition_rubber_leather_percent
- composition_wood_percent
- composition_yard_garden_green_waste_percent
- waste_treatment_recycling_percent
Sources: https://www.eea.europa.eu/data-and-maps/data/waterbase-water-quality-2 https://datacatalog.worldbank.org/dataset/what-waste-global-database
This dataset provides a detailed collection of tasting profiles and consumer reviews for 3197 unique beers from 934 different breweries. It was created by integrating information from two existing datasets: "Beer Tasting Profiles Dataset" and "1.5 Million Beer Reviews". The primary purpose is to offer a unified resource containing consumer review scores for aroma, appearance, palate, taste, and overall quality, alongside detailed tasting profiles for various brews. This consolidated data allows for deeper analysis of beer characteristics and consumer preferences.
The tasting profile features (Astringency through Malty) are derived from word counts found in up to 25 reviews for each beer, based on a predefined list of beer descriptors.
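A minimal sketch of how such word-count features could be computed is shown below. The descriptor vocabularies are hypothetical (the dataset's actual predefined list is not reproduced in this description), and only two of the feature names, Astringency and Malty, come from the text above. The tokenisation rule is likewise an assumption.

```python
import re
from collections import Counter

# Hypothetical descriptor lists; the dataset's actual predefined list is not
# reproduced in the description.
DESCRIPTORS = {
    "Astringency": {"dry", "tannic", "puckering"},
    "Malty": {"malt", "malty", "caramel", "toffee"},
}
MAX_REVIEWS = 25  # the description says up to 25 reviews per beer are used


def tasting_profile(reviews):
    """Count descriptor-word occurrences per feature across the reviews."""
    counts = Counter({feature: 0 for feature in DESCRIPTORS})
    for text in reviews[:MAX_REVIEWS]:
        # Assumed tokenisation: lowercase runs of ASCII letters.
        for word in re.findall(r"[a-z]+", text.lower()):
            for feature, vocabulary in DESCRIPTORS.items():
                if word in vocabulary:
                    counts[feature] += 1
    return dict(counts)


example_reviews = [
    "Rich caramel malt backbone with a dry finish.",
    "Very malty, a little toffee, slightly tannic.",
]
profile = tasting_profile(example_reviews)
# profile == {"Astringency": 2, "Malty": 4}
```

Since the counts depend on how many reviews a beer has, any comparison across beers would probably need some normalisation; the description does not say whether the published values are normalised.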
The primary dataset is provided in a CSV file named beer_profile_and_ratings.csv. Additional files, Brewery Name Fuzzy Match List.csv and Beer Name Fuzzy Match List.csv, list breweries and beers included from the source datasets. The dataset contains 3197 unique beers and 934 different breweries. It holds a quality rating of 5 out of 5 and is version 1.0.
This dataset is suitable for a variety of analytical and machine learning applications, including:
- Analysing the properties that make a highly-rated beer.
- Clustering and building a beer recommendation system based on similarities.
- Classifying different beer styles based on tasting profile information.
- Predicting a brew's alcohol content (ABV) using known characteristics.
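One simple way to approach the similarity-based recommendation idea is cosine similarity between tasting-profile vectors. The sketch below is only one possible choice of similarity measure; the beer names and the three-value profiles are made up and do not come from beer_profile_and_ratings.csv.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero profile as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(target, profiles, top_n=2):
    """Rank the other beers by similarity to the target beer's profile."""
    scores = [
        (name, cosine_similarity(profiles[target], vector))
        for name, vector in profiles.items()
        if name != target
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical beers, each with three made-up tasting-profile counts.
toy_profiles = {
    "Beer A": [2, 10, 1],
    "Beer B": [4, 20, 2],   # same proportions as Beer A
    "Beer C": [12, 1, 0],
    "Beer D": [0, 0, 0],
}
recommendations = most_similar("Beer A", toy_profiles)
```

Cosine similarity ignores overall magnitude, so a beer with many reviews and a beer with few can still match if their descriptor proportions are alike. Whether that is desirable depends on the application.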
The dataset covers 3197 unique beers and 934 different breweries, with a global region scope. No specific time range or demographic information is available.
CC-BY
Original Data Source: Beer Profile and Ratings Data Set
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This dataset compares birth, death and marriage registrations completed by the Office of the Registrar General, beginning in 1925, to the most current published annual report (2022). Data released for 2024 is preliminary and may not match counts from other sources. The data represents counts in the reference calendar quarters, which are collated approximately 90 days after the end of the quarter. Previously released counts for 2024 are updated to reflect vital event registrations completed after the release of the initial report. Each subsequent quarterly report is the cumulative total of the preceding quarterly reports. ServiceOntario’s ability to provide timely information depends on receiving vital event registration information from a variety of sources. The preliminary data presented may not represent all the events that occurred in the reporting period. This is particularly true for events that occurred near the end of the reporting period as they may not have been received by ServiceOntario by the time the data is collated. Final counts for the reporting year will be released with the publication of the Office of the Registrar General Annual Report. The Vital Statistics Act requires that after the end of each calendar year, the Registrar General publish a report that includes the number of births, marriages, deaths, still-births, adoptions and changes of name registered during the calendar year preceding the one that has ended.
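Because each quarterly report is described as cumulative, per-quarter counts can be recovered by differencing consecutive totals. The sketch below assumes each report is a year-to-date total for a single event type; the numbers are invented and do not come from the published data.

```python
def quarterly_from_cumulative(cumulative):
    """Convert year-to-date totals into per-quarter counts by differencing."""
    counts = []
    previous = 0
    for total in cumulative:
        counts.append(total - previous)
        previous = total
    return counts


# Hypothetical year-to-date registrations as of Q1..Q4 (invented numbers).
toy_cumulative = [100, 230, 390, 520]
per_quarter = quarterly_from_cumulative(toy_cumulative)
# per_quarter == [100, 130, 160, 130]
```

Since previously released counts are revised as late registrations arrive, differences computed this way may partly reflect those revisions rather than events that occurred in the quarter itself.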