By Noah Rippner [source]
This dataset provides comprehensive information on county-level cancer death and incidence rates, as well as various related variables. It includes data on age-adjusted death rates, average deaths per year, recent trends in cancer death rates, recent 5-year trends in death rates, and average annual counts of cancer deaths or incidence. The dataset also includes the federal information processing standards (FIPS) codes for each county.
Additionally, the dataset indicates whether each county met the objective of a targeted death rate of 45.5. The recent trend in cancer deaths or incidence is also captured for analysis purposes.
The purpose of the death.csv file within this dataset is to offer detailed information specifically concerning county-level cancer death rates and related variables. On the other hand, the incd.csv file contains data on county-level cancer incidence rates and additional relevant variables.
To provide more context and understanding about the included data points, there is a separate file named cancer_data_notes.csv. This file serves to provide informative notes and explanations regarding the various aspects of the cancer data used in this dataset.
Please note that this particular description provides an overview for a linear regression walkthrough using this dataset based on Python programming language. It highlights how to source and import the data properly before moving into data preparation steps such as exploratory analysis. The walkthrough further covers model selection and important model diagnostics measures.
It's essential to bear in mind that this example serves as an initial attempt at creating a multivariate Ordinary Least Squares regression model using these datasets from various sources like cancer.gov along with US Census American Community Survey data. This baseline model allows easy comparisons with future iterations intended for improvements or refinements.
Important columns found within this extensively documented Kaggle dataset include County names along with their corresponding FIPS codes—a standardized coding system by Federal Information Processing Standards (FIPS). Moreover,Met Objective of 45.5? (1) column denotes whether a specific county achieved the targeted objective of a death rate of 45.5 or not.
Overall, this dataset aims to offer valuable insights into county-level cancer death and incidence rates across various regions, providing policymakers, researchers, and healthcare professionals with essential information for analysis and decision-making purposes
Familiarize Yourself with the Columns:
- County: The name of the county.
- FIPS: The Federal Information Processing Standards code for the county.
- Met Objective of 45.5? (1): Indicates whether the county met the objective of a death rate of 45.5 (Boolean).
- Age-Adjusted Death Rate: The age-adjusted death rate for cancer in the county.
- Average Deaths per Year: The average number of deaths per year due to cancer in the county.
- Recent Trend (2): The recent trend in cancer death rates/incidence in the county.
- Recent 5-Year Trend (2) in Death Rates: The recent 5-year trend in cancer death rates/incidence in the county.
- Average Annual Count: The average annual count of cancer deaths/incidence in the county.
Determine Counties Meeting Objective: Use this dataset to identify counties that have met or not met an objective death rate threshold of 45.5%. Look for entries where Met Objective of 45.5? (1) is marked as True or False.
Analyze Age-Adjusted Death Rates: Study and compare age-adjusted death rates across different counties using Age-Adjusted Death Rate values provided as floats.
Explore Average Deaths per Year: Examine and compare average annual counts and trends regarding deaths caused by cancer, using Average Deaths per Year as a reference point.
Investigate Recent Trends: Assess recent trends related to cancer deaths or incidence by analyzing data under columns such as Recent Trend, Recent Trend (2), and Recent 5-Year Trend (2) in Death Rates. These columns provide information on how cancer death rates/incidence have changed over time.
Compare Counties: Utilize this dataset to compare counties based on their cancer death rates and related variables. Identify counties with lower or higher average annual counts, age-adjusted death rates, or recent trends to analyze and understand the factors contributing ...
Number and percentage of deaths, by month and place of residence, 1991 to most recent year.
Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021
new_deaths
column.February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
@(https://datawrapper.dwcdn.net/nRyaf/15/)
<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
Johns Hopkins timeseries data - Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count. - Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
This data should be credited to Johns Hopkins University COVID-19 tracking project
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
By Health [source]
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
In order to use this dataset, start by selecting a particular set of variables to investigate. You can choose from Measure Names (e.g., Death Rates or Life Expectancy), Race (e.g., All Races), Sex (Male/Female) and Year (2011-2013). Once you have selected your desired variables, you can begin analyzing the data by looking at mortality rates and life expectancy averages amongst different populations in the United States over time.
You may also wish to perform more detailed analyses such as identifying trends or examining correlations between features, regional disparities in mortality rates or changes in average life expectancies over time. If so, you can do so by creating line graphs plotted against one or more independent variables such as Race and Sex to see how demographics impact these statistics overall and on a yearly basis using the Year variable computed from July 1st 2010 estimates
- Analyzing mortality and life expectancy trends among certain races and sexes over time.
- Examining the effects of different socioeconomic factors on death rates and life expectancies.
- Making predictions about future mortality rates and average life expectancies with machine learning algorithms
If you use this dataset in your research, please credit the original authors. Data Source
License: Open Database License (ODbL) v1.0 - You are free to: - Share - copy and redistribute the material in any medium or format. - Adapt - remix, transform, and build upon the material for any purpose, even commercially. - You must: - Give appropriate credit - Provide a link to the license, and indicate if changes were made. - ShareAlike - You must distribute your contributions under the same license as the original. - Keep intact - all notices that refer to this license, including copyright notices. - No Derivatives - If you remix, transform, or build upon the material, you may not distribute the modified material. - No additional restrictions - You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
File: rows.csv | Column name | Description | |:----------------------------|:----------------------------------------------------------------------| | Measure Names | The type of measure being reported. (String) | | Race | The race of the population being reported. (String) | | Sex | The gender of the population being reported. (String) | | Year | The year the data was collected. (Integer) | | Average Life Expectancy | The average life expectancy of the population being reported. (Float) | | Mortality | The mortality rate of the population being reported. (Float) |
If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit Health.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Croatia HR: Death Rate: Crude: per 1000 People data was reported at 13.300 Ratio in 2023. This records a decrease from the previous number of 14.800 Ratio for 2022. Croatia HR: Death Rate: Crude: per 1000 People data is updated yearly, averaging 11.200 Ratio from Dec 1960 (Median) to 2023, with 64 observations. The data reached an all-time high of 16.200 Ratio in 2021 and a record low of 8.800 Ratio in 1966. Croatia HR: Death Rate: Crude: per 1000 People data remains active status in CEIC and is reported by World Bank. The data is categorized under Global Database’s Croatia – Table HR.World Bank.WDI: Population and Urbanization Statistics. Crude death rate indicates the number of deaths occurring during the year, per 1,000 population estimated at midyear. Subtracting the crude death rate from the crude birth rate provides the rate of natural increase, which is equal to the rate of population change in the absence of migration.;(1) United Nations Population Division. World Population Prospects: 2024 Revision; (2) Statistical databases and publications from national statistical offices; (3) Eurostat: Demographic Statistics; (4) United Nations Statistics Division. Population and Vital Statistics Reprot (various years).;Weighted average;
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Provisional counts of the number of deaths registered in England and Wales, by age, sex, region and Index of Multiple Deprivation (IMD), in the latest weeks for which data are available.
THIS DATASET WAS LAST UPDATED AT 2:11 AM EASTERN ON JULY 12
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings . This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after, half from suicides.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:
To get these counts just for your state:
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
This dataset contains hourly pedestrian counts since 2009 from pedestrian sensor devices located across the city. The data is updated on a monthly basis and can be used to determine variations in pedestrian activity throughout the day.The sensor_id column can be used to merge the data with the Pedestrian Counting System - Sensor Locations dataset which details the location, status and directional readings of sensors. Any changes to sensor locations are important to consider when analysing and interpreting pedestrian counts over time.Importants notes about this dataset:• Where no pedestrians have passed underneath a sensor during an hour, a count of zero will be shown for the sensor for that hour.• Directional readings are not included, though we hope to make this available later in the year. Directional readings are provided in the Pedestrian Counting System – Past Hour (counts per minute) dataset.The Pedestrian Counting System helps to understand how people use different city locations at different times of day to better inform decision-making and plan for the future. A representation of pedestrian volume which compares each location on any given day and time can be found in our Online Visualisation.Related datasets:Pedestrian Counting System – Past Hour (counts per minute)Pedestrian Counting System - Sensor Locations
By Data Exercises [source]
This dataset is a comprehensive collection of data from county-level cancer mortality and incidence rates in the United States between 2000-2014. This data provides an unprecedented level of detail into cancer cases, deaths, and trends at a local level. The included columns include County, FIPS, age-adjusted death rate, average death rate per year, recent trend (2) in death rates, recent 5-year trend (2) in death rates and average annual count for each county. This dataset can be used to provide deep insight into the patterns and effects of cancer on communities as well as help inform policy decisions related to mitigating risk factors or increasing preventive measures such as screenings. With this comprehensive set of records from across the United States over 15 years, you will be able to make informed decisions regarding individual patient care or policy development within your own community!
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
This dataset provides comprehensive US county-level cancer mortality and incidence rates from 2000 to 2014. It includes the mortality and incidence rate for each county, as well as whether the county met the objective of 45.5 deaths per 100,000 people. It also provides information on recent trends in death rates and average annual counts of cases over the five year period studied.
This dataset can be extremely useful to researchers looking to study trends in cancer death rates across counties. By using this data, researchers will be able to gain valuable insight into how different counties are performing in terms of providing treatment and prevention services for cancer patients and whether preventative measures and healthcare access are having an effect on reducing cancer mortality rates over time. This data can also be used to inform policy makers about counties needing more target prevention efforts or additional resources for providing better healthcare access within at risk communities.
When using this dataset, it is important to pay close attention to any qualitative columns such as “Recent Trend” or “Recent 5-Year Trend (2)” that may provide insights into long term changes that may not be readily apparent when using quantitative variables such as age-adjusted death rate or average deaths per year over shorter periods of time like one year or five years respectively. Additionally, when studying differences between different counties it is important to take note of any standard FIPS code differences that may indicate that data was collected by a different source with a difference methodology than what was used in other areas studied
- Using this dataset, we can identify patterns in cancer mortality and incidence rates that are statistically significant to create treatment regimens or preventive measures specifically targeting those areas.
- This data can be useful for policymakers to target areas with elevated cancer mortality and incidence rates so they can allocate financial resources to these areas more efficiently.
- This dataset can be used to investigate which factors (such as pollution levels, access to medical care, genetic make up) may have an influence on the cancer mortality and incidence rates in different US counties
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors - You are free to: - Share - copy and redistribute the material in any medium or format for any purpose, even commercially. - Adapt - remix, transform, and build upon the material for any purpose, even commercially. - You must: - Give appropriate credit - Provide a link to the license, and indicate if changes were made. - ShareAlike - You must distribute your contributions under the same license as the original. - Keep intact - all notices that refer to this license, including copyright notices.
File: death .csv | Column name | Description | |:-------------------------------------------|:-------------------------------------------------------------------...
How much do natural disasters cost us? In lives, in dollars, in infrastructure? This dataset attempts to answer those questions, tracking the death toll and damage cost of major natural disasters since 1985. Disasters included are storms ( hurricanes, typhoons, and cyclones ), floods, earthquakes, droughts, wildfires, and extreme temperatures
This dataset contains information on natural disasters that have occurred around the world from 1900 to 2017. The data includes the date of the disaster, the location, the type of disaster, the number of people killed, and the estimated cost in US dollars
- An all-in-one disaster map displaying all recorded natural disasters dating back to 1900.
- Natural disaster hotspots - where do natural disasters most commonly occur and kill the most people?
- A live map tracking current natural disasters around the world
License
See the dataset description for more information.
Current issue 23/09/2020
Please note: Sensors 67, 68 and 69 are showing duplicate records. We are currently working on a fix to resolve this.
This dataset contains minute by minute directional pedestrian counts for the last hour from pedestrian sensor devices located across the city. The data is updated every 15 minutes and can be used to determine variations in pedestrian activity throughout the day.
The sensor_id column can be used to merge the data with the Sensor Locations dataset which details the location, status and directional readings of sensors. Any changes to sensor locations are important to consider when analysing and interpreting historical pedestrian counting data.
Note this dataset may not contain a reading for every sensor for every minute as sensor devices only create a record when one or more pedestrians have passed underneath the sensor.
The Pedestrian Counting System helps us to understand how people use different city locations at different times of day to better inform decision-making and plan for the future. A representation of pedestrian volume which compares each location on any given day and time can be found in our Online Visualisation.
Related datasets:
Pedestrian Counting System – 2009 to Present (counts per hour).
Pedestrian Counting System - Sensor Locations
This United States Environmental Protection Agency (US EPA) feature layer represents monitoring site data, updated hourly concentrations and Air Quality Index (AQI) values for the latest hour received from monitoring sites that report to AirNow.Map and forecast data are collected using federal reference or equivalent monitoring techniques or techniques approved by the state, local or tribal monitoring agencies. To maintain "real-time" maps, the data are displayed after the end of each hour. Although preliminary data quality assessments are performed, the data in AirNow are not fully verified and validated through the quality assurance procedures monitoring organizations used to officially submit and certify data on the EPA Air Quality System (AQS).This data sharing, and centralization creates a one-stop source for real-time and forecast air quality data. The benefits include quality control, national reporting consistency, access to automated mapping methods, and data distribution to the public and other data systems. The U.S. Environmental Protection Agency, National Oceanic and Atmospheric Administration, National Park Service, tribal, state, and local agencies developed the AirNow system to provide the public with easy access to national air quality information. State and local agencies report the Air Quality Index (AQI) for cities across the US and parts of Canada and Mexico. AirNow data are used only to report the AQI, not to formulate or support regulation, guidance or any other EPA decision or position.About the AQIThe Air Quality Index (AQI) is an index for reporting daily air quality. It tells you how clean or polluted your air is, and what associated health effects might be a concern for you. The AQI focuses on health effects you may experience within a few hours or days after breathing polluted air. EPA calculates the AQI for five major air pollutants regulated by the Clean Air Act: ground-level ozone, particle pollution (also known as particulate matter), carbon monoxide, sulfur dioxide, and nitrogen dioxide. For each of these pollutants, EPA has established national air quality standards to protect public health. Ground-level ozone and airborne particles (often referred to as "particulate matter") are the two pollutants that pose the greatest threat to human health in this country.A number of factors influence ozone formation, including emissions from cars, trucks, buses, power plants, and industries, along with weather conditions. Weather is especially favorable for ozone formation when it’s hot, dry and sunny, and winds are calm and light. Federal and state regulations, including regulations for power plants, vehicles and fuels, are helping reduce ozone pollution nationwide.Fine particle pollution (or "particulate matter") can be emitted directly from cars, trucks, buses, power plants and industries, along with wildfires and woodstoves. But it also forms from chemical reactions of other pollutants in the air. Particle pollution can be high at different times of year, depending on where you live. In some areas, for example, colder winters can lead to increased particle pollution emissions from woodstove use, and stagnant weather conditions with calm and light winds can trap PM2.5 pollution near emission sources. Federal and state rules are helping reduce fine particle pollution, including clean diesel rules for vehicles and fuels, and rules to reduce pollution from power plants, industries, locomotives, and marine vessels, among others.How Does the AQI Work?Think of the AQI as a yardstick that runs from 0 to 500. The higher the AQI value, the greater the level of air pollution and the greater the health concern. For example, an AQI value of 50 represents good air quality with little potential to affect public health, while an AQI value over 300 represents hazardous air quality.An AQI value of 100 generally corresponds to the national air quality standard for the pollutant, which is the level EPA has set to protect public health. AQI values below 100 are generally thought of as satisfactory. When AQI values are above 100, air quality is considered to be unhealthy-at first for certain sensitive groups of people, then for everyone as AQI values get higher.Understanding the AQIThe purpose of the AQI is to help you understand what local air quality means to your health. To make it easier to understand, the AQI is divided into six categories:Air Quality Index(AQI) ValuesLevels of Health ConcernColorsWhen the AQI is in this range:..air quality conditions are:...as symbolized by this color:0 to 50GoodGreen51 to 100ModerateYellow101 to 150Unhealthy for Sensitive GroupsOrange151 to 200UnhealthyRed201 to 300Very UnhealthyPurple301 to 500HazardousMaroonNote: Values above 500 are considered Beyond the AQI. Follow recommendations for the Hazardous category. Additional information on reducing exposure to extremely high levels of particle pollution is available here.Each category corresponds to a different level of health concern. The six levels of health concern and what they mean are:"Good" AQI is 0 to 50. Air quality is considered satisfactory, and air pollution poses little or no risk."Moderate" AQI is 51 to 100. Air quality is acceptable; however, for some pollutants there may be a moderate health concern for a very small number of people. For example, people who are unusually sensitive to ozone may experience respiratory symptoms."Unhealthy for Sensitive Groups" AQI is 101 to 150. Although general public is not likely to be affected at this AQI range, people with lung disease, older adults and children are at a greater risk from exposure to ozone, whereas persons with heart and lung disease, older adults and children are at greater risk from the presence of particles in the air."Unhealthy" AQI is 151 to 200. Everyone may begin to experience some adverse health effects, and members of the sensitive groups may experience more serious effects."Very Unhealthy" AQI is 201 to 300. This would trigger a health alert signifying that everyone may experience more serious health effects."Hazardous" AQI greater than 300. This would trigger a health warnings of emergency conditions. The entire population is more likely to be affected.AQI colorsEPA has assigned a specific color to each AQI category to make it easier for people to understand quickly whether air pollution is reaching unhealthy levels in their communities. For example, the color orange means that conditions are "unhealthy for sensitive groups," while red means that conditions may be "unhealthy for everyone," and so on.Air Quality Index Levels of Health ConcernNumericalValueMeaningGood0 to 50Air quality is considered satisfactory, and air pollution poses little or no risk.Moderate51 to 100Air quality is acceptable; however, for some pollutants there may be a moderate health concern for a very small number of people who are unusually sensitive to air pollution.Unhealthy for Sensitive Groups101 to 150Members of sensitive groups may experience health effects. The general public is not likely to be affected.Unhealthy151 to 200Everyone may begin to experience health effects; members of sensitive groups may experience more serious health effects.Very Unhealthy201 to 300Health alert: everyone may experience more serious health effects.Hazardous301 to 500Health warnings of emergency conditions. The entire population is more likely to be affected.Note: Values above 500 are considered Beyond the AQI. Follow recommendations for the "Hazardous category." Additional information on reducing exposure to extremely high levels of particle pollution is available here.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the hourly averages (in milligrams per cubic meter, mg/m3) of carbon monoxide (CO) measured by the experimental control units of the European project AIR-BREAK. These are uncertified control units (the official and validated data therefore remain those of the 2 ARPAE stations). CO is a toxic, colourless, odourless, tasteless and non-irritating gas; may be inhaled imperceptibly and may have toxicological effects. It can cause pulmonary edema, have effects on the blood and, in severe cases, lead to death. - The individual measurements are carried out through an electrochemical cell (with a frequency of 5 seconds) and are transmitted via the internet to a first server for automatic checks and verifications. From here, on an hourly basis, the hourly average values are acquired by the IoT server of the Municipality of Ferrara, which stores them in a special report database (based on the PostgreSQL platform). The hourly values of the last week are then transformed into geographical datasets: for each control unit (represented geographically by its precise location) there are 168 hourly values corresponding to the last 7 days; This dataset is automatically updated every hour.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Current issue 23/09/2020 Please note: Sensors 67, 68 and 69 are showing duplicate records. We are currently working on a fix to resolve this.
This dataset contains minute by minute directional pedestrian counts for the last hour from pedestrian sensor devices located across the city. The data is updated every 15 minutes and can be used to determine variations in pedestrian activity throughout the day. The sensor_id column can be used to merge the data with the Sensor Locations dataset which details the location, status and directional readings of sensors. Any changes to sensor locations are important to consider when analysing and interpreting historical pedestrian counting data. Note this dataset may not contain a reading for every sensor for every minute as sensor devices only create a record when one or more pedestrians have passed underneath the sensor. The Pedestrian Counting System helps us to understand how people use different city locations at different times of day to better inform decision-making and plan for the future. A representation of pedestrian volume which compares each location on any given day and time can be found in our Online Visualisation. Related datasets: Pedestrian Counting System – 2009 to Present (counts per hour).Pedestrian Counting System - Sensor Locations
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains detailed player statistics from the VCT 2025 Pacific Stage 1 Valorant tournament, as listed on vlr.gg. The data was manually compiled from the "Stats" tab of the event page and includes performance metrics for professional players across all participating teams.
Rating
, ACS
(Average Combat Score), and Kills:Death
ratio KAST%
: % of rounds where the player had a kill, assist, survived, or was traded ADR
: Average Damage per Round KPR
, APR
, FKPR
, FDPR
: Kills, Assists, First Kills, and First Deaths per Round HS%
: Headshot rate KMax
: Maximum kills in a single map K
, D
, A
: Total Kills, Deaths, Assists FK
, FD
: Total First Kills and First Deaths Clutch_Won
, Clutch_Attempted
, CL%
: Clutch rounds attempted, won, and success rateColumn Name | Description |
---|---|
Player | Player name with team abbreviation |
Round | Tournament stage or round name |
Rating | Overall player rating |
ACS | Average Combat Score |
Kills:Death | Kill-to-death ratio |
KAST% | % of rounds where player had a kill, assist, survived, or was traded |
ADR | Average Damage per Round |
KPR | Kills per Round |
APR | Assists per Round |
FKPR | First Kills per Round |
FDPR | First Deaths per Round |
HS% | Headshot Percentage |
KMax | Highest number of kills in a single map |
K | Total Kills |
D | Total Deaths |
A | Total Assists |
FK | Total First Kills |
FD | Total First Deaths |
Clutch_Won | Total clutch rounds won |
Clutch_Attempted | Total clutch rounds attempted |
CL% | Clutch success percentage |
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
The 802.11 standard includes several management features and corresponding frame types. One of them are Probe Requests (PR), which are sent by mobile devices in an unassociated state to scan the nearby area for existing wireless networks. The frame part of PRs consists of variable-length fields, called Information Elements (IE), which represent the capabilities of a mobile device, such as supported data rates.
This dataset contains PRs collected over a seven-day period by four gateway devices in an uncontrolled urban environment in the city of Catania.
It can be used for various use cases, e.g., analyzing MAC randomization, determining the number of people in a given location at a given time or in different time periods, analyzing trends in population movement (streets, shopping malls, etc.) in different time periods, etc.
Related dataset
Same authors also produced the Labeled dataset of IEEE 802.11 probe requests with same data layout and recording equipment.
Measurement setup
The system for collecting PRs consists of a Raspberry Pi 4 (RPi) with an additional WiFi dongle to capture WiFi signal traffic in monitoring mode (gateway device). Passive PR monitoring is performed by listening to 802.11 traffic and filtering out PR packets on a single WiFi channel.
The following information about each received PR is collected: - MAC address - Supported data rates - extended supported rates - HT capabilities - extended capabilities - data under extended tag and vendor specific tag - interworking - VHT capabilities - RSSI - SSID - timestamp when PR was received.
The collected data was forwarded to a remote database via a secure VPN connection. A Python script was written using the Pyshark package to collect, preprocess, and transmit the data.
Data preprocessing
The gateway collects PRs for each successive predefined scan interval (10 seconds). During this interval, the data is preprocessed before being transmitted to the database. For each detected PR in the scan interval, the IEs fields are saved in the following JSON structure:
PR_IE_data = { 'DATA_RTS': {'SUPP': DATA_supp , 'EXT': DATA_ext}, 'HT_CAP': DATA_htcap, 'EXT_CAP': {'length': DATA_len, 'data': DATA_extcap}, 'VHT_CAP': DATA_vhtcap, 'INTERWORKING': DATA_inter, 'EXT_TAG': {'ID_1': DATA_1_ext, 'ID_2': DATA_2_ext ...}, 'VENDOR_SPEC': {VENDOR_1:{ 'ID_1': DATA_1_vendor1, 'ID_2': DATA_2_vendor1 ...}, VENDOR_2:{ 'ID_1': DATA_1_vendor2, 'ID_2': DATA_2_vendor2 ...} ...} }
Supported data rates and extended supported rates are represented as arrays of values that encode information about the rates supported by a mobile device. The rest of the IEs data is represented in hexadecimal format. Vendor Specific Tag is structured differently than the other IEs. This field can contain multiple vendor IDs with multiple data IDs with corresponding data. Similarly, the extended tag can contain multiple data IDs with corresponding data.
Missing IE fields in the captured PR are not included in PR_IE_DATA.
When a new MAC address is detected in the current scan time interval, the data from PR is stored in the following structure:
{'MAC': MAC_address, 'SSIDs': [ SSID ], 'PROBE_REQs': [PR_data] },
where PR_data is structured as follows:
{ 'TIME': [ DATA_time ], 'RSSI': [ DATA_rssi ], 'DATA': PR_IE_data }.
This data structure allows to store only 'TOA' and 'RSSI' for all PRs originating from the same MAC address and containing the same 'PR_IE_data'. All SSIDs from the same MAC address are also stored. The data of the newly detected PR is compared with the already stored data of the same MAC in the current scan time interval. If identical PR's IE data from the same MAC address is already stored, only data for the keys 'TIME' and 'RSSI' are appended. If identical PR's IE data from the same MAC address has not yet been received, then the PR_data structure of the new PR for that MAC address is appended to the 'PROBE_REQs' key. The preprocessing procedure is shown in Figure ./Figures/Preprocessing_procedure.png
At the end of each scan time interval, all processed data is sent to the database along with additional metadata about the collected data, such as the serial number of the wireless gateway and the timestamps for the start and end of the scan. For an example of a single PR capture, see the Single_PR_capture_example.json file.
Folder structure
For ease of processing of the data, the dataset is divided into 7 folders, each containing a 24-hour period. Each folder contains four files, each containing samples from that device.
The folders are named after the start and end time (in UTC). For example, the folder 2022-09-22T22-00-00_2022-09-23T22-00-00 contains samples collected between 23th of September 2022 00:00 local time, until 24th of September 2022 00:00 local time.
Files representing their location via mapping: - 1.json -> location 1 - 2.json -> location 2 - 3.json -> location 3 - 4.json -> location 4
Environments description
The measurements were carried out in the city of Catania, in Piazza UniversitĂ and Piazza del Duomo The gateway devices (rPIs with WiFi dongle) were set up and gathering data before the start time of this dataset. As of September 23, 2022, the devices were placed in their final configuration and personally checked for correctness of installation and data status of the entire data collection system. Devices were connected either to a nearby Ethernet outlet or via WiFi to the access point provided.
Four Raspbery Pi-s were used: - location 1 -> Piazza del Duomo - Chierici building (balcony near Fontana dell’Amenano) - location 2 -> southernmost window in the building of Via Etnea near Piazza del Duomo - location 3 -> nothernmost window in the building of Via Etnea near Piazza Università - location 4 -> first window top the right of the entrance of the University of Catania
Locations were suggested by the authors and adjusted during deployment based on physical constraints (locations of electrical outlets or internet access) Under ideal circumstances, the locations of the devices and their coverage area would cover both squares and the part of Via Etna between them, with a partial overlap of signal detection. The locations of the gateways are shown in Figure ./Figures/catania.png.
Known dataset shortcomings
Due to technical and physical limitations, the dataset contains some identified deficiencies.
PRs are collected and transmitted in 10-second chunks. Due to the limited capabilites of the recording devices, some time (in the range of seconds) may not be accounted for between chunks if the transmission of the previous packet took too long or an unexpected error occurred.
Every 20 minutes the service is restarted on the recording device. This is a workaround for undefined behavior of the USB WiFi dongle, which can no longer respond. For this reason, up to 20 seconds of data will not be recorded in each 20-minute period.
The devices had a scheduled reboot at 4:00 each day which is shown as missing data of up to a few minutes.
Location 1 - Piazza del Duomo - Chierici
The gateway device (rPi) is located on the second floor balcony and is hardwired to the Ethernet port. This device appears to function stably throughout the data collection period. Its location is constant and is not disturbed, dataset seems to have complete coverage.
Location 2 - Via Etnea - Piazza del Duomo
The device is located inside the building. During working hours (approximately 9:00-17:00), the device was placed on the windowsill. However, the movement of the device cannot be confirmed. As the device was moved back and forth, power outages and internet connection issues occurred. The last three days in the record contain no PRs from this location.
Location 3 - Via Etnea - Piazza UniversitĂ
Similar to Location 2, the device is placed on the windowsill and moved around by people working in the building. Similar behavior is also observed, e.g., it is placed on the windowsill and moved inside a thick wall when no people are present. This device appears to have been collecting data throughout the whole dataset period.
Location 4 - Piazza UniversitĂ
This location is wirelessly connected to the access point. The device was placed statically on a windowsill overlooking the square. Due to physical limitations, the device had lost power several times during the deployment. The internet connection was also interrupted sporadically.
Recognitions
The data was collected within the scope of Resiloc project with the help of City of Catania and project partners.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Average Weekly Hours in the United States decreased to 34.20 Hours in June from 34.30 Hours in May of 2025. This dataset provides - United States Average Weekly Hours - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wages in the United States increased to 31.24 USD/Hour in June from 31.15 USD/Hour in May of 2025. This dataset provides - United States Average Hourly Wages - actual values, historical data, forecast, chart, statistics, economic calendar and news.
This is NOT a raw population dataset. We use our proprietary stack to combine detailed 'WorldPop' UN-adjusted, sex and age structured population data with a spatiotemporal OD matrix.
The result is a dataset where each record indicates how many people can be reached in a fixed timeframe (3 hours in this case) from that record's location.
The dataset is broken down into sex and age bands at 5 year intervals, e.g - male 25-29 (m_25) and also contains a set of features detailing the representative percentage of the total that the count represents.
The dataset provides 48420 records, one for each sampled location. These are labelled with a h3 index at resolution 7 - this allows easy plotting and filtering in Kepler.gl / Deck.gl / Mapbox, or easy conversion to a centroid (lat/lng) or the representative geometry of the hexagonal cell for integration with your geospatial applications and analyses.
A h3 resolution of 7, is a hexagonal cell area equivalent to: - ~1.9928 sq miles - ~5.1613 sq km
Higher resolutions or alternate geographies are available on request.
More information on the h3 system is available here: https://eng.uber.com/h3/
WorldPop data provides for a population count using a grid of 1 arc second intervals and is available for every geography.
More information on the WorldPop data is available here: https://www.worldpop.org/
One of the main use cases historically has been in prospecting for site selection, comparative analysis and network validation by asset investors and logistics companies. The data structure makes it very simple to filter out areas which do not meet requirements such as: - being able to access 70% of the UK population within 4 hours by Truck and show only the areas which do exhibit this characteristic.
Clients often combine different datasets either for different timeframes of interest, or to understand different populations, such as that of the unemployed, or those with particular qualifications within areas reachable as a commute.
https://dataful.in/terms-and-conditionshttps://dataful.in/terms-and-conditions
The dataset consists of the average number of hours actually worked as reported by people during the Periodic Labour Force Survey. The data is available by region- urban and rural, gender- male and female, and by status of employment- self employed, salaried, and casual labourers. The years covered in the survey are from July to June. For instance, 2023-24 refers to the period July 2023 to June 2024 and likewise for other years.
By Noah Rippner [source]
This dataset provides comprehensive information on county-level cancer death and incidence rates, as well as various related variables. It includes data on age-adjusted death rates, average deaths per year, recent trends in cancer death rates, recent 5-year trends in death rates, and average annual counts of cancer deaths or incidence. The dataset also includes the federal information processing standards (FIPS) codes for each county.
Additionally, the dataset indicates whether each county met the objective of a targeted death rate of 45.5. The recent trend in cancer deaths or incidence is also captured for analysis purposes.
The purpose of the death.csv file within this dataset is to offer detailed information specifically concerning county-level cancer death rates and related variables. On the other hand, the incd.csv file contains data on county-level cancer incidence rates and additional relevant variables.
To provide more context and understanding about the included data points, there is a separate file named cancer_data_notes.csv. This file serves to provide informative notes and explanations regarding the various aspects of the cancer data used in this dataset.
Please note that this particular description provides an overview for a linear regression walkthrough using this dataset based on Python programming language. It highlights how to source and import the data properly before moving into data preparation steps such as exploratory analysis. The walkthrough further covers model selection and important model diagnostics measures.
It's essential to bear in mind that this example serves as an initial attempt at creating a multivariate Ordinary Least Squares regression model using these datasets from various sources like cancer.gov along with US Census American Community Survey data. This baseline model allows easy comparisons with future iterations intended for improvements or refinements.
Important columns found within this extensively documented Kaggle dataset include County names along with their corresponding FIPS codes—a standardized coding system by Federal Information Processing Standards (FIPS). Moreover,Met Objective of 45.5? (1) column denotes whether a specific county achieved the targeted objective of a death rate of 45.5 or not.
Overall, this dataset aims to offer valuable insights into county-level cancer death and incidence rates across various regions, providing policymakers, researchers, and healthcare professionals with essential information for analysis and decision-making purposes
Familiarize Yourself with the Columns:
- County: The name of the county.
- FIPS: The Federal Information Processing Standards code for the county.
- Met Objective of 45.5? (1): Indicates whether the county met the objective of a death rate of 45.5 (Boolean).
- Age-Adjusted Death Rate: The age-adjusted death rate for cancer in the county.
- Average Deaths per Year: The average number of deaths per year due to cancer in the county.
- Recent Trend (2): The recent trend in cancer death rates/incidence in the county.
- Recent 5-Year Trend (2) in Death Rates: The recent 5-year trend in cancer death rates/incidence in the county.
- Average Annual Count: The average annual count of cancer deaths/incidence in the county.
Determine Counties Meeting Objective: Use this dataset to identify counties that have met or not met an objective death rate threshold of 45.5%. Look for entries where Met Objective of 45.5? (1) is marked as True or False.
Analyze Age-Adjusted Death Rates: Study and compare age-adjusted death rates across different counties using Age-Adjusted Death Rate values provided as floats.
Explore Average Deaths per Year: Examine and compare average annual counts and trends regarding deaths caused by cancer, using Average Deaths per Year as a reference point.
Investigate Recent Trends: Assess recent trends related to cancer deaths or incidence by analyzing data under columns such as Recent Trend, Recent Trend (2), and Recent 5-Year Trend (2) in Death Rates. These columns provide information on how cancer death rates/incidence have changed over time.
Compare Counties: Utilize this dataset to compare counties based on their cancer death rates and related variables. Identify counties with lower or higher average annual counts, age-adjusted death rates, or recent trends to analyze and understand the factors contributing ...