80 datasets found
  1. The World Dataset of COVID-19

    • kaggle.com
    Updated May 25, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    C-3PO (2021). The World Dataset of COVID-19 [Dataset]. https://www.kaggle.com/datasets/aditeloo/the-world-dataset-of-covid19/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 25, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    C-3PO
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    World
    Description

    Context

    These datasets are from Our World in Data. Their complete COVID-19 dataset is a collection of the COVID-19 data maintained by Our World in Data. It is updated daily and includes data on confirmed cases, deaths, hospitalizations, testing, and vaccinations as well as other variables of potential interest.

    Content

    Confirmed cases and deaths:

    our data comes from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). We discuss how and when JHU collects and publishes this data. The cases & deaths dataset is updated daily. Note: the number of cases or deaths reported by any institution—including JHU, the WHO, the ECDC, and others—on a given day does not necessarily represent the actual number on that date. This is because of the long reporting chain that exists between a new case/death and its inclusion in statistics. This also means that negative values in cases and deaths can sometimes appear when a country corrects historical data because it had previously overestimated the number of cases/deaths. Alternatively, large changes can sometimes (although rarely) be made to a country's entire time series if JHU decides (and has access to the necessary data) to correct values retrospectively.

    Hospitalizations and intensive care unit (ICU) admissions:

    our data comes from the European Centre for Disease Prevention and Control (ECDC) for a select number of European countries; the government of the United Kingdom; the Department of Health & Human Services for the United States; the COVID-19 Tracker for Canada. Unfortunately, we are unable to provide data on hospitalizations for other countries: there is currently no global, aggregated database on COVID-19 hospitalization, and our team at Our World in Data does not have the capacity to build such a dataset.

    Testing for COVID-19:

    this data is collected by the Our World in Data team from official reports; you can find further details in our post on COVID-19 testing, including our checklist of questions to understand testing data, information on geographical and temporal coverage, and detailed country-by-country source information. The testing dataset is updated around twice a week.

    Acknowledgements

    Our World in Data GitHub repository for covid-19.

    Inspiration

    All we love data, cause we love to go inside it and discover the truth that's the main inspiration I have.

  2. C

    Death Profiles by County

    • data.chhs.ca.gov
    • healthdata.gov
    csv
    Updated May 28, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    California Department of Public Health (2025). Death Profiles by County [Dataset]. https://data.chhs.ca.gov/dataset/death-profiles-by-county
    Explore at:
    csv(11738570), csv(15127221), csv(1128641), csv(60023260), csv(28125832), csv(74043128), csv(74351424), csv(74689382), csv(73906266), csv(52019564)Available download formats
    Dataset updated
    May 28, 2025
    Dataset authored and provided by
    California Department of Public Health
    Description

    This dataset contains counts of deaths for California counties based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.

    The final data tables include both deaths that occurred in each California county regardless of the place of residence (by occurrence) and deaths to residents of each California county (by residence), whereas the provisional data table only includes deaths that occurred in each county regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.

    The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.

  3. Johns Hopkins COVID-19 Case Tracker

    • data.world
    csv, zip
    Updated Jun 8, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Associated Press (2025). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker
    Explore at:
    zip, csvAvailable download formats
    Dataset updated
    Jun 8, 2025
    Dataset provided by
    data.world, Inc.
    Authors
    The Associated Press
    Time period covered
    Jan 22, 2020 - Mar 9, 2023
    Area covered
    Description

    Updates

    • Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.

    • April 9, 2020

      • The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.
    • April 20, 2020

      • Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.
    • April 29, 2020

      • The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.
    • September 1st, 2020

      • Johns Hopkins is now providing counts for the five New York City counties individually.
    • February 12, 2021

      • The Ohio Department of Health recently announced that as many as 4,000 COVID-19 deaths may have been underreported through the state’s reporting system, and that the "daily reported death counts will be high for a two to three-day period."
      • Because deaths data will be anomalous for consecutive days, we have chosen to freeze Ohio's rolling average for daily deaths at the last valid measure until Johns Hopkins is able to back-distribute the data. The raw daily death counts, as reported by Johns Hopkins and including the backlogged death data, will still be present in the new_deaths column.
    • February 16, 2021

      - Johns Hopkins has reconciled Ohio's historical deaths data with the state.

      Overview

    The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

    The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

    This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

    The AP is updating this dataset hourly at 45 minutes past the hour.

    To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

    Queries

    Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic

    Interactive

    The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.

    @(https://datawrapper.dwcdn.net/nRyaf/15/)

    Interactive Embed Code

    <iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
    

    Caveats

    • This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.
    • In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.
    • In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county"
    • This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.
    • Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
    • Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.
    • The Urban/Rural classification scheme is from the Center for Disease Control and Preventions's National Center for Health Statistics. It puts each county into one of six categories -- from Large Central Metro to Non-Core -- according to population and other characteristics. More details about the classifications can be found here.

    Johns Hopkins timeseries data - Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count. - Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here

    Attribution

    This data should be credited to Johns Hopkins University COVID-19 tracking project

  4. Data from: COVID-19 Deaths Dataset

    • kaggle.com
    zip
    Updated Jan 9, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dhruvil Dave (2022). COVID-19 Deaths Dataset [Dataset]. https://www.kaggle.com/dhruvildave/covid19-deaths-dataset
    Explore at:
    zip(28230945 bytes)Available download formats
    Dataset updated
    Jan 9, 2022
    Authors
    Dhruvil Dave
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is a daily updated dataset of COVID-19 deaths around the world. The dataset contains data of 45 countries. This data was collected from

    us-counties.csv contains data of the daily number of new cases and deaths, the seven-day rolling average and the seven-day rolling average per 100,000 residents of US at county level. The average reported is the seven day trailing average i.e. average of the day reported and six days prior.

    all_weekly_excess_deaths.csv collates detailed weekly breakdowns from official sources around the world.

    Image credits: Unsplash - schluditsch

    Let's pray for the ones who lost their lives fighting the battle and for the ones who risk their lives against this virus 🙏

  5. T

    CORONAVIRUS DEATHS by Country Dataset

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Mar 4, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    TRADING ECONOMICS (2020). CORONAVIRUS DEATHS by Country Dataset [Dataset]. https://tradingeconomics.com/country-list/coronavirus-deaths
    Explore at:
    csv, excel, xml, jsonAvailable download formats
    Dataset updated
    Mar 4, 2020
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2025
    Area covered
    World
    Description

    This dataset provides values for CORONAVIRUS DEATHS reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.

  6. z

    Counts of Influenza reported in UNITED STATES OF AMERICA: 1919-1951

    • zenodo.org
    • data.niaid.nih.gov
    json, xml, zip
    Updated Jun 3, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Willem Van Panhuis; Willem Van Panhuis; Anne Cross; Anne Cross; Donald Burke; Donald Burke (2024). Counts of Influenza reported in UNITED STATES OF AMERICA: 1919-1951 [Dataset]. http://doi.org/10.25337/t7/ptycho.v2.0/us.6142004
    Explore at:
    json, xml, zipAvailable download formats
    Dataset updated
    Jun 3, 2024
    Dataset provided by
    Project Tycho
    Authors
    Willem Van Panhuis; Willem Van Panhuis; Anne Cross; Anne Cross; Donald Burke; Donald Burke
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 26, 1919 - Dec 8, 1951
    Area covered
    United States
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretabilty. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datsets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of aquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

    Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

    • Analyze missing data: Project Tycho datasets do not inlcude time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
    • Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exxclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".

  7. n

    Coronavirus (Covid-19) Data in the United States

    • nytimes.com
    • openicpsr.org
    • +4more
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
    Explore at:
    Dataset provided by
    New York Times
    Description

    The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

    Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

    We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

    The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.

  8. Coronavirus (COVID-19) In-depth Dataset

    • kaggle.com
    Updated May 29, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Pranjal Verma (2021). Coronavirus (COVID-19) In-depth Dataset [Dataset]. https://www.kaggle.com/pranjalverma08/coronavirus-covid19-indepth-dataset/discussion
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 29, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Pranjal Verma
    Description

    Context

    Covid-19 Data collected from various sources on the internet. This dataset has daily level information on the number of affected cases, deaths, and recovery from the 2019 novel coronavirus. Please note that this is time-series data and so the number of cases on any given day is the cumulative number.

    Content

    The dataset includes 28 files scrapped from various data sources mainly the John Hopkins GitHub repository, the ministry of health affairs India, worldometer, and Our World in Data website. The details of the files are as follows

    • countries-aggregated.csv A simple and cleaned data with 5 columns with self-explanatory names. -covid-19-daily-tests-vs-daily-new-confirmed-cases-per-million.csv A time-series data of daily test conducted v/s daily new confirmed case per million. Entity column represents Country name while code represents ISO code of the country. -covid-contact-tracing.csv Data depicting government policies adopted in case of contact tracing. 0 -> No tracing, 1-> limited tracing, 2-> Comprehensive tracing. -covid-stringency-index.csv The nine metrics used to calculate the Stringency Index are school closures; workplace closures; cancellation of public events; restrictions on public gatherings; closures of public transport; stay-at-home requirements; public information campaigns; restrictions on internal movements; and international travel controls. The index on any given day is calculated as the mean score of the nine metrics, each taking a value between 0 and 100. A higher score indicates a stricter response (i.e. 100 = strictest response). -covid-vaccination-doses-per-capita.csv A total number of vaccination doses administered per 100 people in the total population. This is counted as a single dose, and may not equal the total number of people vaccinated, depending on the specific dose regime (e.g. people receive multiple doses). -covid-vaccine-willingness-and-people-vaccinated-by-country.csv Survey who have not received a COVID vaccine and who are willing vs. unwilling vs. uncertain if they would get a vaccine this week if it was available to them. -covid_india.csv India specific data containing the total number of active cases, recovered and deaths statewide. -cumulative-deaths-and-cases-covid-19.csv A cumulative data containing death and daily confirmed cases in the world. -current-covid-patients-hospital.csv Time series data containing a count of covid patients hospitalized in a country -daily-tests-per-thousand-people-smoothed-7-day.csv Daily test conducted per 1000 people in a running week average. -face-covering-policies-covid.csv Countries are grouped into five categories: 1->No policy 2->Recommended 3->Required in some specified shared/public spaces outside the home with other people present, or some situations when social distancing not possible 4->Required in all shared/public spaces outside the home with other people present or all situations when social distancing not possible 5->Required outside the home at all times regardless of location or presence of other people -full-list-cumulative-total-tests-per-thousand-map.csv Full list of total tests conducted per 1000 people. -income-support-covid.csv Income support captures if the government is covering the salaries or providing direct cash payments, universal basic income, or similar, of people who lose their jobs or cannot work. 0->No income support, 1->covers less than 50% of lost salary, 2-> covers more than 50% of the lost salary. -internal-movement-covid.csv Showing government policies in restricting internal movements. Ranges from 0 to 2 where 2 represents the strictest. -international-travel-covid.csv Showing government policies in restricting international movements. Ranges from 0 to 2 where 2 represents the strictest. -people-fully-vaccinated-covid.csv Contains the count of fully vaccinated people in different countries. -people-vaccinated-covid.csv Contains the total count of vaccinated people in different countries. -positive-rate-daily-smoothed.csv Contains the positivity rate of various countries in a week running average. -public-gathering-rules-covid.csv Restrictions are given based on the size of public gatherings as follows: 0->No restrictions 1 ->Restrictions on very large gatherings (the limit is above 1000 people) 2 -> gatherings between 100-1000 people 3 -> gatherings between 10-100 people 4 -> gatherings of less than 10 people -school-closures-covid.csv School closure during Covid. -share-people-fully-vaccinated-covid.csv Share of people that are fully vaccinated. -stay-at-home-covid.csv Countries are grouped into four categories: 0->No measures 1->Recommended not to leave the house 2->Required to not leave the house with exceptions for daily exercise, grocery shopping, and ‘essent...
  9. Global Surface Summary of the Day - GSOD

    • catalog.data.gov
    • datadiscoverystudio.org
    • +2more
    Updated Oct 11, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    DOC/NOAA/NESDIS/NCEI > National Centers for Environmental Information, NESDIS, NOAA, U.S. Department of Commerce (Point of Contact) (2023). Global Surface Summary of the Day - GSOD [Dataset]. https://catalog.data.gov/dataset/global-surface-summary-of-the-day-gsod1
    Explore at:
    Dataset updated
    Oct 11, 2023
    Dataset provided by
    National Oceanic and Atmospheric Administrationhttp://www.noaa.gov/
    National Centers for Environmental Informationhttps://www.ncei.noaa.gov/
    United States Department of Commercehttp://www.commerce.gov/
    National Environmental Satellite, Data, and Information Service
    Description

    Global Surface Summary of the Day is derived from The Integrated Surface Hourly (ISH) dataset. The ISH dataset includes global data obtained from the USAF Climatology Center, located in the Federal Climate Complex with NCDC. The latest daily summary data are normally available 1-2 days after the date-time of the observations used in the daily summaries. The online data files begin with 1929 and are at the time of this writing at the Version 8 software level. Over 9000 stations' data are typically available. The daily elements included in the dataset (as available from each station) are: Mean temperature (.1 Fahrenheit) Mean dew point (.1 Fahrenheit) Mean sea level pressure (.1 mb) Mean station pressure (.1 mb) Mean visibility (.1 miles) Mean wind speed (.1 knots) Maximum sustained wind speed (.1 knots) Maximum wind gust (.1 knots) Maximum temperature (.1 Fahrenheit) Minimum temperature (.1 Fahrenheit) Precipitation amount (.01 inches) Snow depth (.1 inches) Indicator for occurrence of: Fog, Rain or Drizzle, Snow or Ice Pellets, Hail, Thunder, Tornado/Funnel Cloud Global summary of day data for 18 surface meteorological elements are derived from the synoptic/hourly observations contained in USAF DATSAV3 Surface data and Federal Climate Complex Integrated Surface Hourly (ISH). Historical data are generally available for 1929 to the present, with data from 1973 to the present being the most complete. For some periods, one or more countries' data may not be available due to data restrictions or communications problems. In deriving the summary of day data, a minimum of 4 observations for the day must be present (allows for stations which report 4 synoptic observations/day). Since the data are converted to constant units (e.g, knots), slight rounding error from the originally reported values may occur (e.g, 9.9 instead of 10.0). The mean daily values described below are based on the hours of operation for the station. For some stations/countries, the visibility will sometimes 'cluster' around a value (such as 10 miles) due to the practice of not reporting visibilities greater than certain distances. The daily extremes and totals--maximum wind gust, precipitation amount, and snow depth--will only appear if the station reports the data sufficiently to provide a valid value. Therefore, these three elements will appear less frequently than other values. Also, these elements are derived from the stations' reports during the day, and may comprise a 24-hour period which includes a portion of the previous day. The data are reported and summarized based on Greenwich Mean Time (GMT, 0000Z - 2359Z) since the original synoptic/hourly data are reported and based on GMT.

  10. Average daily time spent on social media worldwide 2012-2024

    • statista.com
    • ai-chatbox.pro
    Updated Apr 10, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2024). Average daily time spent on social media worldwide 2012-2024 [Dataset]. https://www.statista.com/statistics/433871/daily-social-media-usage-worldwide/
    Explore at:
    Dataset updated
    Apr 10, 2024
    Dataset authored and provided by
    Statistahttp://statista.com/
    Area covered
    Worldwide
    Description

    How much time do people spend on social media? As of 2024, the average daily social media usage of internet users worldwide amounted to 143 minutes per day, down from 151 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of three hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just two hours and 16 minutes. Global social media usageCurrently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events friends. Global impact of social mediaSocial media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased a polarization in politics and heightened everyday distractions.

  11. z

    Counts of Tuberculosis reported in UNITED STATES OF AMERICA: 1890-2014

    • zenodo.org
    • data.niaid.nih.gov
    json, xml, zip
    Updated Jun 3, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Willem Van Panhuis; Willem Van Panhuis; Anne Cross; Anne Cross; Donald Burke; Donald Burke (2024). Counts of Tuberculosis reported in UNITED STATES OF AMERICA: 1890-2014 [Dataset]. http://doi.org/10.25337/t7/ptycho.v2.0/us.56717001
    Explore at:
    xml, json, zipAvailable download formats
    Dataset updated
    Jun 3, 2024
    Dataset provided by
    Project Tycho
    Authors
    Willem Van Panhuis; Willem Van Panhuis; Anne Cross; Anne Cross; Donald Burke; Donald Burke
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 14, 1890 - Jun 28, 2014
    Area covered
    United States
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretabilty. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datsets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of aquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

    Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

    • Analyze missing data: Project Tycho datasets do not inlcude time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
    • Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exxclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".

  12. Data from: Global Roadkill Data: a dataset on terrestrial vertebrate...

    • figshare.com
    pdf
    Updated Apr 3, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Clara Grilo; Tomé Neves; Jennifer Bates; Aliza le Roux; Pablo Medrano‐Vizcaíno; Mattia Quaranta; Inês Silva; KYLIE SOANES; Yun Wang; Sergio Damián Abate; Fernanda Delborgo Abra; Stuart Aldaz Cedeño; Pedro Rodrigues de Alencar; Mariana Fernada Peres de Almeida; Mario Henrique Alves; Paloma Alves; André Ambrozio de Assis; Rob Ament; Richard Andrášik; Edison Araguillin; Danielle Rodrigues de Araújo; Alexis Araujo-Quintero; Jesús Arca-Rubio; Morteza Arianejad; Carlos Armas; Erin Arnold; Fernando Ascensão; Badrul Azhar; Seung-Yun Baek (2025). Global Roadkill Data: a dataset on terrestrial vertebrate mortality caused by collision with vehicles [Dataset]. http://doi.org/10.6084/m9.figshare.25714233.v5
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Apr 3, 2025
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Clara Grilo; Tomé Neves; Jennifer Bates; Aliza le Roux; Pablo Medrano‐Vizcaíno; Mattia Quaranta; Inês Silva; KYLIE SOANES; Yun Wang; Sergio Damián Abate; Fernanda Delborgo Abra; Stuart Aldaz Cedeño; Pedro Rodrigues de Alencar; Mariana Fernada Peres de Almeida; Mario Henrique Alves; Paloma Alves; André Ambrozio de Assis; Rob Ament; Richard Andrášik; Edison Araguillin; Danielle Rodrigues de Araújo; Alexis Araujo-Quintero; Jesús Arca-Rubio; Morteza Arianejad; Carlos Armas; Erin Arnold; Fernando Ascensão; Badrul Azhar; Seung-Yun Baek
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present the GLOBAL ROADKILL DATA, the largest worldwide compilation of roadkill data on terrestrial vertebrates. We outline the workflow (Fig. 1) to illustrate the sequential steps of the study, in which we merged local-scale survey datasets and opportunistic records into a unified roadkill large dataset comprising 208,570 roadkill records. These records include 2283 species and subspecies from 54 countries across six continents, ranging from 1971 to 2024.Large roadkill datasets offer the advantage ofpreventing the collection of redundant data and are valuable resources for both local and macro-scale analyses regarding roadkill rates, road and landscape features associated with roadkill risk, species more vulnerable to road traffic, and populations at risk due to additional mortality. The standardization of data - such as scientific names, projection coordinates, and units - in a user-friendly format, makes themreadily accessible to a broader scientific and non-scientific community, including NGOs, consultants, public administration officials, and road managers. The open-access approach promotes collaboration among researchers and road practitioners, facilitating the replication of studies, validation of findings, and expansion of previous work. Moreover, researchers can utilize suchdatasets to develop new hypotheses, conduct meta-analyses, address pressing challenges more efficiently and strengthen the robustness of road ecology research. Ensuring widespreadaccess to roadkill data fosters a more diverse and inclusive research community. This not only grants researchers in emerging economies with more data for analysis, but also cultivates a diverse array of perspectives and insightspromoting the advance of infrastructure ecology.MethodsInformation sources: A core team from different continents performed a systematic literature search in Web of Science and Google Scholar for published peer-reviewed papers and dissertations. It was searched for the following terms: “roadkill* OR “road-kill” OR “road mortality” AND (country) in English, Portuguese, Spanish, French and/or Mandarin. This initiative was also disseminated to the mailing lists associated with transport infrastructure: The CCSG Transport Working Group (WTG), Infrastructure & Ecology Network Europe (IENE) and Latin American & Caribbean Transport Working Group (LACTWG) (Fig. 1). The core team identified 750 scientific papers and dissertations with information on roadkill and contacted the first authors of the publications to request georeferenced locations of roadkill andofferco-authorship to this data paper. Of the 824 authors contacted, 145agreed to sharegeoreferenced roadkill locations, often involving additional colleagues who contributed to data collection. Since our main goal was to provide open access to data that had never been shared in this format before, data from citizen science projects (e.g., globalroakill.net) that are already available were not included.Data compilation: A total of 423 co-authors compiled the following information: continent, country, latitude and longitude in WGS 84 decimal degrees of the roadkill, coordinates uncertainty, class, order, family, scientific name of the roadkill, vernacular name, IUCN status, number of roadkill, year, month, and day of the record, identification of the road, type of road, survey type, references, and observers that recorded the roadkill (Supplementary Information Table S1 - description of the fields and Table S2 - reference list). When roadkill data were derived from systematic surveys, the dataset included additional information on road length that was surveyed, latitude and longitude of the road (initial and final part of the road segment), survey period, start year of the survey, final year of the survey, 1st month of the year surveyed, last month of the year surveyed, and frequency of the survey. We consolidated 142 valid datasets into a single dataset. We complemented this data with OccurenceID (a UUID generated using Java code), basisOfRecord, countryCode, locality using OpenStreetMap’s API (https://www.openstreetmap.org), geodeticDatum, verbatimScientificName, Kingdom, phylum, genus, specificEpithet, infraspecificEpithet, acceptedNameUsage, scientific name authorship, matchType, taxonRank using Darwin Core Reference Guide (https://dwc.tdwg.org/terms/#dwc:coordinateUncertaintyInMeters) and link of the associatedReference (URL).Data standardization - We conducted a clustering analysis on all text fields to identify similar entries with minor variations, such as typos, and corrected them using OpenRefine (http://openrefine.org). Wealsostandardized all date values using OpenRefine. Coordinate uncertainties listed as 0 m were adjusted to either 30m or 100m, depending on whether they were recorded after or before 2000, respectively, following the recommendation in the Darwin Core Reference Guide (https://dwc.tdwg.org/terms/#dwc:coordinateUncertaintyInMeters).Taxonomy - We cross-referenced all species names with the Global Biodiversity Information Facility (GBIF) Backbone Taxonomy using Java and GBIF’s API (https://doi.org/10.15468/39omei). This process aimed to rectify classification errors, include additional fields such as Kingdom, Phylum, and scientific authorship, and gather comprehensive taxonomic information to address any gap withinthe datasets. For species not automatically matched (matchType - Table S1), we manually searched for correct synonyms when available.Species conservation status - Using the species names, we retrieved their conservation status and also vernacular names by cross-referencing with the database downloaded from the IUCNRed List of Threatened Species (https://www.iucnredlist.org). Species without a match were categorized as "Not Evaluated".Data RecordsGLOBAL ROADKILL DATA is available at Figshare27 https://doi.org/10.6084/m9.figshare.25714233. The dataset incorporates opportunistic (collected incidentally without data collection efforts) and systematic data (collected through planned, structured, and controlled methods designed to ensure consistency and reliability). In total, it comprises 208,570 roadkill records across 177,428 different locations(Fig. 2). Data were collected from the road network of 54 countries from 6 continents: Europe (n = 19), Asia (n = 16), South America (n=7), North America (n = 4), Africa (n = 6) and Oceania (n = 2).(Figure 2 goes here)All data are georeferenced in WGS84 decimals with maximum uncertainty of 5000 m. Approximately 92% of records have a location uncertainty of 30 m or less, with only 1138 records having location uncertainties ranging from 1000 to 5000 m. Mammals have the highest number of roadkill records (61%), followed by amphibians (21%), reptiles (10%) and birds (8%). The species with the highest number of records were roe deer (Capreolus capreolus, n = 44,268), pool frog (Pelophylax lessonae, n = 11,999) and European fallow deer (Dama dama, n = 7,426).We collected information on 126 threatened species with a total of 4570 records. Among the threatened species, the giant anteater (Myrmecophaga tridactyla, VULNERABLE) has the highest number of records n = 1199), followed by the common fire salamander (Salamandra salamandra, VULNERABLE, n=1043), and European rabbit (Oryctolagus cuniculus, ENDANGERED, n = 440). Records ranged from 1971 and 2024, comprising 72% of the roadkill recorded since 2013. Over 46% of the records were obtained from systematic surveys, with road length and survey period averaging, respectively, 66 km (min-max: 0.09-855 km) and 780 days (1-25,720 days).Technical ValidationWe employed the OpenStreetMap API through Java todetect location inaccuracies, andvalidate whether the geographic coordinates aligned with the specified country. We calculated the distance of each occurrence to the nearest road using the GRIP global roads database28, ensuring that all records were within the defined coordinate uncertainty. We verified if the survey duration matched the provided initial and final survey dates. We calculated the distance between the provided initial and final road coordinates and cross-checked it with the given road length. We identified and merged duplicate entries within the same dataset (same location, species, and date), aggregating the number of roadkills for each occurrence.Usage NotesThe GLOBAL ROADKILL DATA is a compilation of roadkill records and was designed to serve as a valuable resource for a wide range of analyses. Nevertheless, to prevent the generation of meaningless results, users should be aware of the followinglimitations:- Geographic representation – There is an evident bias in the distribution of records. Data originatedpredominantly from Europe (60% of records), South America (22%), and North America (12%). Conversely, there is a notable lack of records from Asia (5%), Oceania (1%) and Africa (0.3%). This dataset represents 36% of the initial contacts that provided geo-referenced records, which may not necessarily correspond to locations where high-impact roads are present.- Location accuracy - Insufficient location accuracy was observed for 1% of the data (ranging from 1000 to 5000 m), that was associated with various factors, such as survey methods, recording practices, or timing of the survey.- Sampling effort - This dataset comprised both opportunistic data and records from systematic surveys, with a high variability in survey duration and frequency. As a result, the use of both opportunistic and systematic surveys may affect the relative abundance of roadkill making it hard to make sound comparisons among species or areas.- Detectability and carcass removal bias - Although several studies had a high frequency of road surveys,the duration of carcass persistence on roads may vary with species size and environmental conditions, affecting detectability. Accordingly, several approaches account for survey frequency and target speciesto estimate more

  13. P

    Dollar Street Dataset Dataset

    • paperswithcode.com
    Updated Nov 8, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2023). Dollar Street Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/dollar-street-dataset
    Explore at:
    Dataset updated
    Nov 8, 2023
    Description

    The MLCommons Dollar Street Dataset is a collection of images of everyday household items from homes around the world that visually captures socioeconomic diversity of traditionally underrepresented populations. It consists of public domain data, licensed for academic, commercial and non-commercial usage, under CC-BY and CC-BY-SA 4.0. The dataset was developed because similar datasets lack socioeconomic metadata and are not representative of global diversity.

    It includes 38,479 images collected from 63 different countries, tagged from a set of 289 possible topics. Besides this, the metadata for each image includes demographic information such as region, country, and total household monthly income, allowing for many different use cases, ultimately enhancing image datasets for computer vision.

    The dataset was introduced by the paper: The Dollar Street Dataset: Images Representing the Geographic and Socioeconomic Diversity of the World

  14. ERA5 post-processed daily statistics on single levels from 1940 to present

    • cds.climate.copernicus.eu
    grib
    Updated Jun 25, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    ECMWF (2025). ERA5 post-processed daily statistics on single levels from 1940 to present [Dataset]. http://doi.org/10.24381/cds.4991cf48
    Explore at:
    gribAvailable download formats
    Dataset updated
    Jun 25, 2025
    Dataset provided by
    European Centre for Medium-Range Weather Forecastshttp://ecmwf.int/
    Authors
    ECMWF
    License

    https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/licence-to-use-copernicus-products/licence-to-use-copernicus-products_b4b9451f54cffa16ecef5c912c9cebd6979925a956e3fa677976e0cf198c2c18.pdfhttps://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/licence-to-use-copernicus-products/licence-to-use-copernicus-products_b4b9451f54cffa16ecef5c912c9cebd6979925a956e3fa677976e0cf198c2c18.pdf

    Time period covered
    Jan 1, 1940 - Jun 19, 2025
    Description

    ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. This catalogue entry provides post-processed ERA5 hourly single-level data aggregated to daily time steps. In addition to the data selection options found on the hourly page, the following options can be selected for the daily statistic calculation:

    The daily aggregation statistic (daily mean, daily max, daily min, daily sum*) The sub-daily frequency sampling of the original data (1 hour, 3 hours, 6 hours) The option to shift to any local time zone in UTC (no shift means the statistic is computed from UTC+00:00)

    *The daily sum is only available for the accumulated variables (see ERA5 documentation for more details). Users should be aware that the daily aggregation is calculated during the retrieval process and is not part of a permanently archived dataset. For more details on how the daily statistics are calculated, including demonstrative code, please see the documentation. For more details on the hourly data used to calculate the daily statistics, please refer to the ERA5 hourly single-level data catalogue entry and the documentation found therein.

  15. z

    Counts of Cholera reported in UNITED STATES OF AMERICA: 1895-1905

    • zenodo.org
    json, xml, zip
    Updated Jun 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Willem Van Panhuis; Willem Van Panhuis; Anne Cross; Anne Cross; Donald Burke; Donald Burke (2024). Counts of Cholera reported in UNITED STATES OF AMERICA: 1895-1905 [Dataset]. http://doi.org/10.25337/t7/ptycho.v2.0/us.63650001
    Explore at:
    json, zip, xmlAvailable download formats
    Dataset updated
    Jun 3, 2024
    Dataset provided by
    Project Tycho
    Authors
    Willem Van Panhuis; Willem Van Panhuis; Anne Cross; Anne Cross; Donald Burke; Donald Burke
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 29, 1895 - Nov 25, 1905
    Area covered
    United States
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretabilty. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datsets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of aquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

    Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

    • Analyze missing data: Project Tycho datasets do not inlcude time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
    • Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exxclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".

  16. Mass Killings in America, 2006 - present

    • data.world
    csv, zip
    Updated Jun 7, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Associated Press (2025). Mass Killings in America, 2006 - present [Dataset]. https://data.world/associatedpress/mass-killings-public
    Explore at:
    zip, csvAvailable download formats
    Dataset updated
    Jun 7, 2025
    Dataset provided by
    data.world, Inc.
    Authors
    The Associated Press
    Time period covered
    Jan 1, 2006 - Apr 29, 2025
    Area covered
    Description

    THIS DATASET WAS LAST UPDATED AT 2:11 AM EASTERN ON JUNE 7

    OVERVIEW

    2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.

    In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings . This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.

    A total of 229 people died in mass killings in 2019.

    The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.

    One-third of the offenders died at the scene of the killing or soon after, half from suicides.

    About this Dataset

    The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.

    The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.

    This data will be updated periodically and can be used as an ongoing resource to help cover these events.

    Using this Dataset

    To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:

    Mass killings by year

    Mass shootings by year

    To get these counts just for your state:

    Filter killings by state

    Definition of "mass murder"

    Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.

    This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”

    Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.

    Methodology

    Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.

    Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.

    In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.

    Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.

    Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.

    This project started at USA TODAY in 2012.

    Contacts

    Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.

  17. TES/Aura L3 Atmospheric Temperatures Daily Gridded V005 - Dataset - NASA...

    • data.nasa.gov
    Updated Apr 1, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.nasa.gov (2025). TES/Aura L3 Atmospheric Temperatures Daily Gridded V005 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/tes-aura-l3-atmospheric-temperatures-daily-gridded-v005-a48e6
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASAhttp://nasa.gov/
    Description

    TL3ATD_5 is the Tropospheric Emission Spectrometer (TES)/Aura Level 2 Atmospheric Temperatures Daily Gridded Version 5 data product. TES was an instrument aboard NASA's Aura satellite and was launched from California on July 15, 2004. Data collection for TES is complete. This product consists of daily atmospheric temperature and volume mixing ratio (VMR) for the atmospheric species, which were provided at 2 degree latitude X 4 degree longitude spatial grids and at a subset of TES standard pressure levels. The TES Science Data Processing L3 subsystem interpolated L2 atmospheric profiles collected in a Global Survey onto a global grid uniform in latitude and longitude to provide a 3-D representation of the distribution of atmospheric gasses. Daily and monthly averages of L2 profiles and browse images are available. The L3 standard data products are composed of L3 HDF-EOS grid data. A separate product file was produced for each different atmospheric species. TES obtained data in two basic observation modes: Limb or Nadir; Nadir observations, which point directly to the surface of the Earth, are different from limb observations, which are pointed at various off-nadir angles into the atmosphere. The product file may contain, in separate folders, limb data, nadir data, or both folders may be present. Specific to L3 processing were the terms Daily and Monthly representing the approximate time coverage of the L3 products. However, the input data granules to the L3 process are complete Global Surveys; in other words a Global Survey was not split in relation to time when input to the L3 processes even if they exceed the usual understood meanings of a day or month. More specifically, Daily L3 products represented a single Global Survey (approximately 26 hours) and Monthly L3 products represent Global Surveys that are initiated within that calendar month. The data granules defined for L3 standard products were daily and monthly.

  18. Ceph Drive Telemetry Data

    • kaggle.com
    Updated Apr 4, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Karanraj Chauhan (2022). Ceph Drive Telemetry Data [Dataset]. https://www.kaggle.com/datasets/chauhankaranraj/ceph-drive-telemetry-data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 4, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Karanraj Chauhan
    License

    https://cdla.io/sharing-1-0/https://cdla.io/sharing-1-0/

    Description

    Motivation

    Storage system failure is one of the main reasons behind data center failures. It can promptly lead to service degradation, and ultimately result in poor consumer experience, loss of revenue, and in the worst case, loss of important data.

    So then this begs the question, can we predict such failures well in advance, so that we can mitigate the effects? This is an actively explored area of research in both industry and academia. However, the datasets used are usually not made publicly available, thus creating a barrier to entry for new ideas. Additionally, datasets that are publicly available often contain data from very few types of workloads, thus making them difficult to use across different domains.

    To address this issue, the Ceph team started collecting and publishing the Ceph telemetry dataset. This dataset consists of disk usage data collected from Ceph storage clusters set up in various types of environments, and running a wide variety of workloads. Moreover, since Ceph is widely used by Kubernetes clusters for storage, this dataset is likely to contain storage usage patterns of containerized applications, which is an extremely important workload for most organizations today.

    Content

    This dataset contains raw SMART jsons collected from disks approximately once per day, along with the timestamp and the storage cluster ID the disk belongs to. The data has been collected from various device types, interfaces, vendors, and models, from various deployment types (JBOD, HW RAID, VMs). The Ceph team has also created some publicly available dashboards to visualize the salient features of this data.

    This is a community-driven and a continuous effort, so if you have suggestions for what other metrics should be collected for more effective analysis and prediction models, please feel free to reach out to the Ceph team!

    Contributing

    • As a data scientist, you can

      • perform exploratory data analysis to derive insights into disk and cluster health
      • develop models for predicting disk failures
      • contribute to our open source python library containing pretrained models that are used in Ceph
    • As a subject matter expert, you can provide your invaluable feedback and suggestions

    • As a Ceph user, you can enable telemetry and opt-in to contribute anonymous, non-identifying data from your storage clusters

    License

    This data is under Community Data License Agreement – Sharing – Version 1.0. By using this data you agree to its license.

    Contact Us

    Please feel free to reach out to us with any suggestions on how this data can be used, what metrics should be included (or removed) in the data, any interesting findings, or any questions you may have, here: telemetry@ceph.io.

  19. H08 Deep Blue Level 3 daily aerosol data, 1x1 degree grid

    • data.nasa.gov
    • gimi9.com
    • +3more
    Updated Apr 1, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.nasa.gov (2025). H08 Deep Blue Level 3 daily aerosol data, 1x1 degree grid [Dataset]. https://data.nasa.gov/dataset/h08-deep-blue-level-3-daily-aerosol-data-1x1-degree-grid-634f2
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASAhttp://nasa.gov/
    Description

    The H08 Deep Blue Level 3 daily aerosol data, 1x1 degree grid product, short-name AERDB_D3_AHI_H08, derived from the L2 (AERDB_L2_AHI_H08) input data, each D3 AHI/Himawari-8 product is produced daily at 1 x 1-degree horizontal resolution. In general, in this daily L3 (identified in the short-name as D3) aggregated product, each data field represents the arithmetic mean of all cells whose latitude and longitude places them within the bounds of each grid element. Another statistic like standard deviation is also provided in some cases. The final retrievals used in the aggregation process are Quality Assurance (QA)-filtered best-estimate values for cells that are measured on the day of interest. Further, at least three such retrievals are required to render the validity of a grid cell on any given day. This first release of these products spans from May 2019 through April 2020 with a potential to generate additional temporal coverage in the future. The Level-3 (L3) Advanced Himawari Imager (AHI) Himawari-8 Deep Blue Daily Aerosol dataset is part of a 12-product suite produced by an Earth Science Research from Operational Geostationary Satellite Systems (ESROGSS)-funded project. The 12 products in this project include nine derived from three Geostationary Earth Observation (GEO) instruments and three from merged data from GEO and Low-Earth Orbit (LEO) instruments.The AERDB_D3_AHI_H08 product, in netCDF4 format, contains 48 Science Data Set (SDS) layers. For more information consult LAADS product description page at:https://ladsweb.modaps.eosdis.nasa.gov/missions-and-measurements/products/AERDB_D3_AHI_H08Or, Deep Blue aerosol project webpage at: https://earth.gsfc.nasa.gov/climate/data/deep-blue

  20. d

    High Resolution Daily Global Alfalfa-Reference Potential Evapotranspiration...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Jul 6, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Geological Survey (2024). High Resolution Daily Global Alfalfa-Reference Potential Evapotranspiration Climatology [Dataset]. https://catalog.data.gov/dataset/high-resolution-daily-global-alfalfa-reference-potential-evapotranspiration-climatology
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Description

    Global alfalfa-reference potential evapotranspiration (ETr) is a key model parameter in actual evapotranspiration (ETa) modeling for worldwide applications. This dataset was constructed for use with the Operational Simplified Surface Energy Balance (SSEBop) model as a key driver of the final ETa magnitude. SSEBop is a parametric energy balance-based model that determines actual ET as the product of two independent estimates: 1) the SSEBop modeled ET fraction (ETf), an index nominally varying between 0 and 1 and derived from observed Landsat surface temperature using satellite psychrometry, and 2) the potential ET (maximum) under environmental conditions for an alfalfa crop (in millimeters). As SSEBop ETf can now be modeled for any Landsat scene across the globe, a suitable global ETr climatology dataset needed to be created. This global ETr data is a fusion of several different remote sensing and modeling products: 1981-2010 climatological normal (daily mean) ETr from Gridmet over the continental United States and 1981-2010 climatological normal MERRA-2 Fine Resolution ETr for all areas outside of the continental United States that has been scaled and corrected via terrestrial ecoregions from OneEarth and scaled using Worldclim Version 3 ETo (Abatzoglou 2013; Dinerstein et al., 2017; Hobbins et al., 2022; Zomer et al., 2022). The final mosaic has been smoothed and resampled to 1-km spatial resolution. The final dataset is a daily dataset of 366 GeoTIFF raster files for each day of the year including the leap day and representing a climatological normal (1981-2010) alfalfa-reference potential ET (ETr) for the entire global extent.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
C-3PO (2021). The World Dataset of COVID-19 [Dataset]. https://www.kaggle.com/datasets/aditeloo/the-world-dataset-of-covid19/code
Organization logo

The World Dataset of COVID-19

Data on COVID-19 (coronavirus) by Our World in Data

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 25, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
C-3PO
License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Area covered
World
Description

Context

These datasets are from Our World in Data. Their complete COVID-19 dataset is a collection of the COVID-19 data maintained by Our World in Data. It is updated daily and includes data on confirmed cases, deaths, hospitalizations, testing, and vaccinations as well as other variables of potential interest.

Content

Confirmed cases and deaths:

our data comes from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). We discuss how and when JHU collects and publishes this data. The cases & deaths dataset is updated daily. Note: the number of cases or deaths reported by any institution—including JHU, the WHO, the ECDC, and others—on a given day does not necessarily represent the actual number on that date. This is because of the long reporting chain that exists between a new case/death and its inclusion in statistics. This also means that negative values in cases and deaths can sometimes appear when a country corrects historical data because it had previously overestimated the number of cases/deaths. Alternatively, large changes can sometimes (although rarely) be made to a country's entire time series if JHU decides (and has access to the necessary data) to correct values retrospectively.

Hospitalizations and intensive care unit (ICU) admissions:

our data comes from the European Centre for Disease Prevention and Control (ECDC) for a select number of European countries; the government of the United Kingdom; the Department of Health & Human Services for the United States; the COVID-19 Tracker for Canada. Unfortunately, we are unable to provide data on hospitalizations for other countries: there is currently no global, aggregated database on COVID-19 hospitalization, and our team at Our World in Data does not have the capacity to build such a dataset.

Testing for COVID-19:

this data is collected by the Our World in Data team from official reports; you can find further details in our post on COVID-19 testing, including our checklist of questions to understand testing data, information on geographical and temporal coverage, and detailed country-by-country source information. The testing dataset is updated around twice a week.

Acknowledgements

Our World in Data GitHub repository for covid-19.

Inspiration

All we love data, cause we love to go inside it and discover the truth that's the main inspiration I have.

Search
Clear search
Close search
Google apps
Main menu