23 datasets found

The World Dataset of COVID-19
kaggle.com
Updated May 25, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
C-3PO (2021). The World Dataset of COVID-19 [Dataset]. https://www.kaggle.com/datasets/aditeloo/the-world-dataset-of-covid19/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 25, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
C-3PO
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Area covered
World
Description
Context

These datasets are from Our World in Data. Their complete COVID-19 dataset is a collection of the COVID-19 data maintained by Our World in Data. It is updated daily and includes data on confirmed cases, deaths, hospitalizations, testing, and vaccinations as well as other variables of potential interest.

Content

Confirmed cases and deaths:

our data comes from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). We discuss how and when JHU collects and publishes this data. The cases & deaths dataset is updated daily. Note: the number of cases or deaths reported by any institution—including JHU, the WHO, the ECDC, and others—on a given day does not necessarily represent the actual number on that date. This is because of the long reporting chain that exists between a new case/death and its inclusion in statistics. This also means that negative values in cases and deaths can sometimes appear when a country corrects historical data because it had previously overestimated the number of cases/deaths. Alternatively, large changes can sometimes (although rarely) be made to a country's entire time series if JHU decides (and has access to the necessary data) to correct values retrospectively.

Hospitalizations and intensive care unit (ICU) admissions:

our data comes from the European Centre for Disease Prevention and Control (ECDC) for a select number of European countries; the government of the United Kingdom; the Department of Health & Human Services for the United States; the COVID-19 Tracker for Canada. Unfortunately, we are unable to provide data on hospitalizations for other countries: there is currently no global, aggregated database on COVID-19 hospitalization, and our team at Our World in Data does not have the capacity to build such a dataset.

Testing for COVID-19:

this data is collected by the Our World in Data team from official reports; you can find further details in our post on COVID-19 testing, including our checklist of questions to understand testing data, information on geographical and temporal coverage, and detailed country-by-country source information. The testing dataset is updated around twice a week.

Acknowledgements

Our World in Data GitHub repository for covid-19.

Inspiration

All we love data, cause we love to go inside it and discover the truth that's the main inspiration I have.
Johns Hopkins COVID-19 Case Tracker
data.world
csv, zip
Updated Jun 8, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Associated Press (2025). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker
Explore at:
zip, csvAvailable download formats
Dataset updated
Jun 8, 2025
Dataset provided by
data.world, Inc.
Authors
The Associated Press
Time period covered
Jan 22, 2020 - Mar 9, 2023
Area covered
Description
Updates

Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.

CDC Weekly case and death counts (national and state level)

CDC County level cases and deaths

HHS New hospital admissions

CDC NowCast COVID variant proportions (national and regional level)

April 9, 2020

The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.

April 20, 2020

Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.

April 29, 2020

The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.

September 1st, 2020

Johns Hopkins is now providing counts for the five New York City counties individually.

February 12, 2021

The Ohio Department of Health recently announced that as many as 4,000 COVID-19 deaths may have been underreported through the state’s reporting system, and that the "daily reported death counts will be high for a two to three-day period."

Because deaths data will be anomalous for consecutive days, we have chosen to freeze Ohio's rolling average for daily deaths at the last valid measure until Johns Hopkins is able to back-distribute the data. The raw daily death counts, as reported by Johns Hopkins and including the backlogged death data, will still be present in the new_deaths column.

February 16, 2021

- Johns Hopkins has reconciled Ohio's historical deaths data with the state.

Overview

The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

The AP is updating this dataset hourly at 45 minutes past the hour.

To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

Queries

Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic

Filter cases by state here

Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac

Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true

Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.

Pull the 100 counties with the highest per-capita confirmed cases here

Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.

Interactive

The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.

@(https://datawrapper.dwcdn.net/nRyaf/15/)

Interactive Embed Code

<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>

Caveats

This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.

In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.

In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county"

This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.

Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.

The Urban/Rural classification scheme is from the Center for Disease Control and Preventions's National Center for Health Statistics. It puts each county into one of six categories -- from Large Central Metro to Non-Core -- according to population and other characteristics. More details about the classifications can be found here.

Johns Hopkins timeseries data - Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count. - Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here

Attribution

This data should be credited to Johns Hopkins University COVID-19 tracking project
Coronavirus (COVID-19) In-depth Dataset
kaggle.com
Updated May 29, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pranjal Verma (2021). Coronavirus (COVID-19) In-depth Dataset [Dataset]. https://www.kaggle.com/pranjalverma08/coronavirus-covid19-indepth-dataset/discussion
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 29, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Pranjal Verma
License
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
Description
Context

Covid-19 Data collected from various sources on the internet. This dataset has daily level information on the number of affected cases, deaths, and recovery from the 2019 novel coronavirus. Please note that this is time-series data and so the number of cases on any given day is the cumulative number.

Content

The dataset includes 28 files scrapped from various data sources mainly the John Hopkins GitHub repository, the ministry of health affairs India, worldometer, and Our World in Data website. The details of the files are as follows

countries-aggregated.csv A simple and cleaned data with 5 columns with self-explanatory names. -covid-19-daily-tests-vs-daily-new-confirmed-cases-per-million.csv A time-series data of daily test conducted v/s daily new confirmed case per million. Entity column represents Country name while code represents ISO code of the country. -covid-contact-tracing.csv Data depicting government policies adopted in case of contact tracing. 0 -> No tracing, 1-> limited tracing, 2-> Comprehensive tracing. -covid-stringency-index.csv The nine metrics used to calculate the Stringency Index are school closures; workplace closures; cancellation of public events; restrictions on public gatherings; closures of public transport; stay-at-home requirements; public information campaigns; restrictions on internal movements; and international travel controls. The index on any given day is calculated as the mean score of the nine metrics, each taking a value between 0 and 100. A higher score indicates a stricter response (i.e. 100 = strictest response). -covid-vaccination-doses-per-capita.csv A total number of vaccination doses administered per 100 people in the total population. This is counted as a single dose, and may not equal the total number of people vaccinated, depending on the specific dose regime (e.g. people receive multiple doses). -covid-vaccine-willingness-and-people-vaccinated-by-country.csv Survey who have not received a COVID vaccine and who are willing vs. unwilling vs. uncertain if they would get a vaccine this week if it was available to them. -covid_india.csv India specific data containing the total number of active cases, recovered and deaths statewide. -cumulative-deaths-and-cases-covid-19.csv A cumulative data containing death and daily confirmed cases in the world. -current-covid-patients-hospital.csv Time series data containing a count of covid patients hospitalized in a country -daily-tests-per-thousand-people-smoothed-7-day.csv Daily test conducted per 1000 people in a running week average. -face-covering-policies-covid.csv Countries are grouped into five categories: 1->No policy 2->Recommended 3->Required in some specified shared/public spaces outside the home with other people present, or some situations when social distancing not possible 4->Required in all shared/public spaces outside the home with other people present or all situations when social distancing not possible 5->Required outside the home at all times regardless of location or presence of other people -full-list-cumulative-total-tests-per-thousand-map.csv Full list of total tests conducted per 1000 people. -income-support-covid.csv Income support captures if the government is covering the salaries or providing direct cash payments, universal basic income, or similar, of people who lose their jobs or cannot work. 0->No income support, 1->covers less than 50% of lost salary, 2-> covers more than 50% of the lost salary. -internal-movement-covid.csv Showing government policies in restricting internal movements. Ranges from 0 to 2 where 2 represents the strictest. -international-travel-covid.csv Showing government policies in restricting international movements. Ranges from 0 to 2 where 2 represents the strictest. -people-fully-vaccinated-covid.csv Contains the count of fully vaccinated people in different countries. -people-vaccinated-covid.csv Contains the total count of vaccinated people in different countries. -positive-rate-daily-smoothed.csv Contains the positivity rate of various countries in a week running average. -public-gathering-rules-covid.csv Restrictions are given based on the size of public gatherings as follows: 0->No restrictions 1 ->Restrictions on very large gatherings (the limit is above 1000 people) 2 -> gatherings between 100-1000 people 3 -> gatherings between 10-100 people 4 -> gatherings of less than 10 people -school-closures-covid.csv School closure during Covid. -share-people-fully-vaccinated-covid.csv Share of people that are fully vaccinated. -stay-at-home-covid.csv Countries are grouped into four categories: 0->No measures 1->Recommended not to leave the house 2->Required to not leave the house with exceptions for daily exercise, grocery shopping, and ‘essent...
Public Health Official Departures
data.world
csv, zip
Updated Jun 7, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Associated Press (2022). Public Health Official Departures [Dataset]. https://data.world/associatedpress/public-health-official-departures
Explore at:
csv, zipAvailable download formats
Dataset updated
Jun 7, 2022
Dataset provided by
data.world, Inc.
Authors
The Associated Press
Description
Changelog:

Update September 20, 2021: Data and overview updated to reflect data used in the September 15 story Over Half of States Have Rolled Back Public Health Powers in Pandemic. It includes 303 state or local public health leaders who resigned, retired or were fired between April 1, 2020 and Sept. 12, 2021. Previous versions of this dataset reflected data used in the Dec. 2020 and April 2021 stories.

Overview

Across the U.S., state and local public health officials have found themselves at the center of a political storm as they combat the worst pandemic in a century. Amid a fractured federal response, the usually invisible army of workers charged with preventing the spread of infectious disease has become a public punching bag.

In the midst of the coronavirus pandemic, at least 303 state or local public health leaders in 41 states have resigned, retired or been fired since April 1, 2020, according to an ongoing investigation by The Associated Press and KHN.

According to experts, that is the largest exodus of public health leaders in American history.

Many left due to political blowback or pandemic pressure, as they became the target of groups that have coalesced around a common goal — fighting and even threatening officials over mask orders and well-established public health activities like quarantines and contact tracing. Some left to take higher profile positions, or due to health concerns. Others were fired for poor performance. Dozens retired. An untold number of lower level staffers have also left.

The result is a further erosion of the nation’s already fragile public health infrastructure, which KHN and the AP documented beginning in 2020 in the Underfunded and Under Threat project.

Findings

The AP and KHN found that:

One in five Americans live in a community that has lost its local public health department leader during the pandemic

Top public health officials in 28 states have left state-level departments ## Using this data To filter for data specific to your state, use this query

To get total numbers of exits by state, broken down by state and local departments, use this query

Methodology

KHN and AP counted how many state and local public health leaders have left their jobs between April 1, 2020 and Sept. 12, 2021.

The government tasks public health workers with improving the health of the general population, through their work to encourage healthy living and prevent infectious disease. To that end, public health officials do everything from inspecting water and food safety to testing the nation’s babies for metabolic diseases and contact tracing cases of syphilis.

Many parts of the country have a health officer and a health director/administrator by statute. The analysis counted both of those positions if they existed. For state-level departments, the count tracks people in the top and second-highest-ranking job.

The analysis includes exits of top department officials regardless of reason, because no matter the reason, each left a vacancy at the top of a health agency during the pandemic. Reasons for departures include political pressure, health concerns and poor performance. Others left to take higher profile positions or to retire. Some departments had multiple top officials exit over the course of the pandemic; each is included in the analysis.

Reporters compiled the exit list by reaching out to public health associations and experts in every state and interviewing hundreds of public health employees. They also received information from the National Association of City and County Health Officials, and combed news reports and records.

Public health departments can be found at multiple levels of government. Each state has a department that handles these tasks, but most states also have local departments that either operate under local or state control. The population served by each local health department is calculated using the U.S. Census Bureau 2019 Population Estimates based on each department’s jurisdiction.

KHN and the AP have worked since the spring on a series of stories documenting the funding, staffing and problems around public health. A previous data distribution detailed a decade's worth of cuts to state and local spending and staffing on public health. That data can be found here.

Attribution

Findings and the data should be cited as: "According to a KHN and Associated Press report."

Is Data Missing?

If you know of a public health official in your state or area who has left that position between April 1, 2020 and Sept. 12, 2021 and isn't currently in our dataset, please contact authors Anna Maria Barry-Jester annab@kff.org, Hannah Recht hrecht@kff.org, Michelle Smith mrsmith@ap.org and Lauren Weber laurenw@kff.org.
n
Coronavirus (Covid-19) Data in the United States
nytimes.com
openicpsr.org
+4more
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
Explore at:
Dataset provided by
New York Times
Description
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
d
Data from: Illustrating potential effects of alternate control populations...
search.dataone.org
datadryad.org
+1more
Updated May 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yidi Huang; William Yuan; Isaac Kohane; Brett Beaulieu-Jones (2025). Illustrating potential effects of alternate control populations on real-world evidence-based statistical analyses [Dataset]. http://doi.org/10.5061/dryad.905qfttks
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.905qfttks
Dataset updated
May 7, 2025
Dataset provided by
Dryad Digital Repository
Authors
Yidi Huang; William Yuan; Isaac Kohane; Brett Beaulieu-Jones
Time period covered
Jun 3, 2021
Description
Objective: Case-control study designs are commonly used in retrospective analyses of Real-World Evidence (RWE). Due to the increasingly wide availability of RWE, it can be difficult to determine whether findings are robust or the result of testing multiple hypotheses.

Materials and Methods: We investigate the potential effects of modifying cohort definitions in a case-control association study between depression and Type 2 Diabetes Mellitus (T2D). We used a large (>75 million individuals) de-identified administrative claims database to observe the effects of minor changes to the requirements of glucose and hemoglobin A1c tests in the control group.

Results: We found that small permutations to the criteria used to define the control population result in significant shifts in both the demographic structure of the identified cohort as well as the odds ratio of association. These differences remain present when testing against age and sex-matched controls.

Discussion: Analyses o...
f
Description of the real-world dataset.
figshare.com
plos.figshare.com
xls
Updated Jun 27, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Fadi K. Dib; Peter Rodgers (2023). Description of the real-world dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0287744.t010
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0287744.t010
Dataset updated
Jun 27, 2023
Dataset provided by
PLOS ONE
Authors
Fadi K. Dib; Peter Rodgers
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Graph drawing, involving the automatic layout of graphs, is vital for clear data visualization and interpretation but poses challenges due to the optimization of a multi-metric objective function, an area where current search-based methods seek improvement. In this paper, we investigate the performance of Jaya algorithm for automatic graph layout with straight lines. Jaya algorithm has not been previously used in the field of graph drawing. Unlike most population-based methods, Jaya algorithm is a parameter-less algorithm in that it requires no algorithm-specific control parameters and only population size and number of iterations need to be specified, which makes it easy for researchers to apply in the field. To improve Jaya algorithm’s performance, we applied Latin Hypercube Sampling to initialize the population of individuals so that they widely cover the search space. We developed a visualization tool that simplifies the integration of search methods, allowing for easy performance testing of algorithms on graphs with weighted aesthetic metrics. We benchmarked the Jaya algorithm and its enhanced version against Hill Climbing and Simulated Annealing, commonly used graph-drawing search algorithms which have a limited number of parameters, to demonstrate Jaya algorithm’s effectiveness in the field. We conducted experiments on synthetic datasets with varying numbers of nodes and edges using the Erdős–Rényi model and real-world graph datasets and evaluated the quality of the generated layouts, and the performance of the methods based on number of function evaluations. We also conducted a scalability experiment on Jaya algorithm to evaluate its ability to handle large-scale graphs. Our results showed that Jaya algorithm significantly outperforms Hill Climbing and Simulated Annealing in terms of the quality of the generated graph layouts and the speed at which the layouts were produced. Using improved population sampling generated better layouts compared to the original Jaya algorithm using the same number of function evaluations. Moreover, Jaya algorithm was able to draw layouts for graphs with 500 nodes in a reasonable time.
COVID-19 World Vaccine Adverse Reactions
kaggle.com
Updated Jul 31, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ayush Garg (2024). COVID-19 World Vaccine Adverse Reactions [Dataset]. https://www.kaggle.com/ayushggarg/covid19-vaccine-adverse-reactions/tasks
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 31, 2024
Dataset provided by
Kaggle
Authors
Ayush Garg
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
The Vaccine Adverse Event Reporting System (VAERS) was created by the Food and Drug Administration (FDA) and Centers for Disease Control and Prevention (CDC) to receive reports about adverse events that may be associated with vaccines. No prescription drug or biological product, such as a vaccine, is completely free from side effects. Vaccines protect many people from dangerous illnesses, but vaccines, like drugs, can cause side effects, a small percentage of which may be serious. VAERS is used to continually monitor reports to determine whether any vaccine or vaccine lot has a higher than expected rate of events.

Doctors and other vaccine providers are encouraged to report adverse events, even if they are not certain that the vaccination was the cause. Since it is difficult to distinguish a coincidental event from one truly caused by a vaccine, the VAERS database will contain events of both types.

This dataset is downloaded from VAERS datasets and processed (filtered) for only COVID-19 vaccines. For more details on the dataset refer to the User Guide.
League of Legends Worlds MainEvent 2024 Stats
kaggle.com
Updated Jan 30, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
smvjkk (2025). League of Legends Worlds MainEvent 2024 Stats [Dataset]. https://www.kaggle.com/datasets/smvjkk/league-of-legends-worlds-mainevent-2024-stats/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 30, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
smvjkk
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
I have created this dataset for people interested in League of Legends who want to approach the game from a more analytical side.

Most of the data was acquired from Games of Legends (https://gol.gg/tournament/tournament-stats/Worlds%20Main%20Event%202024/) and also from official account of LoL Esports (https://www.youtube.com/@lolesports)

Dataset Contents:

Player: Name of the player.

Role: Role of the player (e.g., TOP, JUNGLE, MID, ADC, SUPPORT)

Team: Name of the player's team

Opponent Team: Name of the opposing team

Opponent Player: Name of the opposing player

Date: Date of the match

Round: Round of the tournament

Day: Specific day of the tournament

Patch: Version of the game patch during the match

Stage: Stage of the tournament

No Game: Game number in the series

all Games: Total number of games in the series

Format: Format of the match (e.g., Best of 1, Best of 3)

Game of day: Number of the game that day

Side: Side of the map the team started on (Blue/Red)

Time: Duration of the match

Team Performance Metrics:

Kills Team: Total kills by the team

Turrets Team: Total turrets destroyed by the team

Dragon Team: Total dragons killed by the team

Baron Team: Total barons killed by the team

Player Performance Metrics:

Level: Final level of the player

Kills: Number of kills by the player

Deaths: Number of deaths of the player

Assists: Number of assists by the player

KDA: Kill/Death/Assist ratio

CS: Creep Score (minions killed)

CS in Team's Jungle: Creep Score in the team's jungle

CS in Enemy Jungle: Creep Score in the enemy's jungle

CSM: Creep Score per Minute

Golds: Total gold earned

GPM: Gold Per Minute

GOLD%: Percentage of team's total gold earned by the player

Vision and Warding:

Vision Score: Total vision score

Wards placed: Number of wards placed

Wards destroyed: Number of wards destroyed

Control Wards Purchased: Number of control wards purchased

Detector Wards Placed: Number of detector wards placed

VSPM: Vision Score Per Minute

WPM: Wards Placed per Minute

VWPM: Vision Wards Placed per Minute

WCPM: Wards Cleared per Minute

VS%: Vision Score percentage

Damage Metrics:

Total damage to Champion: Total damage dealt to champions

Physical Damage: Total physical damage dealt

Magic Damage: Total magic damage dealt

True Damage: Total true damage dealt

DPM: Damage Per Minute

DMG%: Percentage of team’s total damage dealt by the player

Combat Metrics:

K+A Per Minute: Kills and Assists per Minute

KP%: Kill Participation percentage

Solo kills: Number of solo kills

Double kills: Number of double kills

Triple kills: Number of triple kills

Quadra kills: Number of quadra kills

Penta kills: Number of pentakills

Early Game Metrics:

GD@15: Gold Difference at 15 minutes

CSD@15: Creep Score Difference at 15 minutes

XPD@15: Experience Difference at 15 minutes

LVLD@15: Level Difference at 15 minutes

Objective Control:

Objectives Stolen: Number of objectives stolen

Damage dealt to turrets: Total damage dealt to turrets

Damage dealt to buildings: Total damage dealt to buildings

Healing and Mitigation:

Total heal: Total healing done

Total Heals On Teammates: Total healing done on teammates

Damage self mitigated: Total damage self-mitigated

Total Damage Shielded On Teammates: Total damage shielded on teammates

Crowd Control Metrics:

Time ccing others: Time spent crowd controlling others

Total Time CC Dealt: Total crowd control time dealt

Survival and Economy:

Total damage taken: Total damage taken

Total Time Spent Dead: Total time spent dead

Consumables purchased: Number of consumables purchased

Items Purchased: Number of items purchased

Shutdown bounty collected: Total shutdown bounty collected

Shutdown bounty lost: Total shutdown bounty lost
s
COVID-19 Pandemic - Worldwide
ods.backoffice.smartidf.services
processor1.francecentral.cloudapp.azure.com
+5more
csv, excel, geojson +1
Updated Jun 21, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2023). COVID-19 Pandemic - Worldwide [Dataset]. https://ods.backoffice.smartidf.services/explore/dataset/covid-19-pandemic-worldwide-data/?flg=fr-fr
Explore at:
geojson, excel, csv, jsonAvailable download formats
Dataset updated
Jun 21, 2023
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the data for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). Also, Supported by ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL).Data SourcesWorld Health Organization (WHO): https://www.who.int/ DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia. BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/ National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html Macau Government: https://www.ssm.gov.mo/portal/ Taiwan CDC: https://sites.google.com/cdc.gov.tw/2019ncov/taiwan?authuser=0 US CDC: https://www.cdc.gov/coronavirus/2019-ncov/index.html Government of Canada: https://www.canada.ca/en/public-health/services/diseases/coronavirus.html Australia Government Department of Health: https://www.health.gov.au/news/coronavirus-update-at-a-glance European Centre for Disease Prevention and Control (ECDC): https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-casesMinistry of Health Singapore (MOH): https://www.moh.gov.sg/covid-19Italy Ministry of Health: http://www.salute.gov.it/nuovocoronavirus
Synthetic Financial Datasets For Fraud Detection
kaggle.com
zip
Updated Apr 3, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Edgar Lopez-Rojas (2017). Synthetic Financial Datasets For Fraud Detection [Dataset]. https://www.kaggle.com/ealaxi/paysim1
Explore at:
zip(186385561 bytes)Available download formats
Dataset updated
Apr 3, 2017
Authors
Edgar Lopez-Rojas
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Context

There is a lack of public available datasets on financial services and specially in the emerging mobile money transactions domain. Financial datasets are important to many researchers and in particular to us performing research in the domain of fraud detection. Part of the problem is the intrinsically private nature of financial transactions, that leads to no publicly available datasets.

We present a synthetic dataset generated using the simulator called PaySim as an approach to such a problem. PaySim uses aggregated data from the private dataset to generate a synthetic dataset that resembles the normal operation of transactions and injects malicious behaviour to later evaluate the performance of fraud detection methods.

Content

PaySim simulates mobile money transactions based on a sample of real transactions extracted from one month of financial logs from a mobile money service implemented in an African country. The original logs were provided by a multinational company, who is the provider of the mobile financial service which is currently running in more than 14 countries all around the world.

This synthetic dataset is scaled down 1/4 of the original dataset and it is created just for Kaggle.

Headers

This is a sample of 1 row with headers explanation:

1,PAYMENT,1060.31,C429214117,1089.0,28.69,M1591654462,0.0,0.0,0,0

step - maps a unit of time in the real world. In this case 1 step is 1 hour of time. Total steps 744 (30 days simulation).

type - CASH-IN, CASH-OUT, DEBIT, PAYMENT and TRANSFER.

amount - amount of the transaction in local currency.

nameOrig - customer who started the transaction

oldbalanceOrg - initial balance before the transaction

newbalanceOrig - new balance after the transaction

nameDest - customer who is the recipient of the transaction

oldbalanceDest - initial balance recipient before the transaction. Note that there is not information for customers that start with M (Merchants).

newbalanceDest - new balance recipient after the transaction. Note that there is not information for customers that start with M (Merchants).

isFraud - This is the transactions made by the fraudulent agents inside the simulation. In this specific dataset the fraudulent behavior of the agents aims to profit by taking control or customers accounts and try to empty the funds by transferring to another account and then cashing out of the system.

isFlaggedFraud - The business model aims to control massive transfers from one account to another and flags illegal attempts. An illegal attempt in this dataset is an attempt to transfer more than 200.000 in a single transaction.

Past Research

There are 5 similar files that contain the run of 5 different scenarios. These files are better explained at my PhD thesis chapter 7 (PhD Thesis Available here http://urn.kb.se/resolve?urn=urn:nbn:se:bth-12932).

We ran PaySim several times using random seeds for 744 steps, representing each hour of one month of real time, which matches the original logs. Each run took around 45 minutes on an i7 intel processor with 16GB of RAM. The final result of a run contains approximately 24 million of financial records divided into the 5 types of categories: CASH-IN, CASH-OUT, DEBIT, PAYMENT and TRANSFER.

Acknowledgements

This work is part of the research project ”Scalable resource-efficient systems for big data analytics” funded by the Knowledge Foundation (grant: 20140032) in Sweden.

Please refer to this dataset using the following citations:

PaySim first paper of the simulator:

E. A. Lopez-Rojas , A. Elmir, and S. Axelsson. "PaySim: A financial mobile money simulator for fraud detection". In: The 28th European Modeling and Simulation Symposium-EMSS, Larnaca, Cyprus. 2016
P
Kaggle EyePACS Dataset
paperswithcode.com
Updated Oct 28, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2020). Kaggle EyePACS Dataset [Dataset]. https://paperswithcode.com/dataset/kaggle-eyepacs
Explore at:
Dataset updated
Oct 28, 2020
Description
Diabetic retinopathy is the leading cause of blindness in the working-age population of the developed world. It is estimated to affect over 93 million people.

retina

The US Center for Disease Control and Prevention estimates that 29.1 million people in the US have diabetes and the World Health Organization estimates that 347 million people have the disease worldwide. Diabetic Retinopathy (DR) is an eye disease associated with long-standing diabetes. Around 40% to 45% of Americans with diabetes have some stage of the disease. Progression to vision impairment can be slowed or averted if DR is detected in time, however this can be difficult as the disease often shows few symptoms until it is too late to provide effective treatment.

Currently, detecting DR is a time-consuming and manual process that requires a trained clinician to examine and evaluate digital color fundus photographs of the retina. By the time human readers submit their reviews, often a day or two later, the delayed results lead to lost follow up, miscommunication, and delayed treatment.

Clinicians can identify DR by the presence of lesions associated with the vascular abnormalities caused by the disease. While this approach is effective, its resource demands are high. The expertise and equipment required are often lacking in areas where the rate of diabetes in local populations is high and DR detection is most needed. As the number of individuals with diabetes continues to grow, the infrastructure needed to prevent blindness due to DR will become even more insufficient.

The need for a comprehensive and automated method of DR screening has long been recognized, and previous efforts have made good progress using image classification, pattern recognition, and machine learning. With color fundus photography as input, the goal of this competition is to push an automated detection system to the limit of what is possible – ideally resulting in models with realistic clinical potential. The winning models will be open sourced to maximize the impact such a model can have on improving DR detection.

Acknowledgements This competition is sponsored by the California Healthcare Foundation.

Retinal images were provided by EyePACS, a free platform for retinopathy screening.
Living Standards Survey IV 1998-1999 - World Bank SHIP Harmonized Dataset -...
datacatalog.ihsn.org
catalog.ihsn.org
+2more
Updated Mar 29, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ghana Statistical Service (GSS) (2019). Living Standards Survey IV 1998-1999 - World Bank SHIP Harmonized Dataset - Ghana [Dataset]. https://datacatalog.ihsn.org/catalog/2359
Explore at:
Dataset updated
Mar 29, 2019
Dataset provided by
Ghana Statistical Services
Authors
Ghana Statistical Service (GSS)
Time period covered
1998 - 1999
Area covered
Ghana
Description
Abstract

Survey based Harmonized Indicators (SHIP) files are harmonized data files from household surveys that are conducted by countries in Africa. To ensure the quality and transparency of the data, it is critical to document the procedures of compiling consumption aggregation and other indicators so that the results can be duplicated with ease. This process enables consistency and continuity that make temporal and cross-country comparisons consistent and more reliable.

Four harmonized data files are prepared for each survey to generate a set of harmonized variables that have the same variable names. Invariably, in each survey, questions are asked in a slightly different way, which poses challenges on consistent definition of harmonized variables. The harmonized household survey data present the best available variables with harmonized definitions, but not identical variables. The four harmonized data files are

a) Individual level file (Labor force indicators in a separate file): This file has information on basic characteristics of individuals such as age and sex, literacy, education, health, anthropometry and child survival. b) Labor force file: This file has information on labor force including employment/unemployment, earnings, sectors of employment, etc. c) Household level file: This file has information on household expenditure, household head characteristics (age and sex, level of education, employment), housing amenities, assets, and access to infrastructure and services. d) Household Expenditure file: This file has consumption/expenditure aggregates by consumption groups according to Purpose (COICOP) of Household Consumption of the UN.

Geographic coverage

National

Analysis unit

Individual level for datasets with suffix _I and _L

Household level for datasets with suffix _H and _E

Universe

The survey covered all de jure household members (usual residents).

Kind of data

Sample survey data [ssd]

Sampling procedure

SAMPLE DESIGN FOR ROUND 4 OF THE GLSS A nationally representative sample of households was selected in order to achieve the survey objectives.

Sample Frame For the purposes of this survey the list of the 1984 population census Enumeration Areas (EAs) with population and household information was used as the sampling frame. The primary sampling units were the 1984 EAs with the secondary units being the households in the EAs. This frame, though quite old, was considered inadequate, it being the best available at the time. Indeed, this frame was used for the earlier rounds of the GLSS.

Stratification In order to increase precision and reliability of the estimates, the technique of stratification was employed in the sample design, using geographical factors, ecological zones and location of residence as the main controls. Specifically, the EAs were first stratified according to the three ecological zones namely; Coastal, Forest and Savannah, and then within each zone further stratification was done based on the size of the locality into rural or urban.

SAMPLE SELECTION EAs A two-stage sample was selected for the survey. At the first stage, 300 EAs were selected using systematic sampling with probability proportional to size method (PPS) where the size measure is the 1984 number of households in the EA. This was achieved by ordering the list of EAs with their sizes according to the strata. The size column was then cumulated, and with a random start and a fixed interval the sample EAs were selected.

It was observed that some of the selected EAs had grown in size over time and therefore needed segmentation. In this connection, such EAs were divided into approximately equal parts, each segment constituting about 200 households. Only one segment was then randomly selected for listing of the households.

Households At the second stage, a fixed number of 20 households was systematically selected from each selected EA to give a total of 6,000 households. Additional 5 households were selected as reserve to replace missing households. Equal number of households was selected from each EA in order to reflect the labour force focus of the survey.

NOTE: The above sample selection procedure deviated slightly from that used for the earlier rounds of the GLSS, as such the sample is not self-weighting. This is because, 1. given the long period between 1984 and the GLSS 4 fieldwork the number of households in the various EAs are likely to have grown at different rates. 2. the listing exercise was not properly done as some of the selected EAs were not listed completely. Moreover, it was noted that the segmentation done for larger EAs during the listing was a bit arbitrary.

Mode of data collection

Face-to-face [f2f]
p
Counts of Dengue reported in LAO PEOPLE'S DEMOCRATIC REPUBLIC: 1979-2010
tycho.pitt.edu
Updated Apr 1, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Willem G Van Panhuis; Anne L Cross; Donald S Burke; Marc Choisy (2018). Counts of Dengue reported in LAO PEOPLE'S DEMOCRATIC REPUBLIC: 1979-2010 [Dataset]. https://www.tycho.pitt.edu/dataset/LA.38362002
Explore at:
Dataset updated
Apr 1, 2018
Dataset provided by
Project Tycho, University of Pittsburgh
Authors
Willem G Van Panhuis; Anne L Cross; Donald S Burke; Marc Choisy
Time period covered
1979 - 2010
Area covered
Laos
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

Depending on the intended use of a dataset, we recommend a few data processing steps before analysis: - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported. - Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
World Development Indicators
kaggle.com
Updated Aug 28, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dr.HaidEr MoHiE (2024). World Development Indicators [Dataset]. https://www.kaggle.com/datasets/haiderkraheem/world-development-indicators
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 28, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Dr.HaidEr MoHiE
License
https://www.worldbank.org/en/about/legal/terms-of-use-for-datasetshttps://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
Description
World Development Indicators (WDI) :- is a comprehensive database maintained by the World Bank that provides a broad range of indicators measuring economic, social, and environmental development across countries. It is one of the most widely used resources for analyzing global development trends.

Key Features of WDI: 1- Extensive Coverage: - Includes data from over 200 countries and regions. - Covers long time periods, typically from 1960 to the present, allowing for long-term analysis. 2- Diverse Areas: - Economy: Data on GDP, inflation, trade, debt, exchange rates, and other key economic indicators. - Social Development: Includes indicators on education, health, population, poverty, and labor markets. - Environment: Covers indicators related to natural resources, energy use, environmental protection, and climate change. - Infrastructure and Technology: Data on access to services like water, electricity, internet, and other technologies. Governance: Indicators on regulatory quality, rule of law, government effectiveness, and corruption control. 3-Data Sources: - The WDI database compiles data from various international sources, including national statistical agencies, international organizations, and research institutions. 4-Uses and Applications: - Policymakers and researchers use WDI data to inform decision-making, track progress toward development goals, and compare countries. - Academics and students use it for research, analysis, and academic projects related to global development. 5-Accessibility: - The data is freely available and accessible through the World Bank's website, allowing users to download, visualize, and analyze the indicators. - WDI is an essential tool for anyone interested in understanding the complexities of global development and the factors that influence the well-being of nations.
z
Counts of COVID-19 reported in LAO PEOPLE'S DEMOCRATIC REPUBLIC: 2020-2021
zenodo.org
catalog.midasnetwork.us
+1more
json, xml, zip
Updated Jun 3, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MIDAS Coordination Center; MIDAS Coordination Center (2024). Counts of COVID-19 reported in LAO PEOPLE'S DEMOCRATIC REPUBLIC: 2020-2021 [Dataset]. http://doi.org/10.25337/t7/ptycho.v2.0/la.840539006
Explore at:
zip, xml, jsonAvailable download formats
Unique identifier
https://doi.org/10.25337/t7/ptycho.v2.0/la.840539006
Dataset updated
Jun 3, 2024
Dataset provided by
Project Tycho
Authors
MIDAS Coordination Center; MIDAS Coordination Center
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 3, 2020 - Jul 31, 2021
Area covered
Laos
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team, except for aggregation of individual case count data into daily counts when that was the best data available for a disease and location. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format. All geographic locations at the country and admin1 level have been represented at the same geographic level as in the data source, provided an ISO code or codes could be identified, unless the data source specifies that the location is listed at an inaccurate geographical level. For more information about decisions made by the curation team, recommended data processing steps, and the data sources used, please see the README that is included in the dataset download ZIP file.
Gallup World Poll 2013, June - Afghanistan, Angola, Albania...and 183 more
datacatalog.ihsn.org
catalog.ihsn.org
Updated Jun 14, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gallup, Inc. (2022). Gallup World Poll 2013, June - Afghanistan, Angola, Albania...and 183 more [Dataset]. https://datacatalog.ihsn.org/catalog/8494
Explore at:
Dataset updated
Jun 14, 2022
Dataset authored and provided by
Gallup, Inc.http://gallup.com/
Time period covered
2005 - 2012
Area covered
Albania, Angola, Afghanistan
Description
Abstract

Gallup Worldwide Research continually surveys residents in more than 150 countries, representing more than 98% of the world's adult population, using randomly selected, nationally representative samples. Gallup typically surveys 1,000 individuals in each country, using a standard set of core questions that has been translated into the major languages of the respective country. In some regions, supplemental questions are asked in addition to core questions. Face-to-face interviews are approximately 1 hour, while telephone interviews are about 30 minutes. In many countries, the survey is conducted once per year, and fieldwork is generally completed in two to four weeks. The Country Dataset Details spreadsheet displays each country's sample size, month/year of the data collection, mode of interviewing, languages employed, design effect, margin of error, and details about sample coverage.

Gallup is entirely responsible for the management, design, and control of Gallup Worldwide Research. For the past 70 years, Gallup has been committed to the principle that accurately collecting and disseminating the opinions and aspirations of people around the globe is vital to understanding our world. Gallup's mission is to provide information in an objective, reliable, and scientifically grounded manner. Gallup is not associated with any political orientation, party, or advocacy group and does not accept partisan entities as clients. Any individual, institution, or governmental agency may access the Gallup Worldwide Research regardless of nationality. The identities of clients and all surveyed respondents will remain confidential.

Kind of data

Sample survey data [ssd]

Sampling procedure

SAMPLING AND DATA COLLECTION METHODOLOGY With some exceptions, all samples are probability based and nationally representative of the resident population aged 15 and older. The coverage area is the entire country including rural areas, and the sampling frame represents the entire civilian, non-institutionalized, aged 15 and older population of the entire country. Exceptions include areas where the safety of interviewing staff is threatened, scarcely populated islands in some countries, and areas that interviewers can reach only by foot, animal, or small boat.

Telephone surveys are used in countries where telephone coverage represents at least 80% of the population or is the customary survey methodology (see the Country Dataset Details for detailed information for each country). In Central and Eastern Europe, as well as in the developing world, including much of Latin America, the former Soviet Union countries, nearly all of Asia, the Middle East, and Africa, an area frame design is used for face-to-face interviewing.

The typical Gallup Worldwide Research survey includes at least 1,000 surveys of individuals. In some countries, oversamples are collected in major cities or areas of special interest. Additionally, in some large countries, such as China and Russia, sample sizes of at least 2,000 are collected. Although rare, in some instances the sample size is between 500 and 1,000. See the Country Dataset Details for detailed information for each country.

FACE-TO-FACE SURVEY DESIGN

FIRST STAGE In countries where face-to-face surveys are conducted, the first stage of sampling is the identification of 100 to 135 ultimate clusters (Sampling Units), consisting of clusters of households. Sampling units are stratified by population size and or geography and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size, otherwise simple random sampling is used. Samples are drawn independent of any samples drawn for surveys conducted in previous years.

There are two methods for sample stratification:

METHOD 1: The sample is stratified into 100 to 125 ultimate clusters drawn proportional to the national population, using the following strata: 1) Areas with population of at least 1 million 2) Areas 500,000-999,999 3) Areas 100,000-499,999 4) Areas 50,000-99,999 5) Areas 10,000-49,999 6) Areas with less than 10,000

The strata could include additional stratum to reflect populations that exceed 1 million as well as areas with populations less than 10,000. Worldwide Research Methodology and Codebook Copyright © 2008-2012 Gallup, Inc. All rights reserved. 8

METHOD 2:

A multi-stage design is used. The country is first stratified by large geographic units, and then by smaller units within geography. A minimum of 33 Primary Sampling Units (PSUs), which are first stage sampling units, are selected. The sample design results in 100 to 125 ultimate clusters.

SECOND STAGE

Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. To increase the probability of contact and completion, attempts are made at different times of the day, and where possible, on different days. If an interviewer cannot obtain an interview at the initial sampled household, he or she uses a simple substitution method. Refer to Appendix C for a more in-depth description of random route procedures.

THIRD STAGE

Respondents are randomly selected within the selected households. Interviewers list all eligible household members and their ages or birthdays. The respondent is selected by means of the Kish grid (refer to Appendix C) in countries where face-to-face interviewing is used. The interview does not inform the person who answers the door of the selection criteria until after the respondent has been identified. In a few Middle East and Asian countries where cultural restrictions dictate gender matching, respondents are randomly selected using the Kish grid from among all eligible adults of the matching gender.

TELEPHONE SURVEY DESIGN

In countries where telephone interviewing is employed, random-digit-dial (RDD) or a nationally representative list of phone numbers is used. In select countries where cell phone penetration is high, a dual sampling frame is used. Random respondent selection is achieved by using either the latest birthday or Kish grid method. At least three attempts are made to reach a person in each household, spread over different days and times of day. Appointments for callbacks that fall within the survey data collection period are made.

PANEL SURVEY DESIGN

Prior to 2009, United States data were collected using The Gallup Panel. The Gallup Panel is a probability-based, nationally representative panel, for which all members are recruited via random-digit-dial methodology and is only used in the United States. Participants who elect to join the panel are committing to the completion of two to three surveys per month, with the typical survey lasting 10 to 15 minutes. The Gallup Worldwide Research panel survey is conducted over the telephone and takes approximately 30 minutes. No incentives are given to panel participants. Worldwide Research Methodology and Codebook Copyright © 2008-2012 Gallup, Inc. All rights reserved. 9

Research instrument

QUESTION DESIGN

Many of the Worldwide Research questions are items that Gallup has used for years. When developing additional questions, Gallup employed its worldwide network of research and political scientists1 to better understand key issues with regard to question development and construction and data gathering. Hundreds of items were developed, tested, piloted, and finalized. The best questions were retained for the core questionnaire and organized into indexes. Most items have a simple dichotomous ("yes or no") response set to minimize contamination of data because of cultural differences in response styles and to facilitate cross-cultural comparisons.

The Gallup Worldwide Research measures key indicators such as Law and Order, Food and Shelter, Job Creation, Migration, Financial Wellbeing, Personal Health, Civic Engagement, and Evaluative Wellbeing and demonstrates their correlations with world development indicators such as GDP and Brain Gain. These indicators assist leaders in understanding the broad context of national interests and establishing organization-specific correlations between leading indexes and lagging economic outcomes.

Gallup organizes its core group of indicators into the Gallup World Path. The Path is an organizational conceptualization of the seven indexes and is not to be construed as a causal model. The individual indexes have many properties of a strong theoretical framework. A more in-depth description of the questions and Gallup indexes is included in the indexes section of this document. In addition to World Path indexes, Gallup Worldwide Research questions also measure opinions about national institutions, corruption, youth development, community basics, diversity, optimism, communications, religiosity, and numerous other topics. For many regions of the world, additional questions that are specific to that region or country are included in surveys. Region-specific questions have been developed for predominantly Muslim nations, former Soviet Union countries, the Balkans, sub-Saharan Africa, Latin America, China and India, South Asia, and Israel and the Palestinian Territories.

The questionnaire is translated into the major conversational languages of each country. The translation process starts with an English, French, or Spanish version, depending on the region. One of two translation methods may be used.

METHOD 1: Two independent translations are completed. An independent third party, with some knowledge of survey research methods, adjudicates the differences. A professional translator translates the final version back into the source language.

METHOD 2: A translator
GBIF Backbone Taxonomy
gbif.org
smng.net
Updated Nov 17, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
GBIF (2023). GBIF Backbone Taxonomy [Dataset]. http://doi.org/10.15468/39omei
Explore at:
Unique identifier
https://doi.org/10.15468/39omei
Dataset updated
Nov 17, 2023
Dataset provided by
Global Biodiversity Information Facilityhttps://www.gbif.org/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The GBIF Backbone Taxonomy is a single, synthetic management classification with the goal of covering all names GBIF is dealing with. It's the taxonomic backbone that allows GBIF to integrate name based information from different resources, no matter if these are occurrence datasets, species pages, names from nomenclators or external sources like EOL, Genbank or IUCN. This backbone allows taxonomic search, browse and reporting operations across all those resources in a consistent way and to provide means to crosswalk names from one source to another.

It is updated regulary through an automated process in which the Catalogue of Life acts as a starting point also providing the complete higher classification above families. Additional scientific names only found in other authoritative nomenclatural and taxonomic datasets are then merged into the tree, thus extending the original catalogue and broadening the backbones name coverage. The GBIF Backbone taxonomy also includes identifiers for Operational Taxonomic Units (OTUs) drawn from the barcoding resources iBOL and UNITE.

International Barcode of Life project (iBOL), Barcode Index Numbers (BINs). BINs are connected to a taxon name and its classification by taking into account all names applied to the BIN and picking names with at least 80% consensus. If there is no consensus of name at the species level, the selection process is repeated moving up the major Linnaean ranks until consensus is achieved.

UNITE - Unified system for the DNA based fungal species, Species Hypotheses (SHs). SHs are connected to a taxon name and its classification based on the determination of the RefS (reference sequence) if present or the RepS (representative sequence). In the latter case, if there is no match in the UNITE taxonomy, the lowest rank with 100% consensus within the SH will be used.

The GBIF Backbone Taxonomy is available for download at https://hosted-datasets.gbif.org/datasets/backbone/ in different formats together with an archive of all previous versions.

The following 105 sources have been used to assemble the GBIF backbone with number of names given in brackets:
Catalogue of Life Checklist - 4766428 names
International Barcode of Life project (iBOL) Barcode Index Numbers (BINs) - 635951 names
UNITE - Unified system for the DNA based fungal species linked to the classification - 611208 names
The Paleobiology Database - 212054 names
World Register of Marine Species - 188857 names
The Interim Register of Marine and Nonmarine Genera - 183894 names
The World Checklist of Vascular Plants (WCVP) - 131891 names
GBIF Backbone Taxonomy - 114350 names
TAXREF - 109374 names
The Leipzig catalogue of vascular plants - 75380 names
ZooBank - 73549 names
Integrated Taxonomic Information System (ITIS) - 68377 names
Plazi.org taxonomic treatments database - 61346 names
Genome Taxonomy Database r207 - 60545 names
International Plant Names Index - 52329 names
Fauna Europaea - 45077 names
The National Checklist of Taiwan (Catalogue of Life in Taiwan, TaiCoL) - 36193 names
Dyntaxa. Svensk taxonomisk databas - 35892 names
The Plant List with literature - 32692 names
United Kingdom Species Inventory (UKSI) - 29643 names
Artsnavnebasen - 29208 names
The IUCN Red List of Threatened Species - 21221 names
Afromoths, online database of Afrotropical moth species (Lepidoptera) - 13961 names
Brazilian Flora 2020 project - Projeto Flora do Brasil 2020 - 13829 names
Prokaryotic Nomenclature Up-to-Date (PNU) - 10079 names
Checklist Dutch Species Register - Nederlands Soortenregister - 8814 names
ICTV Master Species List (MSL) - 7852 names
Cockroach Species File - 6020 names
GRIN Taxonomy - 5882 names
Taxon list of fungi and fungal-like organisms from Germany compiled by the DGfM - 4570 names
Catalogue of Afrotropical Bees - 3623 names
Catalogue of Tenebrionidae (Coleoptera) of North America - 3327 names
Checklist of Beetles (Coleoptera) of Canada and Alaska. Second Edition. - 3312 names
Systema Dipterorum - 2850 names
Catalogue of the Pterophoroidea of the World - 2807 names
The Clements Checklist - 2675 names
Taxon list of Hymenoptera from Germany compiled in the context of the GBOL project - 2496 names
IOC World Bird List, v13.2 - 2366 names
Official Lists and Indexes of Names in Zoology - 2310 names
National checklist of all species occurring in Denmark - 1922 names
Myriatrix - 1876 names
Database of Vascular Plants of Canada (VASCAN) - 1822 names
Taxon list of vascular plants from Bavaria, Germany compiled in the context of the BFL project - 1771 names
Orthoptera Species File - 1742 names
A list of the terrestrial fungi, flora and fauna of Madeira and Selvagens archipelagos - 1602 names
Aphid Species File - 1565 names
World Spider Catalog - 1561 names
Taxon list of Jurassic Pisces of the Tethys Palaeo-Environment compiled at the SNSB-JME - 1270 names
Backbone Family Classification Patch - 1143 names
GBIF Algae Classification - 1100 names
International Cichorieae Network (ICN): Cichorieae Portal - 975 names
Psocodea Species File - 803 names
New Zealand Marine Macroalgae Species Checklist - 787 names
Annotated checklist of endemic species from the Western Balkans - 754 names
Taxon list of animals with German names (worldwide) compiled at the SMNS - 503 names
Catalogue of the Alucitoidea of the World - 472 names
Lygaeoidea Species File - 462 names
Catálogo de Plantas y Líquenes de Colombia - 422 names
GBIF Backbone Patch - 317 names
Phasmida Species File - 259 names
Cortinariaceae fetched from the Index Fungorum API - 234 names
Coreoidea Species File - 233 names
GTDB supplement - 139 names
Mantodea Species File - 119 names
Endemic species in Taiwan - 93 names
Taxon list of Araneae from Germany compiled in the context of the GBOL project - 88 names
Species of Hominidae - 78 names
Taxon list of Sternorrhyncha from Germany compiled in the context of the GBOL project - 77 names
Taxon list of mosses from Germany compiled in the context of the GBOL project - 75 names
Mammal Species of the World - 73 names
Plecoptera Species File - 71 names
Species Fungorum Plus - 64 names
Catalogue of the type specimens of Cosmopterigidae (Lepidoptera: Gelechioidea) from research collections of the Zoological Institute, Russian Academy of Sciences - 47 names
Species named after famous people - 41 names
Dermaptera Species File - 36 names
Taxon list of Trichoptera from Germany compiled in the context of the GBOL project - 34 names
True Fruit Flies (Diptera, Tephritidae) of the Afrotropical Region - 33 names
Range and Regularities in the Distribution of Earthworms of the Earthworms of the USSR Fauna. Perel, 1979 - 32 names
Taxon list of Diplura from Germany compiled in the context of the GBOL project - 30 names
Lista de referencia de especies de aves de Colombia - 2022 - 24 names
Taxon list of Auchenorrhyncha from Germany compiled in the context of the GBOL project - 20 names
Catalogue of the type specimens of Polycestinae (Coleoptera: Buprestidae) from research collections of the Zoological Institute, Russian Academy of Sciences - 19 names
Taxon list of Thysanoptera from Germany compiled in the context of the GBOL project - 19 names
Lista de especies de vertebrados registrados en jurisdicción del Departamento del Huila - 18 names
Taxon list of Microcoryphia (Archaeognatha) from Germany compiled in the context of the GBOL project - 15 names
Catalogue of the type specimens of Bufonidae and Megophryidae (Amphibia: Anura) from research collections of the Zoological Institute, Russian Academy of Sciences - 12 names
Grylloblattodea Species File - 11 names
Coleorrhyncha Species File - 9 names
Taxon list of liverworts from Germany compiled in the context of the GBOL project - 9 names
Embioptera Species File - 7 names
Taxon list of Pisces and Cyclostoma from Germany compiled in the context of the GBOL project - 6 names
Taxon list of Pteridophyta from Germany compiled in the context of the GBOL project - 6 names
Taxon list of Siphonaptera from Germany compiled in the context of the GBOL project - 5 names
The Earthworms of the Fauna of Russia. Perel, 1997 - 5 names
Taxon list of Zygentoma from Germany compiled in the context of the GBOL project - 4 names
Asiloid Flies: new taxa of Diptera: Apioceridae, Asilidae, and Mydidae - 3 names
Taxon list of Protura from Germany compiled in the context of the GBOL project - 3 names
Taxon list of hornworts from Germany compiled in the context of the GBOL project - 2 names
Chrysididae Species File - 1 names
Taxon list of Dermaptera from Germany compiled in the context of the GBOL project - 1 names
Taxon list of Diplopoda from Germany in the context of the GBOL project - 1 names
Taxon list of Orthoptera (Grashoppers) from Germany compiled at the SNSB - 1 names
Taxon list of Pscoptera from Germany compiled in the context of the GBOL project - 1 names
Taxon list of Pseudoscorpiones from Germany compiled in the context of the GBOL project - 1 names
Taxon list of Raphidioptera from Germany compiled in the context of the GBOL project - 1 names
t
TomTom Map API - Dataset - Trafficdata.se
trafficdata.se
Updated Oct 31, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). TomTom Map API - Dataset - Trafficdata.se [Dataset]. https://trafficdata.se/dataset/tomtom-map-api
Explore at:
Dataset updated
Oct 31, 2024
Description
When you build with TomTom Maps APIs and map data sets, you build with a partner that combines three decades of mapping experience with the speed and soul of a start-up. We’re proud of our roots, and we never stop looking ahead – working together with you to bring the best, freshest map data and tech to people all over the world. When change happens in the real world, our transactional mapmaking ecosystem allows us to detect, verify and deliver it to the map fast – ensuring your customers, drivers and users always enjoy the most up-to-date map data. That same speed and flexibility extends to how we help you build your mapping app: You’re in control of your map data, choosing what you want to include in your final product.
h
lmsys-chat-1m
huggingface.co
opendatalab.com
Updated Sep 17, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Large Model Systems Organization (2023). lmsys-chat-1m [Dataset]. https://huggingface.co/datasets/lmsys/lmsys-chat-1m
Explore at:
Dataset updated
Sep 17, 2023
Dataset authored and provided by
Large Model Systems Organization
Description
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

This dataset contains one million real-world conversations with 25 state-of-the-art LLMs. It is collected from 210K unique IP addresses in the wild on the Vicuna demo and Chatbot Arena website from April to August 2023. Each sample includes a conversation ID, model name, conversation text in OpenAI API JSON format, detected language tag, and OpenAI moderation API tag. User consent is obtained through the "Terms of use"… See the full description on the dataset page: https://huggingface.co/datasets/lmsys/lmsys-chat-1m.

Facebook

Twitter

Click to copy link

Link copied

Cite

C-3PO (2021). The World Dataset of COVID-19 [Dataset]. https://www.kaggle.com/datasets/aditeloo/the-world-dataset-of-covid19/code

The World Dataset of COVID-19

Data on COVID-19 (coronavirus) by Our World in Data

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

May 25, 2021

Dataset provided by

Kagglehttp://kaggle.com/

Authors

C-3PO

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Area covered

World

Description

Context

These datasets are from Our World in Data. Their complete COVID-19 dataset is a collection of the COVID-19 data maintained by Our World in Data. It is updated daily and includes data on confirmed cases, deaths, hospitalizations, testing, and vaccinations as well as other variables of potential interest.

Content

Confirmed cases and deaths:

our data comes from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). We discuss how and when JHU collects and publishes this data. The cases & deaths dataset is updated daily. Note: the number of cases or deaths reported by any institution—including JHU, the WHO, the ECDC, and others—on a given day does not necessarily represent the actual number on that date. This is because of the long reporting chain that exists between a new case/death and its inclusion in statistics. This also means that negative values in cases and deaths can sometimes appear when a country corrects historical data because it had previously overestimated the number of cases/deaths. Alternatively, large changes can sometimes (although rarely) be made to a country's entire time series if JHU decides (and has access to the necessary data) to correct values retrospectively.

Hospitalizations and intensive care unit (ICU) admissions:

our data comes from the European Centre for Disease Prevention and Control (ECDC) for a select number of European countries; the government of the United Kingdom; the Department of Health & Human Services for the United States; the COVID-19 Tracker for Canada. Unfortunately, we are unable to provide data on hospitalizations for other countries: there is currently no global, aggregated database on COVID-19 hospitalization, and our team at Our World in Data does not have the capacity to build such a dataset.

Testing for COVID-19:

this data is collected by the Our World in Data team from official reports; you can find further details in our post on COVID-19 testing, including our checklist of questions to understand testing data, information on geographical and temporal coverage, and detailed country-by-country source information. The testing dataset is updated around twice a week.

Acknowledgements

Our World in Data GitHub repository for covid-19.

Inspiration

All we love data, cause we love to go inside it and discover the truth that's the main inspiration I have.

Clear search

Close search

Google apps

Main menu

The World Dataset of COVID-19

Context

Content

Confirmed cases and deaths:

Hospitalizations and intensive care unit (ICU) admissions:

Testing for COVID-19:

Acknowledgements

Inspiration

Johns Hopkins COVID-19 Case Tracker

Updates

- Johns Hopkins has reconciled Ohio's historical deaths data with the state.

Overview

Queries

Interactive

Interactive Embed Code

Caveats

Attribution

Coronavirus (COVID-19) In-depth Dataset

Context

Content

Public Health Official Departures

Changelog:

Overview

Findings

Methodology

Attribution

Is Data Missing?

Coronavirus (Covid-19) Data in the United States

Data from: Illustrating potential effects of alternate control populations...

Description of the real-world dataset.

COVID-19 World Vaccine Adverse Reactions

League of Legends Worlds MainEvent 2024 Stats

Dataset Contents:

Team Performance Metrics:

Player Performance Metrics:

Vision and Warding:

Damage Metrics:

Combat Metrics:

Early Game Metrics:

Objective Control:

Healing and Mitigation:

Crowd Control Metrics:

Survival and Economy:

COVID-19 Pandemic - Worldwide

Synthetic Financial Datasets For Fraud Detection

Context

Content

Headers

Past Research

Acknowledgements

Kaggle EyePACS Dataset

Living Standards Survey IV 1998-1999 - World Bank SHIP Harmonized Dataset -...

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Counts of Dengue reported in LAO PEOPLE'S DEMOCRATIC REPUBLIC: 1979-2010

World Development Indicators

Counts of COVID-19 reported in LAO PEOPLE'S DEMOCRATIC REPUBLIC: 2020-2021

Gallup World Poll 2013, June - Afghanistan, Angola, Albania...and 183 more

Abstract

Kind of data

Sampling procedure

Research instrument

GBIF Backbone Taxonomy

TomTom Map API - Dataset - Trafficdata.se

lmsys-chat-1m

The World Dataset of COVID-19

Data on COVID-19 (coronavirus) by Our World in Data

Context

Content

Confirmed cases and deaths:

Hospitalizations and intensive care unit (ICU) admissions:

Testing for COVID-19:

Acknowledgements

Inspiration