100+ datasets found

Our World in Data COVID-19 Dataset
ieee-dataport.org
Updated Aug 16, 2023
Cite
Lubna Altarawneh (2023). Our World in Data COVID-19 Dataset [Dataset]. http://doi.org/10.21227/2n61-4965
Unique identifier
https://doi.org/10.21227/2n61-4965
Dataset updated
Aug 16, 2023
Dataset provided by
IEEE Dataport
Authors
Lubna Altarawneh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The complete COVID-19 dataset is a collection of the COVID-19 data maintained by Our World in Data that is updated throughout the duration of COVID-19. It includes information related to confirmed cases and deaths, hospitalization, intensive care unit admissions, testing for COVID-19, and vaccination for COVID-19.
Confirmed cases and deaths: this data is collected from the World Health Organization Coronavirus Dashboard. The cases & deaths dataset is updated daily.
Note 1: Time/date stamps reflect when the data was last updated by WHO. Due to the time required to process and validate the incoming data, there is a delay between reporting to WHO and the update of the dashboard.
Note 2: Counts and corrections made after these times will be carried forward to the next reporting cycle for that specific region. Delayed reporting for any specific country, territory or area may result in pooled counts for multiple days being presented, with a retrospective update to counts on previous days to accurately reflect trends. Significant data errors detected or reported to WHO may be corrected at more frequent intervals.
Hospitalizations and intensive care unit (ICU) admissions: our data is collected from official sources and collated by Our World in Data. The complete list of country-by-country sources is available here.
Testing for COVID-19: this data is collected by the Our World in Data team from official reports; you can find further details in our post on COVID-19 testing, including our checklist of questions to understand testing data, information on geographical and temporal coverage, and detailed country-by-country source information. On 23 June 2022, we stopped adding new datapoints to our COVID-19 testing dataset. You can read more here.
Vaccinations against COVID-19: this data is collected by the Our World in Data team from official reports.
Other variables: this data is collected from a variety of sources (United Nations, World Bank, Global Burden of Disease, Blavatnik School of Government, etc.). More information is available in our codebook.
Education Attainment and Enrollment around the World - Dataset - Data...
data.opendata.am
Updated Jul 7, 2023
Cite
(2023). Education Attainment and Enrollment around the World - Dataset - Data Catalog Armenia [Dataset]. https://data.opendata.am/dataset/dcwb0038973
Dataset updated
Jul 7, 2023
Area covered
World
Description
Patterns of educational attainment vary greatly across countries, and across population groups within countries. In some countries, virtually all children complete basic education whereas in others large groups fall short. The primary purpose of this database, and the associated research program, is to document and analyze these differences using a compilation of a variety of household-based data sets: Demographic and Health Surveys (DHS); Multiple Indicator Cluster Surveys (MICS); Living Standards Measurement Study Surveys (LSMS); as well as country-specific Integrated Household Surveys (IHS) such as Socio-Economic Surveys.
As shown at the website associated with this database, there are dramatic differences in attainment by wealth. When households are ranked according to their wealth status (or more precisely, a proxy based on the assets owned by members of the household) there are striking differences in the attainment patterns of children from the richest 20 percent compared to the poorest 20 percent.
In Mali in 2012, only 34 percent of 15 to 19 year olds in the poorest quintile had completed grade 1, whereas 80 percent of the richest quintile had done so. In many countries, for example Pakistan, Peru and Indonesia, almost all the children from the wealthiest households have completed at least one year of schooling. In some countries, like Mali and Pakistan, wealth gaps are evident from grade 1 on; in other countries, like Peru and Indonesia, wealth gaps emerge later in the school system.
The EdAttain website allows a visual exploration of gaps in attainment and enrollment within and across countries, based on the international database which spans multiple years from over 120 countries and includes indicators disaggregated by wealth, gender and urban/rural location. The database underlying that site can be downloaded from here.
Johns Hopkins COVID-19 Case Tracker
data.world
csv, zip
Updated Mar 25, 2025
Cite
The Associated Press (2025). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker
Available download formats: zip, csv
Dataset updated
Mar 25, 2025
Authors
The Associated Press
Time period covered
Jan 22, 2020 - Mar 9, 2023
Area covered
Description
Updates

Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.

CDC Weekly case and death counts (national and state level)

CDC County level cases and deaths

HHS New hospital admissions

CDC NowCast COVID variant proportions (national and regional level)

April 9, 2020

The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.

April 20, 2020

Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.

April 29, 2020

The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.

September 1, 2020

Johns Hopkins is now providing counts for the five New York City counties individually.

February 12, 2021

The Ohio Department of Health recently announced that as many as 4,000 COVID-19 deaths may have been underreported through the state’s reporting system, and that the "daily reported death counts will be high for a two to three-day period."

Because deaths data will be anomalous for consecutive days, we have chosen to freeze Ohio's rolling average for daily deaths at the last valid measure until Johns Hopkins is able to back-distribute the data. The raw daily death counts, as reported by Johns Hopkins and including the backlogged death data, will still be present in the new_deaths column.

February 16, 2021

Johns Hopkins has reconciled Ohio's historical deaths data with the state.

Overview

The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
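
The per-100,000 rate mentioned above can be sketched as follows. This is a minimal illustration, not AP's actual calculation; the function name and the sample figures are assumptions made up for the example.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Return `count` expressed per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep the arithmetic exact for whole numbers.
    return count * 100_000 / population


# Hypothetical county: 250 cumulative cases among 50,000 residents.
example_rate = rate_per_100k(250, 50_000)
print(example_rate)
```

Normalizing by population makes counties of different sizes comparable, though the caveat about test availability still applies to the underlying counts.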

This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

The AP is updating this dataset hourly at 45 minutes past the hour.

To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

Queries

Use AP's queries to filter the data or to join it to other datasets we've made available to help cover the coronavirus pandemic.

Filter cases by state here

Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac

Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases per capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true

Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.

Pull the 100 counties with the highest per-capita confirmed cases here

Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
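
The hotspot queries above rank areas by a 7-day rolling average of new cases per capita. A rough Python equivalent is sketched below, assuming daily new-case lists and population figures are already in memory. The function names and sample numbers are hypothetical, and the linked queries may differ in detail.

```python
from collections import deque


def rolling_mean(values, window=7):
    """Trailing mean over `window` values; None until the window is full."""
    buf = deque(maxlen=window)
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / window if len(buf) == window else None)
    return out


def hotspot_rank(new_cases_by_area, population_by_area, window=7):
    """Rank areas by the latest rolling average of new cases per 100,000 people."""
    scores = {}
    for area, series in new_cases_by_area.items():
        averages = rolling_mean(series, window)
        if averages and averages[-1] is not None:
            scores[area] = averages[-1] * 100_000 / population_by_area[area]
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up data: a large area with 10 cases a day and a tiny one with 1 a day.
ranking = hotspot_rank(
    {"Large": [10] * 7, "Tiny": [1] * 7},
    {"Large": 100_000, "Tiny": 1_000},
)
print(ranking)
```

In this made-up sample the tiny area ranks first despite having far fewer cases, which is the small-county effect the last query description warns about.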

Interactive

The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.

@(https://datawrapper.dwcdn.net/nRyaf/15/)

Interactive Embed Code

<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>

Caveats

This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.

In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.

In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county".

This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.

Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.

The Urban/Rural classification scheme is from the Centers for Disease Control and Prevention's National Center for Health Statistics. It puts each county into one of six categories -- from Large Central Metro to Non-Core -- according to population and other characteristics. More details about the classifications can be found here.

Johns Hopkins timeseries data:

- Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.
- Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here.
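
Because the timeseries stores cumulative snapshots, daily figures have to be derived as day-over-day differences, and a downward revision by a source shows up as a negative difference. The sketch below illustrates this with made-up numbers; the function name is hypothetical and this is not code from Johns Hopkins or the AP.

```python
def daily_new(cumulative):
    """Day-over-day differences of a cumulative series (one fewer item than the input)."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


# Made-up cumulative counts; the fourth snapshot was revised downward.
cumulative = [100, 120, 150, 145, 170]
deltas = daily_new(cumulative)

# Positions in `cumulative` where the count dropped, i.e. likely revisions.
revised_days = [i + 1 for i, d in enumerate(deltas) if d < 0]
print(deltas, revised_days)
```

Flagging negative differences rather than clipping them to zero keeps the revision visible, so it can be checked against the published corrections file.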

Attribution

This data should be credited to Johns Hopkins University's COVID-19 tracking project.
Amount of data created, consumed, and stored 2010-2023, with forecasts to...
statista.com
Updated Nov 21, 2024
Cite
Statista (2024). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
Dataset updated
Nov 21, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
May 2024
Area covered
Worldwide
Description
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching 149 zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than 394 zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.
Storage capacity also growing
Only a small percentage of this newly created data is kept though, as just two percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of 19.2 percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached 6.7 zettabytes.
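
As a back-of-the-envelope check of what the quoted growth rate implies, the sketch below compounds the 2020 base of 6.7 zettabytes at 19.2 percent per year for five years. It assumes the rate applies to that base over five annual periods. The result is plain arithmetic on the figures above, not a number published by Statista, and the function name is hypothetical.

```python
def project(base: float, cagr: float, years: int) -> float:
    """Compound `base` at annual rate `cagr` for `years` periods."""
    return base * (1 + cagr) ** years


# 6.7 ZB in 2020, growing 19.2 percent a year through 2025 (five periods).
capacity_2025 = project(6.7, 0.192, 5)
print(round(capacity_2025, 1))  # roughly 16.1 zettabytes under these assumptions
```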
RealVAD: A Real-world Dataset for Voice Activity Detection
data.niaid.nih.gov
explore.openaire.eu
Updated Jul 3, 2020
Cite
Cigdem Beyan (2020). RealVAD: A Real-world Dataset for Voice Activity Detection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3928150
Dataset updated
Jul 3, 2020
Dataset provided by
Muhammad Shahid
Vittorio Murino
Cigdem Beyan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
RealVAD: A Real-world Dataset for Voice Activity Detection

The task of automatically detecting “Who is Speaking and When” is broadly referred to as Voice Activity Detection (VAD). Automatic VAD is a very important task and also the foundation of several domains, e.g., human-human and human-computer/robot/virtual-agent interaction analyses, and industrial applications.

The RealVAD dataset is constructed from a YouTube video of a panel discussion lasting approx. 83 minutes. The audio is available from a single channel. There is one static camera capturing all panelists, the moderator and the audience.

Particular aspects of the RealVAD dataset are:

It is composed of panelists with different nationalities (British, Dutch, French, German, Italian, American, Mexican, Colombian, Thai). This aspect allows studying the effect of variety in ethnic origin on automatic VAD.

There is a gender balance such that there are four female and five male panelists.

The panelists are sitting in two rows and can be gazing at the audience, other panelists, their laptop, the moderator or anywhere in the room while speaking or not speaking. Therefore, they were captured not only from a frontal view but also from a side view, varying with their instantaneous posture and head orientation.

The panelists are moving freely and are doing various spontaneous actions (e.g., drinking water, checking their cell phone, using their laptop, etc.), resulting in different postures.

The panelists’ body parts are sometimes partially occluded by their own or others’ body parts or belongings (e.g., laptop).

There are also natural changes of illumination and shadow rising on the wall behind the panelists in the back row.

In particular, for the panelists sitting in the front row, background motion sometimes occurs when the person(s) behind them move.

The annotations include:

The upper body detection of nine panelists in bounding box form.

Associated VAD ground-truth (speaking, not-speaking) for nine panelists.

Acoustic features extracted from the video: MFCC and raw filterbank energies.

All information regarding the annotations is given in the ReadMe.txt and Acoustic Features README.txt files.

When using this dataset for your research, please cite the following paper in your publication:

C. Beyan, M. Shahid and V. Murino, "RealVAD: A Real-world Dataset and A Method for Voice Activity Detection by Body Motion Analysis", in IEEE Transactions on Multimedia, 2020.
Global Bilateral Migration Database
data.subak.org
datacatalog.worldbank.org
Updated Feb 16, 2023
Cite
World Bank Group (2023). Global Bilateral Migration Database [Dataset]. https://data.subak.org/dataset/global-bilateral-migration-database
Dataset updated
Feb 16, 2023
Dataset provided by
World Bank (http://worldbank.org/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Global matrices of bilateral migrant stocks spanning the period 1960-2000, disaggregated by gender and based primarily on the foreign-born concept are presented. Over one thousand census and population register records are combined to construct decennial matrices corresponding to the last five completed census rounds.
For the first time, a comprehensive picture of bilateral global migration over the last half of the twentieth century emerges. The data reveal that the global migrant stock increased from 92 to 165 million between 1960 and 2000. South-North migration is the fastest growing component of international migration in both absolute and relative terms. The United States remains the most important migrant destination in the world, home to one fifth of the world's migrants and the top destination for migrants from no less than sixty sending countries. Migration to Western Europe remains largely from elsewhere in Europe. The oil-rich Persian Gulf countries emerge as important destinations for migrants from the Middle East, North Africa and South and South-East Asia. Finally, although the global migrant stock is still predominantly male, the proportion of women increased noticeably between 1960 and 2000.

Worldwide Bureaucracy Indicators

kaggle.com

Updated Jun 12, 2024

Cite

Joakim Arvidsson (2024). Worldwide Bureaucracy Indicators [Dataset]. https://www.kaggle.com/datasets/joebeachcapital/worldwide-bureaucracy-indicators/suggestions

Explore at:

Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Jun 12, 2024

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Joakim Arvidsson

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Worldwide Bureaucracy Indicators

Worldwide Bureaucracy Indicators (WWBI) dataset from the World Bank.

The Worldwide Bureaucracy Indicators (WWBI) database is a unique cross-national dataset on public sector employment and wages that aims to fill an information gap, thereby helping researchers, development practitioners, and policymakers gain a better understanding of the personnel dimensions of state capability, the footprint of the public sector within the overall labor market, and the fiscal implications of the public sector wage bill. The dataset is derived from administrative data and household surveys, thereby complementing existing, expert perception-based approaches.

The World Bank introduced the dataset with a series of four blogs:

Can you replicate the figures in the blogs? Can you display any of the data more clearly than in the blogs?

Data Dictionary

`wwbi_data.csv`

variable	class	description
country_code	character	3-letter ISO_3166-1 code
indicator_code	character	code identifying the indicator of bureaucracy
year	numeric	year of the data
value	numeric	numeric value of the data

`wwbi_series.csv`

variable	class	description
indicator_code	character	code identifying the indicator of bureaucracy
indicator_name	character	name of the indicator

`wwbi_country.csv`

variable	class	description
country_code	character	3-letter ISO_3166-1 code
short_name	character	short or common name for the country
table_name	character	more alphabetically sortable name of the country
long_name	character	full name of the country
x2_alpha_code	character	2-letter ISO_3166-1 code
currency_unit	character	currency unit
special_notes	character	special notes
region	character	region
income_group	character	low, lower middle, upper middle, or high income
wb_2_code	character	alternate 2-letter code
national_accounts_base_year	integer	national accounts base year
national_accounts_reference_year	integer	national accounts reference year
sna_price_valuation	character	UN system of national accounts price valuation
lending_category	character	International Development Association (IDA), International Bank for Reconstruction and Development (IBRD), a blend or neither
other_groups	character	Heavily Indebted Poor Countries initiative (HIPC), or countries classified as the "Euro area"
system_of_national_accounts	integer	which System of National Accounts methodology the country uses (1968, 1993, or 2008 version)
balance_of_payments_manual_in_use	character	the version of the Balance of Payments Manual used by the country
external_debt_reporting_status	character	estimate, preliminary, or actual
system_of_trade	character	Under the general system imports include goods imported for domestic consumption and imports into bonded warehouses and free trade zones. Under the special system imports comprise goods imported for domestic consumption (including transformation and repair) and withdrawals for domestic consumption from bonded warehouses and free trade zones. Goods transported through a country en route to another are excluded.
government_accounting_concept	character	government accounting concept
imf_data_dissemination_standard	character	International Monetary Fund data-dissemination standard: Special Data Dissemination Standard (SDDS, 1996, created for countries that have or seek to have access to international markets), SDDS Plus (2012, the highest tier of data standards, intended for systemically important economies), enhanced GDDS (e-GDDS, 2015, encouraging participants to emphasize data publication)
latest_household_survey	character	which household survey was most recently administered
source_of_most_recent_income_and_expenditure_data	character	which survey serves as the basis for income and expenditure data
vital_registration_complete	logical	whether the vital registration is complete
latest_agricultural_census	integer	year of latest agricultural census
latest_industrial_data	integer	year of latest industrial data
latest_trade_data	in...
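
The three files in the data dictionary share `indicator_code` and `country_code` as keys, so they can be joined into one readable table. The sketch below uses only the Python standard library and tiny made-up rows in place of the real files. The sample codes, names and values are hypothetical, and only column names listed in the dictionary are used.

```python
import csv
import io

# Made-up sample rows; the real files are wwbi_data.csv, wwbi_series.csv
# and wwbi_country.csv, which have many more rows and columns.
DATA_CSV = """country_code,indicator_code,year,value
AAA,IND.ONE,2015,0.25
BBB,IND.ONE,2015,0.40
"""
SERIES_CSV = """indicator_code,indicator_name
IND.ONE,Hypothetical indicator one
"""
COUNTRY_CSV = """country_code,short_name,region
AAA,Country A,Region X
BBB,Country B,Region Y
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_wwbi(data, series, countries):
    """Attach indicator names and country details to each data row."""
    names = {r["indicator_code"]: r["indicator_name"] for r in series}
    info = {r["country_code"]: r for r in countries}
    joined = []
    for row in data:
        country = info.get(row["country_code"], {})
        joined.append({
            "country": country.get("short_name"),
            "region": country.get("region"),
            "indicator": names.get(row["indicator_code"]),
            "year": int(float(row["year"])),
            "value": float(row["value"]),
        })
    return joined


rows = join_wwbi(read_rows(DATA_CSV), read_rows(SERIES_CSV), read_rows(COUNTRY_CSV))
for r in rows:
    print(r)
```

Looking up keys with `dict.get` means a data row with an unknown code is kept with `None` fields rather than dropped, which makes gaps between the files easy to spot.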

Data from: Comprehensive Global Database of Earthquake-Induced Landslide...
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
Cite
U.S. Geological Survey (2024). Comprehensive Global Database of Earthquake-Induced Landslide Events and their Impacts (ver. 2.0, February 2022) [Dataset]. https://catalog.data.gov/dataset/comprehensive-global-database-of-earthquake-induced-landslide-events-and-their-impacts-ver
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Description
Currently, there are many datasets describing landslides caused by individual earthquakes, and global inventories of earthquake-induced landslides (EQIL). However, until recently, there were no datasets that provide a comprehensive description of the impacts of earthquake-induced landslide events. In this data release, we present an up-to-date, comprehensive global database containing all literature-documented earthquake-induced landslide events for the 249-year period from 1772 through August 2021. The database represents an update of the catalog developed by Seal et al. (2020), which summarized events through March 2020 and was based on the catalog developed by Nowicki Jessee et al. (2020). The revised catalog contains 281 historical earthquakes, 162 of which include documented landslide fatality counts. This represents an addition of 17 earthquakes since the previous version, 9 with documented landslide fatalities, and a removal of 2 duplicate entries. The database includes (where available) information on earthquake size (moment magnitude (Mw), surface-wave magnitude (Ms), and body-wave magnitude (mb)), depth, earthquake fault type, date and time, location, the availability of a ShakeMap, which estimates the spatial distribution of ground shaking from the USGS ShakeMap system (Worden and Wald, 2016), the availability of a geospatial landslide inventory, information about landslide occurrence (number of landslides, area or volume of landsliding, area affected by landsliding, landslide magnitude), earthquake/landslide impact (total fatalities, landslide fatalities, and number of injuries due to the effects of the earthquake), and USGS Ground Failure Tool estimates (estimated area and population exposed to landsliding). The full dataset of all known landslide-triggering events is provided as “EQIL Database 2022.csv,” including information on the data source(s) for each data component. 
A subset of the dataset, showing only those events for which landslide fatality counts are available, is provided as “EQIL Database LSFatality 2022.csv.” This subset only includes those columns from "EQIL Database 2022.csv" which are necessary for landslide fatality data analysis and omits columns such as source columns and secondary values.
Meta Kaggle Code
kaggle.com
zip
Updated Mar 20, 2025
Cite
Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
Available download formats: zip (133186454988 bytes)
Dataset updated
Mar 20, 2025
Dataset authored and provided by
Kaggle (http://kaggle.com/)
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Explore our public notebook content!

Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

Why we’re releasing this dataset

By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

Sensitive data

While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

Joining with Meta Kaggle

The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

File organization

The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; e.g., folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files; e.g., 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
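
The folder for a given version id follows from integer division, as in the sketch below. The function name is hypothetical, and the code assumes folder names are not zero-padded (so a subfolder might be `123/1` rather than `123/001`), which the description above does not specify.

```python
def folder_for_version(version_id: int) -> str:
    """Return the 'top/sub' folder that would hold a KernelVersions id."""
    if version_id < 0:
        raise ValueError("version_id must be non-negative")
    top = version_id // 1_000_000          # e.g. 123 for 123,456,789
    sub = (version_id // 1_000) % 1_000    # e.g. 456 for 123,456,789
    return f"{top}/{sub}"


print(folder_for_version(123_456_789))
```

Every id from 123,456,000 to 123,456,999 maps to the same subfolder, matching the example ranges given above.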

The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

Questions / Comments

We love feedback! Let us know in the Discussion tab.

Happy Kaggling!
Identification for Development (ID4D) Global Dataset
datacatalog.worldbank.org
databank, excel
Updated Apr 24, 2023
Cite
id4d@worldbank.org (2023). Identification for Development (ID4D) Global Dataset [Dataset]. https://datacatalog.worldbank.org/search/dataset/0040787
Available download formats: excel, databank
Dataset updated
Apr 24, 2023
Dataset provided by
World Bank (http://worldbank.org/)
License
https://datacatalog.worldbank.org/public-licenses?fragment=cc
Description
The Identification for Development (ID4D) Global Dataset, compiled by the World Bank Group’s Identification for Development (ID4D) Initiative, presents a collection of indicators that are of relevance for the estimation of adult and child ID coverage and for understanding foundational ID systems' digital capabilities. The indicators have been compiled from multiple sources, including a specialized ID module included in the Global Findex survey and officially recognized international sources such as UNICEF. Although there is no single, globally recognized measure of having a ‘proof of legal identity’ that would cover children and adults at all ages, or of the digital capabilities of foundational ID systems, the combination of these indicators can help better understand where and what gaps remain in accessing identification and, in turn, in accessing the services and transactions for which an official proof of identity is often required.

New in 2022, adult ID ownership data is primarily based on survey questions collected in partnership with the Global Findex Survey, while coverage for children is based on birth registration rates compiled by UNICEF. These data series are accessible directly from the World Bank's Databank: https://databank.worldbank.org/source/identification-for-development-(id4d)-data. Prior editions of the data from 2017 and 2018 are available for download here. Updates were released on a yearly basis until 2018; beginning in 2021-2022, the dataset will be released every three years to align with the Findex survey.
Geonames - All Cities with a population > 1000
public.opendatasoft.com
data.smartidf.services
+3more
csv, excel, geojson +1
Updated Mar 10, 2024
Cite
(2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
Explore at:
Available download formats: csv, json, geojson, excel
Dataset updated
Mar 10, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
All cities with a population > 1000 or seats of administrative divisions (ca. 80,000).

Sources and Contributions

Sources: GeoNames aggregates over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.

Enrichment: add country name
Data from: World Database on Protected Areas
americansamoa-data.sprep.org
pacificdata.org
+14more
jpg, pdf
Updated Apr 2, 2025
Cite
Secretariat of the Pacific Regional Environment Programme (2025). World Database on Protected Areas [Dataset]. https://americansamoa-data.sprep.org/dataset/world-database-protected-areas
Explore at:
Available download formats: jpg (577876), pdf (2100272)
Dataset updated
Apr 2, 2025
Dataset provided by
Pacific Regional Environment Programme (https://www.sprep.org/)
License
Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
Area covered
Pacific Region
Description
The World Database on Protected Areas (WDPA) is the most comprehensive global database of marine and terrestrial protected areas, updated on a monthly basis, and is one of the key global biodiversity data sets being widely used by scientists, businesses, governments, International secretariats and others to inform planning, policy decisions and management. The WDPA is a joint project between UN Environment and the International Union for Conservation of Nature (IUCN). The compilation and management of the WDPA is carried out by UN Environment World Conservation Monitoring Centre (UNEP-WCMC), in collaboration with governments, non-governmental organisations, academia and industry. There are monthly updates of the data which are made available online through the Protected Planet website where the data is both viewable and downloadable. Data and information on the world's protected areas compiled in the WDPA are used for reporting to the Convention on Biological Diversity on progress towards reaching the Aichi Biodiversity Targets (particularly Target 11), to the UN to track progress towards the 2030 Sustainable Development Goals, to some of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) core indicators, and other international assessments and reports including the Global Biodiversity Outlook, as well as for the publication of the United Nations List of Protected Areas. Every two years, UNEP-WCMC releases the Protected Planet Report on the status of the world's protected areas and recommendations on how to meet international goals and targets. Many platforms are incorporating the WDPA to provide integrated information to diverse users, including businesses and governments, in a range of sectors including mining, oil and gas, and finance. 
For example, the WDPA is included in the Integrated Biodiversity Assessment Tool, an innovative decision support tool that gives users easy access to up-to-date information that allows them to identify biodiversity risks and opportunities within a project boundary. The reach of the WDPA is further enhanced in services developed by other parties, such as the Global Forest Watch and the Digital Observatory for Protected Areas, which provide decision makers with access to monitoring and alert systems that allow whole landscapes to be managed better. Together, these applications of the WDPA demonstrate the growing value and significance of the Protected Planet initiative.
Public Health Official Departures
data.world
csv, zip
Updated Jun 7, 2022
Cite
The Associated Press (2022). Public Health Official Departures [Dataset]. https://data.world/associatedpress/public-health-official-departures
Explore at:
Available download formats: csv, zip
Dataset updated
Jun 7, 2022
Authors
The Associated Press
Description
Changelog:

Update September 20, 2021: Data and overview updated to reflect data used in the September 15 story Over Half of States Have Rolled Back Public Health Powers in Pandemic. It includes 303 state or local public health leaders who resigned, retired or were fired between April 1, 2020 and Sept. 12, 2021. Previous versions of this dataset reflected data used in the Dec. 2020 and April 2021 stories.

Overview

Across the U.S., state and local public health officials have found themselves at the center of a political storm as they combat the worst pandemic in a century. Amid a fractured federal response, the usually invisible army of workers charged with preventing the spread of infectious disease has become a public punching bag.

In the midst of the coronavirus pandemic, at least 303 state or local public health leaders in 41 states have resigned, retired or been fired since April 1, 2020, according to an ongoing investigation by The Associated Press and KHN.

According to experts, that is the largest exodus of public health leaders in American history.

Many left due to political blowback or pandemic pressure, as they became the target of groups that have coalesced around a common goal — fighting and even threatening officials over mask orders and well-established public health activities like quarantines and contact tracing. Some left to take higher profile positions, or due to health concerns. Others were fired for poor performance. Dozens retired. An untold number of lower level staffers have also left.

The result is a further erosion of the nation’s already fragile public health infrastructure, which KHN and the AP documented beginning in 2020 in the Underfunded and Under Threat project.

Findings

The AP and KHN found that:

One in five Americans live in a community that has lost its local public health department leader during the pandemic

Top public health officials in 28 states have left state-level departments

Using this data

To filter for data specific to your state, use this query

To get total numbers of exits by state, broken down by state and local departments, use this query

Methodology

KHN and AP counted how many state and local public health leaders have left their jobs between April 1, 2020 and Sept. 12, 2021.

The government tasks public health workers with improving the health of the general population, through their work to encourage healthy living and prevent infectious disease. To that end, public health officials do everything from inspecting water and food safety to testing the nation’s babies for metabolic diseases and contact tracing cases of syphilis.

Many parts of the country have a health officer and a health director/administrator by statute. The analysis counted both of those positions if they existed. For state-level departments, the count tracks people in the top and second-highest-ranking job.

The analysis includes exits of top department officials regardless of reason, because no matter the reason, each left a vacancy at the top of a health agency during the pandemic. Reasons for departures include political pressure, health concerns and poor performance. Others left to take higher profile positions or to retire. Some departments had multiple top officials exit over the course of the pandemic; each is included in the analysis.

Reporters compiled the exit list by reaching out to public health associations and experts in every state and interviewing hundreds of public health employees. They also received information from the National Association of City and County Health Officials, and combed news reports and records.

Public health departments can be found at multiple levels of government. Each state has a department that handles these tasks, but most states also have local departments that either operate under local or state control. The population served by each local health department is calculated using the U.S. Census Bureau 2019 Population Estimates based on each department’s jurisdiction.

KHN and the AP have worked since the spring on a series of stories documenting the funding, staffing and problems around public health. A previous data distribution detailed a decade's worth of cuts to state and local spending and staffing on public health. That data can be found here.

Attribution

Findings and the data should be cited as: "According to a KHN and Associated Press report."

Is Data Missing?

If you know of a public health official in your state or area who has left that position between April 1, 2020 and Sept. 12, 2021 and isn't currently in our dataset, please contact authors Anna Maria Barry-Jester annab@kff.org, Hannah Recht hrecht@kff.org, Michelle Smith mrsmith@ap.org and Lauren Weber laurenw@kff.org.
India - Population Counts
data.humdata.org
data.amerigeoss.org
geotiff
Updated Mar 14, 2025
Cite
WorldPop (2025). India - Population Counts [Dataset]. https://data.humdata.org/dataset/worldpop-population-counts-for-india
Explore at:
Available download formats: geotiff
Dataset updated
Mar 14, 2025
Dataset provided by
WorldPop
Description
WorldPop produces different types of gridded population count datasets, depending on the methods used and end application. Please make sure you have read our Mapping Populations overview page before choosing and downloading a dataset.

Bespoke methods used to produce datasets for specific individual countries are available through the WorldPop Open Population Repository (WOPR) link below. These are 100m resolution gridded population estimates using customized methods ("bottom-up" and/or "top-down") developed for the latest data available from each country. They can also be visualised and explored through the woprVision App.
The remaining datasets in the links below are produced using the "top-down" method, with either the unconstrained or constrained top-down disaggregation method used. Please make sure you read the Top-down estimation modelling overview page to decide on which datasets best meet your needs. Datasets are available to download in Geotiff and ASCII XYZ format at a resolution of 3 and 30 arc-seconds (approximately 100m and 1km at the equator, respectively):

- Unconstrained individual countries 2000-2020 ( 1km resolution ): Consistent 1km resolution population count datasets created using unconstrained top-down methods for all countries of the World for each year 2000-2020.
- Unconstrained individual countries 2000-2020 ( 100m resolution ): Consistent 100m resolution population count datasets created using unconstrained top-down methods for all countries of the World for each year 2000-2020.
- Unconstrained individual countries 2000-2020 UN adjusted ( 100m resolution ): Consistent 100m resolution population count datasets created using unconstrained top-down methods for all countries of the World for each year 2000-2020 and adjusted to match United Nations national population estimates (UN 2019).
- Unconstrained individual countries 2000-2020 UN adjusted ( 1km resolution ): Consistent 1km resolution population count datasets created using unconstrained top-down methods for all countries of the World for each year 2000-2020 and adjusted to match United Nations national population estimates (UN 2019).
- Unconstrained global mosaics 2000-2020 ( 1km resolution ): Mosaiced 1km resolution versions of the "Unconstrained individual countries 2000-2020" datasets.
- Constrained individual countries 2020 ( 100m resolution ): Consistent 100m resolution population count datasets created using constrained top-down methods for all countries of the World for 2020.
- Constrained individual countries 2020 UN adjusted ( 100m resolution ): Consistent 100m resolution population count datasets created using constrained top-down methods for all countries of the World for 2020 and adjusted to match United Nations national population estimates (UN 2019).
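
The stated correspondence between arc-seconds and ground distance (3 arc-seconds is roughly 100m, 30 arc-seconds roughly 1km at the equator) can be checked with a short calculation. The sketch below is only an approximation: it assumes a spherical Earth with a mean radius of 6,371 km, which is not a figure from the listing, and the function name `arcsec_to_metres` is hypothetical.

```python
import math

# Assumption: spherical Earth with mean radius 6,371 km (not from the listing).
EARTH_RADIUS_M = 6_371_000


def arcsec_to_metres(arcsec, latitude_deg=0.0):
    """Approximate east-west width in metres of a cell `arcsec` wide.

    The width shrinks with the cosine of the latitude, so the quoted
    100m / 1km figures only hold near the equator.
    """
    angle_rad = math.radians(arcsec / 3600.0)  # arc-seconds -> degrees -> radians
    return EARTH_RADIUS_M * angle_rad * math.cos(math.radians(latitude_deg))


for arcsec in (3, 30):
    print(f"{arcsec} arc-seconds at the equator: {arcsec_to_metres(arcsec):.0f} m")
```

Under these assumptions a 30 arc-second cell is a little under 1km wide at the equator and about half that at 60 degrees latitude, which is worth keeping in mind when comparing cell counts between countries at different latitudes.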

Older datasets produced for specific individual countries and continents, using a set of tailored geospatial inputs and differing "top-down" methods and time periods are still available for download here: Individual countries and Whole Continent.

Data for earlier dates is available directly from WorldPop.

WorldPop (www.worldpop.org - School of Geography and Environmental Science, University of Southampton; Department of Geography and Geosciences, University of Louisville; Departement de Geographie, Universite de Namur) and Center for International Earth Science Information Network (CIESIN), Columbia University (2018). Global High Resolution Population Denominators Project - Funded by The Bill and Melinda Gates Foundation (OPP1134076). https://dx.doi.org/10.5258/SOTON/WP00645
GlobPOP: A 33-year (1990-2022) global gridded population dataset (Version...
zenodo.org
tiff
Updated Sep 4, 2024
Cite
Luling Liu; Xin Cao; Shijie Li; Na Jie (2024). GlobPOP: A 33-year (1990-2022) global gridded population dataset (Version 2.0-test-alpha) [Dataset]. http://doi.org/10.5281/zenodo.11071249
Explore at:
Available download formats: tiff
Unique identifier
https://doi.org/10.5281/zenodo.11071249
Dataset updated
Sep 4, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Luling Liu; Xin Cao; Shijie Li; Na Jie
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data Usage Notice

This version is not recommended for download. Please check the newest version.

We would like to inform you that the updated GlobPOP dataset (2021-2022) is available in version 2.0. The GlobPOP dataset (2021-2022) in the current version is not recommended for your work. The GlobPOP dataset (1990-2020) in the current version is the same as version 1.0.

Thank you for your continued support of the GlobPOP.

If you encounter any issues, please contact us via email at lulingliu@mail.bnu.edu.cn.

Introduction

Continuously monitoring global population spatial dynamics is essential for implementing effective policies related to sustainable development, such as epidemiology, urban planning, and global inequality.

Here, we present GlobPOP, a new continuous global gridded population product with a high-precision spatial resolution of 30 arcseconds from 1990 to 2020. Our data-fusion framework is based on cluster analysis and statistical learning approaches, and is intended to fuse five existing products (Global Human Settlements Layer Population (GHS-POP), Global Rural Urban Mapping Project (GRUMP), Gridded Population of the World Version 4 (GPWv4), LandScan Population datasets and WorldPop datasets) into a new continuous global gridded population product (GlobPOP). The spatial validation results demonstrate that the GlobPOP dataset is highly accurate. To validate the temporal accuracy of GlobPOP at the country level, we have developed an interactive web application, accessible at https://globpop.shinyapps.io/GlobPOP/, where data users can explore the country-level population time-series curves of interest and compare them with census data.

With the availability of GlobPOP dataset in both population count and population density formats, researchers and policymakers can leverage our dataset to conduct time-series analysis of population and explore the spatial patterns of population development at various scales, ranging from national to city level.

Data description

The product is produced at 30 arc-second resolution (approximately 1 km at the equator) and is made available in GeoTIFF format. There are two population formats: 'Count' (population count per grid cell) and 'Density' (population count per square kilometer in each grid cell).

Each GeoTIFF filename has 5 fields that are separated by an underscore "_". A filename extension follows these fields. The fields are described below with the example filename:

GlobPOP_Count_30arc_1990_I32

Field 1: GlobPOP (Global gridded population)
Field 2: Pixel unit is population "Count" or population "Density"
Field 3: Spatial resolution is 30 arc seconds
Field 4: Year "1990"
Field 5: Data type is I32 (Int32) or F32 (Float32)
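
The five-field naming scheme above lends itself to a small parser. The following is a minimal sketch, not code from the GlobPOP repository: the names `GlobPopName` and `parse_globpop_name` are hypothetical, and the `.tiff` extension in the example is an assumption based on the listed download format.

```python
from pathlib import PurePath
from typing import NamedTuple


class GlobPopName(NamedTuple):
    """The five underscore-separated fields of a GlobPOP filename."""
    product: str     # Field 1, e.g. "GlobPOP"
    unit: str        # Field 2, "Count" or "Density"
    resolution: str  # Field 3, e.g. "30arc"
    year: int        # Field 4, e.g. 1990
    dtype: str       # Field 5, "I32" or "F32"


def parse_globpop_name(filename):
    """Split a GlobPOP filename into its fields, ignoring any extension."""
    stem = PurePath(filename).stem
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated fields, got {len(parts)}")
    product, unit, resolution, year, dtype = parts
    if unit not in ("Count", "Density"):
        raise ValueError(f"unexpected pixel unit: {unit!r}")
    if dtype not in ("I32", "F32"):
        raise ValueError(f"unexpected data type: {dtype!r}")
    return GlobPopName(product, unit, resolution, int(year), dtype)


# The extension below is assumed for illustration.
info = parse_globpop_name("GlobPOP_Count_30arc_1990_I32.tiff")
print(info.unit, info.year, info.dtype)
```

Rejecting names that do not match the scheme is a deliberate choice here, so that files from other products are not silently grouped by year alongside GlobPOP rasters.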

More information

Please refer to the paper for detailed information:

Liu, L., Cao, X., Li, S. et al. A 31-year (1990–2020) global gridded population dataset generated by cluster analysis and statistical learning. Sci Data 11, 124 (2024). https://doi.org/10.1038/s41597-024-02913-0.

The fully reproducible codes are publicly available at GitHub: https://github.com/lulingliu/GlobPOP.
Libya - Health
data.humdata.org
csv
Updated Feb 27, 2025
Cite
World Bank Group (2025). Libya - Health [Dataset]. https://data.humdata.org/dataset/world-bank-health-indicators-for-libya
Explore at:
Available download formats: csv (4782), csv (673353)
Dataset updated
Feb 27, 2025
Dataset provided by
World Bank (http://worldbank.org/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Libya
Description
Contains data from the World Bank's data portal. There is also a consolidated country dataset on HDX.

Improving health is central to the Millennium Development Goals, and the public sector is the main provider of health care in developing countries. To reduce inequities, many countries have emphasized primary health care, including immunization, sanitation, access to safe drinking water, and safe motherhood initiatives. Data here cover health systems, disease prevention, reproductive health, nutrition, and population dynamics. Data are from the United Nations Population Division, World Health Organization, United Nations Children's Fund, the Joint United Nations Programme on HIV/AIDS, and various other sources.
Post-COVID Conditions
data.cdc.gov
data.virginia.gov
+2more
application/rdfxml +5
Updated Oct 4, 2024
Cite
NCHS/DHIS (2024). Post-COVID Conditions [Dataset]. https://data.cdc.gov/NCHS/Post-COVID-Conditions/gsea-w83j
Explore at:
Available download formats: application/rdfxml, csv, json, xml, tsv, application/rssxml
Dataset updated
Oct 4, 2024
Dataset authored and provided by
NCHS/DHIS
License
https://www.usa.gov/government-works
Description
As part of an ongoing partnership with the Census Bureau, the National Center for Health Statistics (NCHS) recently added questions to assess the prevalence of post-COVID-19 conditions (long COVID), on the experimental Household Pulse Survey. This 20-minute online survey was designed to complement the ability of the federal statistical system to rapidly respond and provide relevant information about the impact of the coronavirus pandemic in the U.S. Data collection began on April 23, 2020. Beginning in Phase 3.5 (on June 1, 2022), NCHS included questions about the presence of symptoms of COVID that lasted three months or longer. Phase 3.5 will continue with a two-weeks on, two-weeks off collection and dissemination approach.

Estimates on this page are derived from the Household Pulse Survey and show, for adults aged 18 and over:
a) as a proportion of the U.S. population, the percentage of adults who EVER experienced post-COVID conditions (long COVID). These adults had COVID and had some symptoms that lasted three months or longer;
b) as a proportion of adults who said they ever had COVID, the percentage who EVER experienced post-COVID conditions;
c) as a proportion of the U.S. population, the percentage of adults who are CURRENTLY experiencing post-COVID conditions. These adults had COVID, had long-term symptoms, and are still experiencing symptoms;
d) as a proportion of adults who said they ever had COVID, the percentage who are CURRENTLY experiencing post-COVID conditions; and
e) as a proportion of the U.S. population, the percentage of adults who said they ever had COVID.
Data from: World Terrestrial Ecosystems
statsdemo-maps4stats.hub.arcgis.com
pacificgeoportal.com
+7more
Updated Apr 2, 2020
Cite
Esri (2020). World Terrestrial Ecosystems [Dataset]. https://statsdemo-maps4stats.hub.arcgis.com/datasets/926a206393ec40a590d8caf29ae9a93e
Explore at:
Dataset updated
Apr 2, 2020
Dataset authored and provided by
Esri (http://esri.com/)
Area covered
Description
The World Terrestrial Ecosystems map classifies the world into areas of similar climate, landform, and land cover, which form the basic components of any terrestrial ecosystem structure. This map is important because it uses objectively derived and globally consistent data to characterize the ecosystems at a much finer spatial resolution (250-m) than existing ecoregionalizations, and a much finer thematic resolution (431 classes) than existing global land cover products. This item was updated on Apr 14, 2023 to distinguish between Boreal and Polar climate regions in the terrestrial ecosystems.

Cell Size: 250-meter
Source Type: Thematic
Pixel Type: 16 Bit Unsigned
Data Projection: GCS WGS84
Extent: Global
Source: USGS, The Nature Conservancy, Esri
Update Cycle: None

What can you do with this layer?

This map allows you to query the land surface pixels and returns the values of all the input parameters (landform type, landcover/vegetation type, climate region) and the name of the terrestrial ecosystem at that location.

This layer can be used in analysis at global and local regions. However, for large scale spatial analysis, we have also provided an ArcGIS Pro Package that contains the original raster data with multiple table attributes. For simple mapping applications, there is also a raster tile layer. This layer can be combined with the World Protected Areas Database to assess the types of ecosystems that are protected, and progress towards meeting conservation goals. The WDPA layer updates monthly from the United Nations Environment Programme.

Developing the World Terrestrial Ecosystems

The World Terrestrial Ecosystems map was produced by adopting and modifying the Intergovernmental Panel on Climate Change (IPCC) approach on the definition of Terrestrial Ecosystems and development of standardized global climate regions using the values of environmental moisture regime and temperature regime. We then combined the values of Global Climate Regions, Landforms and matrix-forming vegetation assemblage or land use, using the ArcGIS Combine tool (Spatial Analyst) to produce the World Ecosystems Dataset. This combination resulted in 431 World Ecosystems classes. Each combination was assigned a color using an algorithm that blended traditional color schemes for each of the three components. Every pixel in this map is symbolized by a combination of values for each of these fields.

The work from this collaboration is documented in the publication: Sayre et al. 2020. An assessment of the representation of ecosystems in global protected areas using new maps of World Climate Regions and World Ecosystems - Global Ecology and Conservation.

More information about World Terrestrial Ecosystems can be found in this Story Map.
A global database of long-term changes in insect assemblages
knb.ecoinformatics.org
Updated Oct 1, 2020
Cite
Roel van Klink; Diana E. Bowler; Jonathan M. Chase; Orr Comay; Michael M. Driessen; S.K. Morgan Ernest; Alessandro Gentile; Francis Gilbert; Konstantin Gongalsky; Jennifer Owen; Guy Pe'er; Israel Pe'er; Vincent H. Resh; Ilia Rochlin; Sebastian Schuch; Ann E. Swengel; Scott R. Swengel; Thomas L. Valone; Rikjan Vermeulen; Tyson Wepprich; Jerome Wiedmann (2020). A global database of long-term changes in insect assemblages [Dataset]. http://doi.org/10.5063/F11V5C9V
Explore at:
Unique identifier
https://doi.org/10.5063/F11V5C9V
Dataset updated
Oct 1, 2020
Dataset provided by
Knowledge Network for Biocomplexity
Authors
Roel van Klink; Diana E. Bowler; Jonathan M. Chase; Orr Comay; Michael M. Driessen; S.K. Morgan Ernest; Alessandro Gentile; Francis Gilbert; Konstantin Gongalsky; Jennifer Owen; Guy Pe'er; Israel Pe'er; Vincent H. Resh; Ilia Rochlin; Sebastian Schuch; Ann E. Swengel; Scott R. Swengel; Thomas L. Valone; Rikjan Vermeulen; Tyson Wepprich; Jerome Wiedmann
Time period covered
Jan 1, 1925 - Jan 1, 2018
Area covered
Pacific Ocean, North Pacific Ocean
Variables measured
End, Link, Year, Realm, Start, CRUmnC, CRUmnK, Metric, Number, Period, and 62 more
Description
This data set under CC-BY license contains time series of total abundance and/or biomass of insect, arachnid and Entognatha assemblages (grouped at the family level or higher taxonomic resolution), monitored by standardized means for ten or more years. The data were derived from 166 data sources, representing a total of 1676 sites from 41 countries. The time series for abundance and biomass represent the aggregated number of all individuals of all taxa monitored at each site. The data set consists of four linked tables, representing information on the study level, the plot level, about sampling, and the measured assemblage sizes. All references to the original data sources can be found in the pdf with references, and a Google Earth (kml) file presents the locations (including metadata) of all datasets. When using (parts of) this data set, please respect the original open access licenses.

This data set underlies all analyses performed in the paper 'Meta-analysis reveals declines in terrestrial, but increases in freshwater insect abundances', a meta-analysis of changes in insect assemblage sizes, and is accompanied by a data paper entitled 'InsectChange – a global database of temporal changes in insect and arachnid assemblages'. Consulting the data paper before use is recommended. Tables that can be used to calculate trends of specific taxa and for species richness will be added as they become available.

The data set consists of four tables that are linked by the columns 'DataSource_ID' and 'Plot_ID', and a table with references to original research.

In the table 'DataSources', descriptive data is provided at the dataset level: links are provided to online repositories where the original data can be found, and the table describes whether the dataset provides data on biomass, abundance or both, the invertebrate group under study, the realm, and the location of sampling at different geographic scales (continent to state). This table also contains a reference column. The full reference to the original data is found in the file 'References_to_original_data_sources.pdf'.

In the table 'PlotData' more details on each site within each dataset are provided: there is data on the exact location of each plot, whether the plots were experimentally manipulated, and if there was any spatial grouping of sites (column 'Location'). Additionally, this table contains all explanatory variables used for analysis, e.g. climate change variables, land-use variables, protection status.

The table 'SampleData' describes the exact source of the data (table X, figure X, etc), the extraction methods, as well as the sampling methods (derived from the original publications). This includes the sampling method, sampling area, sample size, and how the aggregation of samples was done, if reported. Also, any calculations we did on the original data (e.g. reverse log transformations) are detailed here, but more details are provided in the data paper. This table links to the table 'DataSources' by the column 'DataSource_ID'. Note that each datasource may contain multiple entries in the 'SampleData' table if the data were presented in different figures or tables, or if there was any other necessity to split information on sampling details.

The table 'InsectAbundanceBiomassData' provides the insect abundance or biomass numbers as analysed in the paper. It contains columns matching to the tables 'DataSources' and 'PlotData', as well as year of sampling, a descriptor of the period within the year of sampling (this was used as a random effect), the unit in which the number is reported (abundance or biomass), and the estimated abundance or biomass. In the column for Number, missing data are included (NA). The years with missing data were added because this was essential for the analysis performed, and retained here because they are easier to remove than to add.

Linking the table 'InsectAbundanceBiomassData.csv' with 'PlotData.csv' by column 'Plot_ID', and with 'DataSources.csv' by column 'DataSource_ID' will provide the full dataframe used for all analyses. Detailed explanations of all column headers and terms are available in the ReadMe file, and more details will be available in the forthcoming data paper.

WARNING: Because of the disparate sampling methods and various spatial and temporal scales used to collect the original data, this dataset should never be used to test for differences in insect abundance/biomass among locations (i.e. differences in intercept). The data can only be used to study temporal trends, by testing for differences in slopes. The data are standardized within plots to allow the temporal comparison, but not necessarily among plots (even within one dataset).
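
The linking step described above (abundance records joined to plot data on 'Plot_ID' and to data sources on 'DataSource_ID') could be sketched as follows. This is an illustration rather than the authors' code: the function name `link_tables` is hypothetical, the rows are made up, and only the column names 'Plot_ID', 'DataSource_ID', 'Year', 'Number', 'Realm' and 'Location' come from the description.

```python
def link_tables(abundance_rows, plot_rows, source_rows):
    """Attach plot-level and source-level columns to each abundance record."""
    plots = {row["Plot_ID"]: row for row in plot_rows}
    sources = {row["DataSource_ID"]: row for row in source_rows}
    linked = []
    for row in abundance_rows:
        merged = dict(sources.get(row["DataSource_ID"], {}))
        merged.update(plots.get(row["Plot_ID"], {}))
        merged.update(row)  # measurement columns win if a name appears twice
        linked.append(merged)
    return linked


# Made-up rows for illustration only; they are not taken from the dataset.
abundance = [
    {"DataSource_ID": "1", "Plot_ID": "10", "Year": "2001", "Number": "42"},
    {"DataSource_ID": "1", "Plot_ID": "10", "Year": "2002", "Number": "NA"},
]
plots = [{"Plot_ID": "10", "DataSource_ID": "1", "Location": "hypothetical site"}]
sources = [{"DataSource_ID": "1", "Realm": "Terrestrial"}]

full = link_tables(abundance, plots, sources)
print(len(full), sorted(full[0]))
```

With the real files, the three row lists could be read with `csv.DictReader` from 'InsectAbundanceBiomassData.csv', 'PlotData.csv' and 'DataSources.csv'. Rows whose 'Number' is NA are kept here, matching the description's note that missing years are easier to remove than to add.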
GeoPolHist dataset
data.niaid.nih.gov
zenodo.org
Updated Mar 12, 2021
Cite
Paul Girard (2021). GeoPolHist dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4600808
Explore at:
Dataset updated
Mar 12, 2021
Dataset provided by
Béatrice Dedinger
Paul Girard
License
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Description
GeoPolHist is a dataset that focuses on the questions “what is a country?” and “how many countries are there in the world?” Created from the lists of states and dependencies built by the Correlates of War project, GeoPolHist provides a dataset and visual documentation that identifies the political status of each of the geopolitical entities that existed in the world since 1816. It allows for an approach of the political history of the world based on the dichotomy between sovereign and non-sovereign entities.

This work was funded by the Fondation Del Duca.

Our World in Data COVID-19 Dataset

The complete COVID-19 dataset is a collection of the COVID-19 data maintained by Our World in Data that is updated throughout the duration of COVID-19. It includes information related to confirmed cases and deaths, hospitalization, intensive care unit admissions, testing for COVID-19, and vaccination for COVID-19.

Confirmed cases and deaths: this data is collected from the World Health Organization Coronavirus Dashboard. The cases & deaths dataset is updated daily.

Note 1: Time/date stamps reflect when the data was last updated by WHO. Due to the time required to process and validate the incoming data, there is a delay between reporting to WHO and the update of the dashboard.

Note 2: Counts and corrections made after these times will be carried forward to the next reporting cycle for that specific region. Delayed reporting for any specific country, territory or area may result in pooled counts for multiple days being presented, with a retrospective update to counts on previous days to accurately reflect trends. Significant data errors detected or reported to WHO may be corrected at more frequent intervals.

Hospitalizations and intensive care unit (ICU) admissions: our data is collected from official sources and collated by Our World in Data. The complete list of country-by-country sources is available here.

Testing for COVID-19: this data is collected by the Our World in Data team from official reports; you can find further details in our post on COVID-19 testing, including our checklist of questions to understand testing data, information on geographical and temporal coverage, and detailed country-by-country source information. On 23 June 2022, we stopped adding new datapoints to our COVID-19 testing dataset. You can read more here.

Vaccinations against COVID-19: this data is collected by the Our World in Data team from official reports.

Other variables: this data is collected from a variety of sources (United Nations, World Bank, Global Burden of Disease, Blavatnik School of Government, etc.). More information is available in our codebook.
