This dataset contains global COVID-19 case and death data by country, collected directly from the official World Health Organization (WHO) COVID-19 Dashboard. It provides a comprehensive view of the pandemic’s impact worldwide, covering the period up to 2025. The dataset is intended for researchers, analysts, and anyone interested in understanding the progression and global effects of COVID-19 through reliable, up-to-date information.
The World Health Organization is the United Nations agency responsible for international public health. The WHO COVID-19 Dashboard is a trusted source that aggregates official reports from countries and territories around the world, providing daily updates on cases, deaths, and other key metrics related to COVID-19.
This dataset can be used for:
- Tracking the spread and trends of COVID-19 globally and by country
- Modeling and forecasting pandemic progression
- Comparative analysis of the pandemic’s impact across countries and regions
- Visualization and reporting
The data is sourced from the WHO, widely regarded as the most authoritative source for global health statistics. However, reporting practices and data completeness may vary by country and may be subject to revision as new information becomes available.
Special thanks to the WHO for making this data publicly available and to all those working to collect, verify, and report COVID-19 statistics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The total population in the United States was estimated at 341.2 million people in 2024, according to the latest census figures and projections from Trading Economics. This dataset provides - United States Population - actual values, historical data, forecast, chart, statistics, economic calendar and news.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data is from:
https://simplemaps.com/data/world-cities
We're proud to offer a simple, accurate and up-to-date database of the world's cities and towns. We've built it from the ground up using authoritative sources such as the NGIA, US Geological Survey, US Census Bureau, and NASA.
Our database is:
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.
Storage capacity also growing
Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
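The compound annual growth rate mentioned above can be computed as in the following minimal sketch. The figures in the source are masked, so the start and end values here are hypothetical placeholders chosen only to illustrate the formula; `cagr` is an illustrative helper name, not part of any dataset.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical placeholder values (NOT the masked figures from the text):
# a quantity that doubles from 100 to 200 units over a 5-year forecast period.
rate = cagr(100.0, 200.0, 5)
print(f"CAGR: {rate:.2%}")

# Sanity check: compounding the start value at this rate recovers the end value.
recovered = 100.0 * (1 + rate) ** 5
```

Applying the same function to the actual 2020 and 2025 storage capacity figures would reproduce the growth rate quoted in the description.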
If you know of any further standard populations worth integrating into this dataset, please let me know in the discussion section. I would be happy to integrate further data to make this dataset more useful for everybody.
Standard populations are "artificial populations" with fictitious age structures that are used in age standardization as a uniform basis for calculating comparable measures for the respective reference population(s).
Use: Age standardization based on a standard population is often used at cancer registries to compare morbidity or mortality rates. If populations of different regions, or the population of one region over time, have different age structures, their mortality or morbidity rates are only comparable to a limited extent. Interregional or inter-temporal comparisons therefore require age standardization. For this purpose, the age structure of a reference population, the so-called standard population, is assumed for the study population: the age-specific mortality or morbidity rates of the study population are weighted according to the age structure of the standard population.
Selection of a standard population:
In principle, it does not matter which standard population is used for comparison. It is important, however, that
The aim of this dataset is to provide a variety of the most commonly used 'standard populations'.
Currently, two files with 22 standard populations are provided:
- standard_populations_20_age_groups.csv
  - 20 age groups: '0', '01-04', '05-09', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40-44', '45-49', '50-54', '55-59', '60-64', '65-69', '70-74', '75-79', '80-84', '85-89', '90+'
  - 7 standard populations: 'Standard population Germany 2011', 'Standard population Germany 1987', 'Standard population of Europe 2013', 'Standard population Old Laender 1987', 'Standard population New Laender 1987', 'New standard population of Europe', 'World standard population'
  - source: German Federal Health Monitoring System
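The weighting described under "Use" can be sketched as direct age standardization. This is a minimal illustration, not the dataset author's code: the three age groups, the death counts, and the population figures are hypothetical, and `age_standardized_rate` is an illustrative name. In practice the weights would come from one of the standard population columns in the CSV file.

```python
def age_standardized_rate(events, population, standard_pop, per=100_000):
    """Directly age-standardized rate.

    Each age-specific rate (events / population) is weighted by the share
    of that age group in the standard population.
    """
    if not (len(events) == len(population) == len(standard_pop)):
        raise ValueError("all inputs must cover the same age groups")
    total_standard = sum(standard_pop)
    weighted = 0.0
    for d, n, w in zip(events, population, standard_pop):
        weighted += (d / n) * (w / total_standard)
    return weighted * per


# Hypothetical study population with three age groups (young, middle, old).
deaths = [10, 50, 200]
population = [100_000, 100_000, 50_000]
# Hypothetical standard population with a younger age structure.
standard = [50_000, 30_000, 20_000]

crude = sum(deaths) / sum(population) * 100_000
asr = age_standardized_rate(deaths, population, standard)
print(f"crude rate: {crude:.1f}, age-standardized rate: {asr:.1f} per 100,000")
```

Because the hypothetical standard population is younger than the study population, the standardized rate comes out lower than the crude rate. Two regions standardized to the same weights can then be compared without the difference in age structure distorting the result.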
No restrictions are known to the author. Standard populations are published by different organisations for public usage.
How many people use social media?
Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.
Who uses social media?
Social networking is one of the most popular digital activities worldwide, and it is no surprise that social networking penetration is increasing across all regions. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as less developed digital markets catch up with other regions when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. The mobile-first market of Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.
How much time do people spend on social media?
Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. Internet users in Latin America spent the most time per day on social media.
What are the most popular social media platforms?
Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.
This dataset is a CSV file containing the names of musical artists trending globally between May 30 and June 5, 2025, as recorded by Spotify, along with their rankings. The other fields (the country, city, and continent of each artist's residence, and the type of artist) were filled in manually from personal research on the internet. It contains 200 rows and 6 columns.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for EMPLOYED PERSONS reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
This dataset contains counts of deaths for California counties based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.
The final data tables include both deaths that occurred in each California county regardless of the place of residence (by occurrence) and deaths to residents of each California county (by residence), whereas the provisional data table only includes deaths that occurred in each county regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.
The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Observations from iNaturalist.org, an online social network of people sharing biodiversity information to help each other learn about nature.
Observations included in this archive met the following requirements:
* Published under one of the following licenses or waivers: 1) https://creativecommons.org/publicdomain/zero/1.0/, 2) https://creativecommons.org/licenses/by/4.0/, 3) https://creativecommons.org/licenses/by-nc/4.0/
* Achieved one of the following iNaturalist quality grades: Research
* Created on or before 2025-08-19 15:00:21 -0700
You can view observations meeting these requirements at https://www.inaturalist.org/observations?created_d2=2025-08-19+15%3A00%3A21+-0700&d1=1600-01-01&license=CC0%2CCC-BY%2CCC-BY-NC&quality_grade=research
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The original Palmer's Penguins dataset is an invaluable resource in the world of data science, often used for statistical analysis, data visualization, and introductory machine learning tasks. Collected in the Palmer Archipelago near Antarctica, the dataset provides information on three species of penguins, including Adélie, Gentoo, and Chinstrap, and covers essential biological metrics such as bill dimensions and body mass.
Our extended dataset aims to build upon this foundational work by incorporating new, realistic features. We have included additional variables like diet, year of observation, life stage, and health metrics. These extra features allow for a more nuanced understanding of penguin biology and ecology, making it ideal for more complex analyses, including but not limited to educational, ecological, and advanced machine learning applications.
The dataset consists of the following columns:
The inclusion of yearly data from 2021 to 2025 allows for longitudinal studies, providing a temporal dimension that can help track the impact of climate change, dietary shifts, or other ecological factors on penguin populations over time.
We introduce the 'Health Metrics' column, which takes into account the body mass, life stage, and species to categorize each penguin's health status. This provides a multi-faceted view of individual well-being and can be crucial for conservation studies.
Our data structure enables the mapping of the diet to specific life stages, offering a granular understanding of penguin ecology. This added detail can be crucial for studying nutritional needs at different life stages.
Recognizing the importance of gender-based variations in penguin biology, our dataset incorporates attributes that allow for the study of sexual dimorphism, such as differing body sizes and potential diet variations between males and females.
This enriched dataset is particularly suitable for: - Advanced ecological models that require multiple layers of data. - Educational case studies focusing on biology, ecology, or data science. - Data-driven conservation efforts aimed at penguin species. - Machine learning algorithms that benefit from diverse and multi-dimensional data.
We wish to express our deepest respect and acknowledgment to the original research team behind the Palmer's Penguins dataset. This Extended Palmer's Penguins dataset is designed to build upon the solid foundation laid by the original work. It is created to serve as a complementary resource that adds additional dimensions for research and educational purposes. In no way is this artificial dataset intended to discredit or disrespect the invaluable contributions made through the original dataset.
All illustrations in this dataset are AI-generated.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for CORONAVIRUS DEATHS reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
This dataset contains counts of deaths for California as a whole based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.
The final data tables include both deaths that occurred in California regardless of the place of residence (by occurrence) and deaths to California residents (by residence), whereas the provisional data table only includes deaths that occurred in California regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.
The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Apple is one of the most influential and recognisable brands in the world, responsible for the rise of the smartphone with the iPhone. Valued at over $2 trillion in 2021, it is also the most valuable...
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
https://creativecommons.org/publicdomain/zero/1.0/
Credit risk is the probability of a financial loss resulting from a borrower's failure to repay a loan. Essentially, credit risk refers to the risk that a lender may not receive the owed principal and interest, which results in an interruption of cash flows and increased costs for collection.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This starter data kit collects extracts from global, open datasets relating to climate hazards and infrastructure systems.
These extracts are derived from global datasets that have been clipped to the national scale (or subnational, in cases where national boundaries have been split, generally to separate outlying islands or non-contiguous regions) using Natural Earth (2023) boundaries, and are not meant to express an opinion about borders, territory or sovereignty.
Human-induced climate change is increasing the frequency and severity of climate and weather extremes. This is causing widespread, adverse impacts to societies, economies and infrastructures. Climate risk analysis is essential to inform policy decisions aimed at reducing risk. Yet, access to data is often a barrier, particularly in low and middle-income countries. Data are often scattered, hard to find, in formats that are difficult to use, or require considerable technical expertise. Nevertheless, there are global, open datasets which provide some information about climate hazards, society, infrastructure and the economy. This "data starter kit" aims to kickstart the process and act as a starting point for further model development and scenario analysis.
Hazards:
Exposure:
Contextual information:
The spatial intersection of hazard and exposure datasets is a first step to analyse vulnerability and risk to infrastructure and people.
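As a rough illustration of what such an intersection involves, the sketch below samples a gridded hazard layer at the locations of point assets. Everything in it is hypothetical: the flood-depth grid, its origin and cell size, and the asset names and coordinates are made up for the example, and `sample_grid` is an illustrative helper. A real analysis would read the hazard rasters and infrastructure layers with geospatial libraries rather than use this hand-rolled lookup.

```python
# Hypothetical flood-depth grid in metres; row 0 is the northernmost row.
flood_depth = [
    [0.0, 0.0, 0.5],
    [0.0, 1.2, 2.0],
    [0.3, 0.8, 0.0],
]
# Hypothetical georeferencing: top-left corner and square cell size in degrees.
X_MIN, Y_MAX, CELL = 30.0, 2.0, 1.0

# Hypothetical point assets as (longitude, latitude).
assets = {
    "bridge_a": (31.5, 0.5),
    "station_b": (30.2, 1.7),
    "depot_c": (32.9, 1.1),
}


def sample_grid(grid, x, y, x_min=X_MIN, y_max=Y_MAX, cell=CELL):
    """Return the grid value at point (x, y), or None if outside the grid."""
    col = int((x - x_min) // cell)
    row = int((y_max - y) // cell)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Intersect exposure with hazard: keep assets in cells with non-zero depth.
exposed = {}
for name, (x, y) in assets.items():
    depth = sample_grid(flood_depth, x, y)
    if depth is not None and depth > 0:
        exposed[name] = depth

print(exposed)
```

The result lists only the assets that fall in flooded cells, together with the hazard intensity at each, which is the input a subsequent damage or risk calculation would need.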
To learn more about related concepts, there is a free short course available through the Open University on Infrastructure and Climate Resilience. This overview of the course has more details.
These Python libraries may be a useful place to start analysis of the data in the packages produced by this workflow:
- snkit helps clean network data
- nismod-snail is designed to help implement infrastructure exposure, damage and risk calculations
The open-gira repository contains a larger workflow for global-scale open-data infrastructure risk and resilience analysis.
For a more developed example, some of these datasets were key inputs to a regional climate risk assessment of current and future flooding risks to transport networks in East Africa, which has a related online visualisation tool at https://east-africa.infrastructureresilience.org/ and is described in detail in Hickford et al (2023).
References
The global number of Facebook users was forecast to increase continuously between 2023 and 2027 by a total of 391 million users (+14.36 percent). After the fourth consecutive year of growth, the Facebook user base is estimated to reach 3.1 billion users, a new peak, in 2027. Notably, the number of Facebook users had also increased continuously over the preceding years. User figures, shown here for the platform Facebook, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).
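The three figures quoted above (an increase of 391 million, a relative change of +14.36 percent, and a 2027 total of 3.1 billion) can be checked against each other with a few lines of arithmetic. The 2023 base is not stated in the text; here it is only implied by the other two numbers, so treat it as a derived estimate.

```python
increase_millions = 391      # absolute increase 2023-2027, from the text
relative_increase = 0.1436   # +14.36 percent, from the text

# Implied 2023 base: increase = base * relative_increase
base_2023_millions = increase_millions / relative_increase
projected_2027_millions = base_2023_millions + increase_millions

print(f"implied 2023 base: {base_2023_millions / 1000:.2f} billion")
print(f"implied 2027 total: {projected_2027_millions / 1000:.2f} billion")
```

The implied 2027 total rounds to the 3.1 billion users quoted in the description, so the three figures are mutually consistent.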