I am learning web scraping in order to build my own datasets, and this is the first one in that learning process. Let's try to build great datasets in the future for better analysis and predictions.
The data was scraped on March 10, 2020, from https://www.worldometers.info/world-population/population-by-country/. The dataset represents the population count by country for a specific time period.
First, thanks to the content creators at https://www.worldometers.info, who provide reliable data on the internet. Second, thanks to the tutor who taught me how to scrape websites.
Is this dataset valuable? Where can we utilize this dataset in data science?
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Population by Country - 2020’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/tanuprabhu/population-by-country-2020 on 28 January 2022.
--- Dataset description provided by original source is as follows ---
I always wanted access to a dataset about the world's population, country by country, but I could not find a properly documented one. So I created one myself.
I knew I wanted to create a dataset, but I did not know how. I started by searching the internet for country population figures. Wikipedia was, of course, my first stop, but for some reason I did not find the results acceptable, and I think it listed only 190 or so countries. I kept looking for quite some time until I stumbled upon a great website that you have probably heard of: Worldometer. It was exactly what I was looking for. It had more detail than Wikipedia and more rows, that is, more countries with their populations.
Once I had found the data, the next hard task was to download it. The raw data was not available, and I did not email the site to ask for it. Instead I learned a new skill that is very important for a data scientist. I had read somewhere that this technique is what you need to obtain data from websites. Any guesses? Keep reading and you will find out in the next paragraph.
You are right: it's web scraping. I learned it so that I could convert the data into CSV format. I will share the scraper code that I wrote; I also found a way to convert the pandas DataFrame directly to a CSV (comma-separated values) file and store it on my computer. Go through the code and you will see what I mean.
Below is the code that I used to scrape the data from the website:
[Screenshot of the scraper code: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3200273%2Fe814c2739b99d221de328c72a0b2571e%2FCapture.PNG?generation=1581314967227445&alt=media]
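The original scraper survives only as a screenshot, so the following is not the author's code. It is a minimal sketch of the same idea (turn an HTML table into CSV), written with the Python standard library instead of pandas so that it runs anywhere. The class name `TableParser`, the function `table_to_csv`, and the sample markup are all assumptions made for illustration; a real page would first have to be downloaded, subject to the site's terms.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> in an HTML document as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text fragments of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Return every table row found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample markup standing in for a downloaded page (hypothetical data).
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>Exampleland</td><td>1,000</td></tr>
  <tr><td>Samplestan</td><td>250</td></tr>
</table>
"""

csv_text = table_to_csv(SAMPLE_HTML)
print(csv_text)
```

Cells that contain a comma, such as a formatted population figure, are quoted by the `csv` writer, so the output stays loadable by any CSV reader.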
I couldn't have got the data without Worldometer, so special thanks to the website. It is because of them that I was able to get the data.
As far as I know, I don't have any questions to ask. Find your own ways to use the data, and let me know via a kernel if you find something interesting.
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
Credits and information taken from https://www.worldometers.info/world-population/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset, titled "Global COVID-19 Statistics - Jan 2025," contains the latest COVID-19 statistics collected from the Worldometer website on Jan 09, 2025. The data includes crucial metrics such as the total number of cases, deaths, recoveries, and active cases for countries around the world. The information is extracted from the comprehensive table provided by Worldometer, which is widely regarded as a reliable source for real-time coronavirus statistics.
Source and Collection Date
Source: Worldometer Coronavirus Page
Date of Collection: Jan 09, 2025
As the world fights this invisible enemy, many data-driven students like me want to study it as well as we can. An enormous number of COVID-19 datasets are available today, but as a beginner in this field I wanted something simpler. So I put together this COVID-19 dataset, which I scraped from "https://www.worldometers.info/coronavirus". It is my way of learning by doing. The data runs through 5/17/2020, and I will keep updating it.
The dataset contains 194 rows and 12 columns, which are described below:-
- Country: Name of the country.
- Total_Cases: Total number of cases in the country till 5/17/2020.
- Total_Deaths: Total number of deaths in the country till 5/17/2020.
- Total_Recovered: Total number of individuals recovered from COVID-19.
- Active_Cases: Total active cases in the country on 5/17/2020.
- Critical_Cases: Number of patients in critical condition.
- Cases/Million_Population: Number of cases per million population of the country.
- Deaths/Million_Population: Number of deaths per million population of the country.
- Total_Tests: Total number of tests performed till 5/17/2020.
- Tests/Million_Population: Number of tests performed per million population.
- Population: Population of the country.
- Continent: Continent in which the country lies.
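The per-million columns can be recomputed from the raw counts, which is a handy sanity check on scraped data. The sketch below assumes the usual definition (count divided by population, scaled to one million); the source does not spell the formula out, and the helper name `per_million` and the sample figures are hypothetical.

```python
def per_million(count, population):
    """Scale a raw count to a rate per one million inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 1_000_000 / population


# Hypothetical row using the column names from the description above.
row = {"Total_Cases": 5_000, "Total_Deaths": 100, "Population": 2_000_000}

cases_per_million = per_million(row["Total_Cases"], row["Population"])
deaths_per_million = per_million(row["Total_Deaths"], row["Population"])
print(cases_per_million, deaths_per_million)
```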
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Africa Population (Live) counter shows a continuously updated estimate of the current population of Africa delivered by Worldometer's RTS algorithm, which processes data collected from the United Nations Population Division. From https://www.worldometers.info/world-population/africa-population/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Covid19 in World Countries-Latest Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/anandhuh/covid19-in-world-countrieslatest-data on 12 November 2021.
--- Dataset description provided by original source is as follows ---
This dataset contains COVID-19 data for the countries of the world as of November 10, 2021.
Link : https://www.worldometers.info/coronavirus/#countries
Link : https://www.kaggle.com/anandhuh/datasets
Upvote if you find it useful 🙏
--- Original source retains full ownership of the source dataset ---
The 2019–20 coronavirus pandemic is an ongoing global pandemic of coronavirus disease 2019 (COVID-19) caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The virus first emerged in Wuhan, Hubei, China, in December 2019. On 11 March 2020, the World Health Organization declared the outbreak a pandemic. As of 11 March 2020, over 126,000 cases have been confirmed in more than 110 countries and territories, with major outbreaks in mainland China, Italy, South Korea, and Iran. More than 4,600 have died from the disease and 67,000 have recovered.
2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC
This dataset has information on the number of affected cases, deaths, and recoveries from the 2019 novel coronavirus. Please note that this data was scraped from https://www.worldometers.info/coronavirus/. This data is for educational purposes only.
This data belongs solely to https://www.worldometers.info/coronavirus/. For licensing, visit https://www.worldometers.info/licensing/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Estimated population data based on the latest United Nations Population Division estimates and http://www.worldometers.info/world-population/population-by-country/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Population of Spain (1955-2020) with predictions until 2050. This dataset was created using web scraping techniques on the webpage https://www.worldometers.info/world-population/spain-population/.
Exercise for Universitat Oberta de Catalunya. Subject: M2.851 Tipología y ciclo de vida de los datos.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Forecast dataset of COVID-19 from the World Health Organization (WHO), Worldometers, and the Ministry of Health of Brazil.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Associated with manuscript titled: Fifty Muslim-majority countries have fewer COVID-19 cases and deaths than the 50 richest non-Muslim countries.
The objective of this research was to determine the difference in the total number of COVID-19 cases and deaths between Muslim-majority and non-Muslim countries, and investigate reasons for the disparities. Methods: The 50 Muslim-majority countries had more than 50.0% Muslims with an average of 87.5%. The non-Muslim country sample consisted of 50 countries with the highest GDP while omitting any Muslim-majority countries listed. The non-Muslim countries’ average percentage of Muslims was 4.7%. Data pulled on September 18, 2020 included the percentage of Muslim population per country by World Population Review (15) and GDP per country, population count, and total number of COVID-19 cases and deaths by Worldometers (16). The data set was transferred via an Excel spreadsheet on September 23, 2020 and analyzed. To measure COVID-19’s incidence in the countries, three different Average Treatment Methods (ATE) were used to validate the results. Results published as a preprint at https://doi.org/10.31235/osf.io/84zq5
(15) Muslim Majority Countries 2020 [Internet]. Walnut (CA): World Population Review. 2020- [Cited 2020 Sept 28]. Available from: http://worldpopulationreview.com/country-rankings/muslim-majority-countries
(16) Worldometers.info. Worldometer. Dover (DE): Worldometer; 2020 [cited 2020 Sept 28]. Available from: http://worldometers.info
Late in December 2019, the World Health Organisation (WHO) China Country Office obtained information about severe pneumonia of an unknown cause, detected in the city of Wuhan in Hubei province, China. This later turned out to be the novel coronavirus disease (COVID-19), an infectious disease caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) of the coronavirus family. The disease causes respiratory illness characterized by primary symptoms like cough, fever, and in more acute cases, difficulty in breathing. WHO later declared COVID-19 as a Pandemic because of its fast rate of spread across the Globe.
The COVID-19 datasets organized by continent contain daily-level information about COVID-19 cases on the different continents of the world. It is time-series data, and the number of cases on any given day is cumulative. The original datasets can be found on this Johns Hopkins University GitHub repository. I will be updating the COVID-19 datasets on a daily basis, with every update from Johns Hopkins University. I have also included the world COVID-19 tests data scraped from Worldometer and the 2020 world population, also scraped from Worldometer.
COVID-19 cases
covid19_world.csv. It contains the cumulative number of COVID-19 cases from around the world since January 22, 2020, as compiled by Johns Hopkins University.
covid19_asia.csv, covid19_africa.csv, covid19_europe.csv, covid19_northamerica.csv, covid19.southamerica.csv, covid19_oceania.csv, and covid19_others.csv. These contain the cumulative number of COVID-19 cases organized by continent.
Field description
- ObservationDate: Date of observation in YY/MM/DD
- Country_Region: name of Country or Region
- Province_State: name of Province or State
- Confirmed: the number of COVID-19 confirmed cases
- Deaths: the number of deaths from COVID-19
- Recovered: the number of recovered cases
- Active: the number of people still infected with COVID-19
Note: Active = Confirmed - (Deaths + Recovered)
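The note on the Active field can be applied directly when a file lacks that column, or used to check an existing one. This is a small sketch with made-up records; the field names follow the description above, while the function name `active_cases` and the numbers are hypothetical.

```python
def active_cases(confirmed, deaths, recovered):
    """Active = Confirmed - (Deaths + Recovered), as stated in the field note."""
    return confirmed - (deaths + recovered)


# Hypothetical cumulative records, one per observation date.
records = [
    {"ObservationDate": "20/06/01", "Confirmed": 1_000, "Deaths": 50, "Recovered": 600},
    {"ObservationDate": "20/06/02", "Confirmed": 1_100, "Deaths": 55, "Recovered": 700},
]

for record in records:
    record["Active"] = active_cases(
        record["Confirmed"], record["Deaths"], record["Recovered"]
    )

print([record["Active"] for record in records])
```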
COVID-19 tests
covid19_tests.csv. It contains the cumulative number of COVID-19 tests conducted since the onset of the pandemic, as reported by Worldometer. Data is available from June 01, 2020.
Field description
- Date: date in YY/MM/DD
- Country, Other: Country, Region, or dependency
- TotalTests: cumulative number of tests up till that date
- Population: population of Country, Region, or dependency
- Tests/1M pop: tests per 1 million of the population
- 1 Testevery X ppl: 1 test for every X number of people
2020 world population
world_population(2020).csv. It contains the 2020 world population as reported by Worldometer.
Field description
- Country (or dependency): Country or dependency
- Population (2020): population in 2020
- Yearly Change: yearly change in population as a percentage
- Net Change: the net change in population
- Density(P/km2): population density
- Land Area(km2): land area
- Migrants(net): net number of migrants
- Fert. Rate: Fertility Rate
- Med. Age: median age
- Urban pop: urban population
- World Share: share of the world population as a percentage
Many places around the world have experienced population growth in the past decade, and some have even seen population decline due to the COVID pandemic. According to Worldometer’s current statistics, the global population continues to grow, having reached a little over 8 billion. Although Kazakhstan ranks only 64th, it has a solid 1.21 percent yearly change, with a net change of about 225,000 on a total of 19 million. Its 2021 figures from Our World in Data for birth and death rates per 1,000 people show that the population is still growing, as the birth rate (21.54) is about double the death rate (10.23). Birth rates measure the number of births in a population as a percentage or a ratio per 1,000 people, and death rates are measured the same way (Marston, Knox, Liverman, Del Casino, Robbins, 2019, p. 39). Not only does this contribute to the growing population, but ethnic Kazakhs who had been living elsewhere are also moving back to their home country. Ethnicity is defined as a “state of belonging to a social group that has a common national or cultural tradition; socially created system of rules about who belongs to a particular group” (Marston, Knox, Liverman, Del Casino, Robbins, 2019, p. 36). Population growth isn’t necessarily a bad thing as long as it is sustainable, but for Kazakhstan it can be dangerous, as generally they have been struggling with basic economic rights and are being directed to the northern region.
On March 10, 2023, the Johns Hopkins Coronavirus Resource Center ceased collecting and reporting global COVID-19 data. For updated cases, deaths, and vaccine data please visit the following sources:
Global: World Health Organization (WHO)
U.S.: U.S. Centers for Disease Control and Prevention (CDC)
For more information, visit the Johns Hopkins Coronavirus Resource Center.
This feature layer contains the most up-to-date COVID-19 cases and latest trend plot. It covers China, Canada, Australia (at province/state level), the rest of the world (at country level, represented by either the country centroids or their capitals), and the US at county level. Data sources: WHO, CDC, ECDC, NHC, DXY, 1point3acres, Worldometers.info, BNO, state and national government health departments, and local media reports. The China data is automatically updating at least once per hour, and non-China data is updating hourly. This layer is created and maintained by the Center for Systems Science and Engineering (CSSE) at the Johns Hopkins University. This feature layer is supported by Esri Living Atlas team and JHU Data Services. This layer is open to the public and free to share.
The countries with the lowest life expectancy worldwide include Nigeria, Chad, and Lesotho. As of 2023, people born in Nigeria could be expected to live only up to ** years. This is almost ** years shorter than the global life expectancy.
Life expectancy
The global life expectancy has gradually increased over the past couple of decades, rising from **** years in 2011 to **** years in 2023. However, the years 2020 and 2021 saw a decrease in global life expectancy due to the COVID-19 pandemic. Furthermore, life expectancy can vary greatly depending on the country and region. For example, all the top 20 countries with the lowest life expectancy worldwide are in Africa. The countries with the highest life expectancy include Liechtenstein, Switzerland, and Japan.
Causes of death
The countries with the lowest life expectancy worldwide are all low-income or developing countries that lack the health care access and treatment that more developed countries can provide. The leading causes of death in these countries therefore differ from those of middle-income and upper-income countries. The leading causes of death in low-income countries include diseases such as HIV/AIDS and malaria, as well as preterm birth complications, which do not cause substantial death in higher-income countries.
https://www.cognitivemarketresearch.com/privacy-policy
According to Cognitive Market Research, the global alcohol-based hand sanitizer market size is USD 2351.2 million in 2023 and will expand at a compound annual growth rate (CAGR) of 3.60% from 2023 to 2030.
North America held the largest share, more than 40% of global revenue, with a market size of USD 940.5 million in 2023, and will grow at a compound annual growth rate (CAGR) of 1.8% from 2023 to 2030.
Europe accounted for over 30% of the global market, with a market size of USD 705.4 million.
Asia Pacific held more than 23% of global revenue, with a market size of USD 540.8 million in 2023, and will grow at a compound annual growth rate (CAGR) of 5.6% from 2023 to 2030.
Latin America held more than 5% of global revenue, with a market size of USD 117.6 million in 2023, and will grow at a compound annual growth rate (CAGR) of 3.0% from 2023 to 2030.
Middle East and Africa held more than 2% of global revenue, with a market size of USD 47.02 million in 2023, and will grow at a compound annual growth rate (CAGR) of 3.3% from 2023 to 2030.
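As a worked illustration of how the figures above relate, the sketch below compounds the 2023 global market size at the quoted CAGR and recomputes each region's share of the global total. It assumes annual compounding over the seven years from 2023 to 2030, which the report does not state explicitly; the function name `project` is hypothetical, and the result is an arithmetic illustration, not a figure from the report.

```python
def project(value, cagr, years):
    """Compound `value` at a constant annual growth rate for `years` years."""
    return value * (1 + cagr) ** years


GLOBAL_2023 = 2351.2  # USD million, from the report
GLOBAL_CAGR = 0.036   # 3.60% per year, from the report

# Assumption: seven annual compounding periods between 2023 and 2030.
projected_2030 = project(GLOBAL_2023, GLOBAL_CAGR, 2030 - 2023)

# Regional 2023 market sizes in USD million, from the report.
REGIONS_2023 = {
    "North America": 940.5,
    "Europe": 705.4,
    "Asia Pacific": 540.8,
    "Latin America": 117.6,
    "Middle East and Africa": 47.02,
}
shares = {region: size / GLOBAL_2023 for region, size in REGIONS_2023.items()}

print(f"Projected 2030 global market: USD {projected_2030:.1f} million")
for region, share in shares.items():
    print(f"{region}: {share:.1%}")
```

The recomputed shares line up with the rounded percentages quoted for each region, which suggests the regional sizes were derived from the global figure.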
Enhanced Focus on Hand Sanitization to Provide Viable Market Output
Consumer behavior has been significantly impacted by the global coronavirus outbreak, which has also encouraged consumers to improve their personal hygiene, especially their hand hygiene.
As of February 23, 2022, approximately 43 million individuals worldwide have been infected by the coronavirus, with 6.5 million cases still active and 0.59 million deaths recorded, according to Worldometer.
Source: www.worldometers.info/coronavirus/coronavirus-death-toll/
France, Russia, the United States, and the United Kingdom are the nations worst affected. As a result, customers became alarmed by the rising number of virus-related deaths and began paying more attention to hand hygiene as a defense against getting sick. The World Health Organization, the Centers for Disease Control and Prevention, and medical professionals everywhere advise using hand sanitizers as well. They assert that applying an alcohol-based hand rub is one of the best defenses against the virus. The alcohol-based hand sanitizer market is currently growing because of this factor.
Increasing Consciousness and Governmental Efforts to Propel Market Growth
The public's increasing awareness of the importance of hand hygiene, sparked by government and health organization campaigns, is driving a notable increase in the alcohol-based hand sanitizer industry. Consumer demand for alcohol-based hand sanitizer has surged as a result of awareness of the product's critical role in stopping the transmission of infectious diseases. The COVID-19 pandemic has had significant effects on the market. Because the virus is extremely contagious, there is an immediate need for strong disinfection procedures, and alcohol-based hand sanitizers have become a popular and practical answer to this problem. The pandemic made the use of alcohol-based hand sanitizer a lasting part of daily routines, and continuous market expansion is the outcome.
Key Dynamics of Alcohol based Hand Sanitizer Market
Key Drivers of Alcohol based Hand Sanitizer Market
Heightened Hygiene Awareness Following the Pandemic: The COVID-19 pandemic has profoundly altered consumer habits, establishing hand hygiene as a lasting priority in homes, workplaces, and public areas. Even after the pandemic, the consistent use of hand sanitizers has become ingrained in both personal and institutional practices. Alcohol-based hand sanitizers are especially favored due to their demonstrated efficacy in eliminating 99.9% of bacteria and viruses. Health organizations such as the WHO and CDC advocate for a minimum of 60% alcohol content in sanitizers, further supporting their utilization.
Increasing Utilization in Healthcare and Commercial Settings: Hospitals, clinics, laboratories, food service sectors, and corporate offices are adopting alcohol-based sanitizers as vital tools for infection control. Hand sanitizing stations have become a common feature in commercial buildings, transportation hubs, educational institutions, and retail centers. Institutional purchasers generally buy in bulk and favor alcohol-based formulations for their rapid action and comprehensive germ protection.
Robust Product Availability Across Distribution Channels: The extensive availability of alco...
The dataset contains COVID-19 statistics for the top countries currently affected by the virus. The data was scraped from two popular sites maintaining daily updates on the spread of COVID-19 - https://www.worldometers.info/ and https://en.wikipedia.org/wiki/COVID-19_pandemic
There are two kinds of CSV files. The first kind contains country-wise daily statistics on the spread of COVID-19. Data is available for the following countries:-
For each of these countries, the dataset contains the following columns:-
The second type of file is the overall statistics which contains statistics for all the countries affected in the world. This dataset contains the following columns:-
JHU Coronavirus COVID-19 Global Cases, by country
PHS is updating the Coronavirus Global Cases dataset weekly, Monday, Wednesday and Friday from Cloud Marketplace.
This data comes from the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). This database was created in response to the Coronavirus public health emergency to track reported cases in real-time. The data include the location and number of confirmed COVID-19 cases, deaths, and recoveries for all affected countries, aggregated at the appropriate province or state. It was developed to enable researchers, public health authorities and the general public to track the outbreak as it unfolds. Additional information is available in the blog post.
Visual Dashboard (desktop): https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6
Included Data Sources are:
**Terms of Use:**
This GitHub repo and its contents herein, including all data, mapping, and analysis, copyright 2020 Johns Hopkins University, all rights reserved, is provided to the public strictly for educational and academic research purposes. The Website relies upon publicly available data from multiple sources, that do not always agree. The Johns Hopkins University hereby disclaims any and all representations and warranties with respect to the Website, including accuracy, fitness for use, and merchantability. Reliance on the Website for medical guidance or use of the Website in commerce is strictly prohibited.
**U.S. county-level characteristics relevant to COVID-19**
Chin, Kahn, Krieger, Buckee, Balsari and Kiang (forthcoming) show that counties differ significantly in biological, demographic and socioeconomic factors that are associated with COVID-19 vulnerability. A range of publicly available county-specific data identifying these key factors, guided by international experiences and consideration of epidemiological parameters of importance, have been combined by the authors and are available for use:
Based on a comparison of coronavirus deaths in 210 countries relative to their population, Peru had the most losses to COVID-19 up until July 13, 2022. As of the same date, the virus had infected over 557.8 million people worldwide, and the number of deaths had totaled more than 6.3 million. Note, however, that COVID-19 test rates can vary per country. Additionally, big differences show up between countries when comparing the number of deaths with confirmed COVID-19 cases. The source seemingly does not differentiate between "the Wuhan strain" (2019-nCOV) of COVID-19, "the Kent mutation" (B.1.1.7) that appeared in the UK in late 2020, the 2021 Delta variant (B.1.617.2) from India or the Omicron variant (B.1.1.529) from South Africa.
The difficulties of death figures
This table aims to provide a complete picture on the topic, but it very much relies on data that has become more difficult to compare. As the coronavirus pandemic developed across the world, countries already used different methods to count fatalities, and they sometimes changed them during the course of the pandemic. On April 16, for example, the Chinese city of Wuhan added a 50 percent increase in their death figures to account for community deaths. These deaths occurred outside of hospitals and had gone unaccounted for until then. The state of New York did something similar two days before, revising their figures with 3,700 new deaths as they started to include “assumed” coronavirus victims. The United Kingdom started counting deaths in care homes and private households on April 29, adjusting their number with about 5,000 new deaths (which were corrected downward again by the same amount on August 18). This makes an already difficult comparison even more difficult. Belgium, for example, counts suspected coronavirus deaths in their figures, whereas other countries have not done that (yet). This means two things. First, it could have a big impact on both current as well as future figures. On April 16 already, UK health experts stated that if their numbers were corrected for community deaths like in Wuhan, the UK number would change from 205 to “above 300”. This is exactly what happened two weeks later. Second, it is difficult to pinpoint exactly which countries already have “revised” numbers (like Belgium, Wuhan or New York) and which ones do not. One work-around could be to look at (freely accessible) timelines that track the reported daily increase of deaths in certain countries. Several of these are available on our platform, such as for Belgium, Italy and Sweden. A sudden large increase might be an indicator that the domestic sources changed their methodology.
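The work-around described above, looking for sudden jumps in the daily increase of deaths, can be sketched as a simple check against a trailing baseline. This is only one possible heuristic under stated assumptions: the function name `flag_spikes`, the 7-day window, and the factor of 3 are arbitrary choices made for illustration, and the series is made up. A flagged day is a prompt to read the source's notes, not proof of a methodology change.

```python
from statistics import median


def flag_spikes(daily_deaths, window=7, factor=3.0):
    """Return indices of days whose count exceeds `factor` times the
    median of the preceding `window` days."""
    flagged = []
    for i in range(window, len(daily_deaths)):
        baseline = median(daily_deaths[i - window:i])
        # Skip days with a zero baseline, where a ratio is meaningless.
        if baseline > 0 and daily_deaths[i] > factor * baseline:
            flagged.append(i)
    return flagged


# Hypothetical daily increases; index 8 mimics a one-off revision.
daily = [10, 12, 11, 13, 12, 14, 13, 15, 90, 16, 14]
spikes = flag_spikes(daily)
print(spikes)
```

Using the median rather than the mean keeps a single revision day from inflating the baseline for the days that follow it.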
Where are these numbers coming from?
The numbers shown here were collected by Johns Hopkins University, a source that manually checks the data with domestic health authorities. For the majority of countries, this is from national authorities. In some cases, like China, the United States, Canada or Australia, city reports or other various state authorities were consulted. In this statistic, these separately reported numbers were put together. For more information or other freely accessible content, please visit our dedicated Facts and Figures page.