I am learning web scraping in order to build my own datasets, and this is the first one I have made. Let's try to build great datasets in the future for better analysis and predictions.
The data was scraped on March 10, 2020 from https://www.worldometers.info/world-population/population-by-country/ and represents the population of each country for a specific time period.
First, thanks to the content creators of https://www.worldometers.info, who provide reliable data on the internet. Second, thanks to the tutor who taught me how to scrape websites.
Is this dataset valuable? Where can we utilize this dataset in data science?
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Population by Country - 2020’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/tanuprabhu/population-by-country-2020 on 28 January 2022.
--- Dataset description provided by original source is as follows ---
I had always wanted a dataset about the world's population by country, but I could not find a properly documented one, so I created one myself.
I knew I wanted to create a dataset, but I did not know how. So I started searching the internet for the population of each country. Wikipedia was my first stop, but for some reason the results were not acceptable, and it listed only 190 or so countries. After surfing the internet for quite some time, I stumbled upon a great website that you have probably heard of: Worldometer. This was exactly the website I was looking for. It had more detail than Wikipedia and more rows, that is, more countries with their populations.
Once I had found the data, my next hard task was to download it. Of course, I could not get the data in raw form, and I did not email Worldometer to ask for it. Instead, I learned a new skill that is very important for a data scientist; I had read somewhere that this is the technique you need to obtain data from websites. Any guesses? Keep reading and you will find out in the next paragraph.
You are right: it's web scraping. I learned it so that I could convert the data into CSV format. I also found a way to convert the pandas DataFrame directly to a CSV (comma-separated values) file and store it on my computer. Just go through my code and you will see what I mean.
Below is the code that I used to scrape the data from the website (shared as a screenshot): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3200273%2Fe814c2739b99d221de328c72a0b2571e%2FCapture.PNG?generation=1581314967227445&alt=media
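The original scraper is only available as a screenshot, so the following is not the author's code. It is a minimal sketch of the same idea using only Python's standard library: it collects the cells of every table row in an HTML string and writes them out as CSV. The class and function names are hypothetical, and the parser assumes a simple, well-formed table without nested tables or rowspan/colspan. Downloading the page (for example with `urllib.request.urlopen`) is left out, and any real scraping should respect the site's terms of use.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of <th>/<td> cells, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently open
        self._cell = None   # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace and newlines into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without any cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_csv(html: str) -> str:
    """Return the table rows found in `html` as CSV text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer).writerows(parser.rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample markup; a real page would be downloaded first.
    sample = (
        "<table><tr><th>Country</th><th>Population</th></tr>"
        "<tr><td>Example</td><td>1,234,567</td></tr></table>"
    )
    print(html_table_to_csv(sample))
```

If pandas and an HTML parser such as lxml are installed, `pandas.read_html` (which returns a list of DataFrames) followed by `DataFrame.to_csv` does much the same in two calls, which is closer to the pandas-based approach mentioned above.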
I could not have got the data without Worldometer, so special thanks to the website.
As far as I know, I don't have any questions to ask. Find your own ways to use the data, and let me know via a kernel if you find something interesting.
--- Original source retains full ownership of the source dataset ---
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Credits and information taken from https://www.worldometers.info/world-population/
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘World Population by Year’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sansuthi/world-population-by-year on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Source of content: www.worldometers.info
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Population of Spain (1955-2020) with predictions until 2050. This dataset was created using web scraping techniques on the webpage https://www.worldometers.info/world-population/spain-population/.
Exercise for the Universitat Oberta de Catalunya. Subject: M2.851 Tipología y ciclo de vida de los datos (Typology and Life Cycle of Data).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Africa Population (Live) counter shows a continuously updated estimate of the current population of Africa delivered by Worldometer's RTS algorithm, which processes data collected from the United Nations Population Division. From https://www.worldometers.info/world-population/africa-population/
Many places around the world have experienced population growth in the past decade, and some have even seen population decline due to the COVID pandemic. According to Worldometer's current statistics, the global population continues to grow, reaching a little over 8 billion. Although Kazakhstan ranks only 64th by population, it has a decent yearly change of 1.21 percent, with a net change of about 225,000 on a total of about 19 million. Its 2021 birth and death rates per 1,000 people from Our World in Data show that it is still a growing population: the birth rate (21.54) is roughly double the death rate (10.23). Birth rates measure the number of births in a population as a percentage or as a ratio per 1,000 people, and death rates measure deaths in the same way (Marston, Knox, Liverman, Del Casino, Robbins, 2019, p. 39). Beyond natural increase, people of Kazakh ethnicity who had been living abroad are moving back to their home country. Ethnicity is defined as a “state of belonging to a social group that has a common national or cultural tradition; socially created system of rules about who belongs to a particular group” (Marston, Knox, Liverman, Del Casino, Robbins, 2019, p. 36). Population growth isn't necessarily a bad thing as long as it is sustainable, but for Kazakhstan it can be dangerous, as people there have generally been struggling with basic economic rights and are being directed to the northern region.
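The birth and death rates quoted above can be turned into a rate of natural increase (births minus deaths, ignoring migration). The helper below is a small illustrative sketch with a hypothetical name; because the crude rates are per 1,000 people, dividing their difference by 10 gives a percentage.

```python
def natural_increase_pct(birth_rate_per_1000: float, death_rate_per_1000: float) -> float:
    """Rate of natural increase in percent per year, from crude rates per 1,000 people."""
    # (births - deaths) per 1,000 people -> per 100 people (percent)
    return (birth_rate_per_1000 - death_rate_per_1000) / 10


# Kazakhstan, 2021 rates from Our World in Data as quoted in the text.
rate = natural_increase_pct(21.54, 10.23)
print(f"natural increase: {rate:.2f}% per year")  # natural increase: 1.13% per year
print(f"birth/death ratio: {21.54 / 10.23:.1f}")  # birth/death ratio: 2.1
```

This gives about 1.13 percent per year, slightly below Worldometer's 1.21 percent yearly change. The gap is consistent with net in-migration such as returning ethnic Kazakhs, but it is only a rough comparison, since the two figures come from different sources and years.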
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Estimated population data based on the latest United Nations Population Division estimates and http://www.worldometers.info/world-population/population-by-country/
Late in December 2019, the World Health Organisation (WHO) China Country Office obtained information about severe pneumonia of an unknown cause, detected in the city of Wuhan in Hubei province, China. This later turned out to be the novel coronavirus disease (COVID-19), an infectious disease caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) of the coronavirus family. The disease causes respiratory illness characterized by primary symptoms like cough, fever, and in more acute cases, difficulty in breathing. WHO later declared COVID-19 a pandemic because of how fast it was spreading across the globe.
The COVID-19 datasets organized by continent contain daily information about COVID-19 cases in the different continents of the world. The data are a time series, and the number of cases on any given day is cumulative. The original datasets can be found in the Johns Hopkins University GitHub repository. I will update the COVID-19 datasets regularly with every update from Johns Hopkins University. I have also included World COVID-19 test data and the 2020 world population, both scraped from Worldometer.
COVID-19 cases
covid19_world.csv: contains the cumulative number of COVID-19 cases from around the world since January 22, 2020, as compiled by Johns Hopkins University.
covid19_asia.csv, covid19_africa.csv, covid19_europe.csv, covid19_northamerica.csv, covid19.southamerica.csv, covid19_oceania.csv, and covid19_others.csv: contain the cumulative number of COVID-19 cases, organized by continent.
Field description:
- ObservationDate: date of observation in YY/MM/DD
- Country_Region: name of country or region
- Province_State: name of province or state
- Confirmed: number of confirmed COVID-19 cases
- Deaths: number of deaths from COVID-19
- Recovered: number of recovered cases
- Active: number of people still infected with COVID-19
Note: Active = Confirmed - (Deaths + Recovered)
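The Active field follows directly from the note above. As a sketch, the function below recomputes it for rows read with Python's `csv.DictReader`; the column names are the ones listed above, while the helper names and the treatment of an empty cell as 0 are assumptions of this example rather than something the dataset documents.

```python
import csv
import io


def to_count(value) -> int:
    """Parse a count such as '12', '12.0' or '' (empty treated as 0 by assumption)."""
    return int(float(value)) if value and value.strip() else 0


def add_active(rows):
    """Return new dicts with Active = Confirmed - (Deaths + Recovered)."""
    result = []
    for row in rows:
        active = to_count(row["Confirmed"]) - (
            to_count(row["Deaths"]) + to_count(row["Recovered"])
        )
        result.append({**row, "Active": active})
    return result


# Hypothetical two-row sample in the documented column layout.
sample = """ObservationDate,Country_Region,Province_State,Confirmed,Deaths,Recovered
20/03/01,Country A,,100,5,20
20/03/02,Country A,,130,6,
"""

for row in add_active(csv.DictReader(io.StringIO(sample))):
    print(row["ObservationDate"], row["Active"])
# 20/03/01 75
# 20/03/02 124
```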
COVID-19 tests
covid19_tests.csv: contains the cumulative number of COVID-19 tests conducted since the onset of the pandemic, as reported by Worldometer. Data are available from June 01, 2020.
Field description:
- Date: date in YY/MM/DD
- Country, Other: country, region, or dependency
- TotalTests: cumulative number of tests up to that date
- Population: population of the country, region, or dependency
- Tests/1M pop: tests per 1 million people
- 1 Testevery X ppl: one test for every X people
2020 world population
world_population(2020).csv: contains the 2020 world population as reported by Worldometer.
Field description:
- Country (or dependency): country or dependency
- Population (2020): population in 2020
- Yearly Change: yearly change in population, as a percentage
- Net Change: net change in population
- Density(P/km2): population density, in people per km2
- Land Area(km2): land area
- Migrants(net): net number of migrants
- Fert. Rate: fertility rate
- Med. Age: median age
- Urban pop: urban population
- World Share: share of the world population, as a percentage
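Several of these columns are related. Assuming that density is population divided by land area, world share is population divided by the world total, and yearly change is the net change relative to the previous year's population (the description does not spell these formulas out), they can be reproduced as follows. The function names and sample values are hypothetical.

```python
def density_per_km2(population: int, land_area_km2: float) -> float:
    """People per km2 (assumed definition of 'Density(P/km2)')."""
    return population / land_area_km2


def world_share_pct(population: int, world_population: int) -> float:
    """Percentage of the world population (assumed definition of 'World Share')."""
    return population * 100 / world_population


def yearly_change_pct(net_change: int, population: int) -> float:
    """Net change as a percentage of last year's population (assumed definition)."""
    previous = population - net_change
    return net_change * 100 / previous


# Hypothetical country: 1,010,000 people on 500 km2, up 10,000 in a year,
# in a world of 8 billion people.
print(density_per_km2(1_010_000, 500))                    # 2020.0
print(f"{world_share_pct(1_010_000, 8_000_000_000):.4f}")  # 0.0126
print(yearly_change_pct(10_000, 1_010_000))               # 1.0
```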
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Associated with the manuscript titled: Fifty Muslim-majority countries have fewer COVID-19 cases and deaths than the 50 richest non-Muslim countries.
The objective of this research was to determine the difference in the total number of COVID-19 cases and deaths between Muslim-majority and non-Muslim countries, and to investigate reasons for the disparities.
Methods: In each of the 50 Muslim-majority countries, more than 50.0% of the population was Muslim, with an average of 87.5%. The non-Muslim country sample consisted of the 50 countries with the highest GDP, omitting any Muslim-majority countries. The non-Muslim countries' average percentage of Muslims was 4.7%. Data pulled on September 18, 2020 included the percentage of Muslim population per country from World Population Review (15), and GDP per country, population count, and total number of COVID-19 cases and deaths from Worldometers (16). The data set was transferred via an Excel spreadsheet on September 23, 2020 and analyzed. To measure COVID-19's incidence in the countries, three different average treatment effect (ATE) methods were used to validate the results.
Results published as a preprint at https://doi.org/10.31235/osf.io/84zq5
(15) Muslim Majority Countries 2020 [Internet]. Walnut (CA): World Population Review; 2020- [cited 2020 Sept 28]. Available from: http://worldpopulationreview.com/country-rankings/muslim-majority-countries
(16) Worldometers.info. Worldometer [Internet]. Dover (DE): Worldometer; 2020 [cited 2020 Sept 28]. Available from: http://worldometers.info
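The three ATE methods are not described in this summary. For orientation only, the simplest estimator of an average treatment effect is the difference between the group means of an outcome, here deaths per million people. The sketch below shows that naive estimator with made-up numbers; it is not claimed to be any of the methods used in the manuscript, and without adjustment it ignores confounders such as GDP, age structure or testing.

```python
from statistics import mean


def per_million(count: int, population: int) -> float:
    """Normalize a count (cases or deaths) by population."""
    return count * 1_000_000 / population


def difference_in_means(group_a, group_b) -> float:
    """Naive ATE estimate: mean outcome of group A minus mean outcome of group B."""
    return mean(group_a) - mean(group_b)


# Hypothetical (deaths, population) pairs, not data from the study.
group_a = [per_million(d, p) for d, p in [(100, 10_000_000), (300, 20_000_000)]]
group_b = [per_million(d, p) for d, p in [(2_000, 10_000_000), (6_000, 30_000_000)]]
print(difference_in_means(group_a, group_b))  # -187.5
```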
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Demographic dataset of the countries of the world (1955-2020). This dataset was created using web scraping techniques on the webpage https://www.worldometers.info/population/.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The dataset contains the numbers of tests, cases, deaths, serious/critical cases, active cases and recovered cases in each country for every day since April 18. It also contains the population of each country, so that the per-capita penetration of the virus can be calculated.
I've removed the data for the "Diamond Princess" and "MS Zaandam" since they are cruise ships, not countries.
Additionally, I have added an auxiliary table with the fraction of the general population in different age groups for every country (taken from Wikipedia). This is relevant because the COVID-19 death rate is strongly age-dependent.
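Computing per-capita figures and dropping the two cruise ships can be done in one pass. This is a sketch over plain dictionaries; the keys `country`, `cases`, `deaths` and `population` and the function name are hypothetical and would need to be mapped to the dataset's actual column names.

```python
NON_COUNTRIES = {"Diamond Princess", "MS Zaandam"}  # cruise ships, not countries


def per_capita(rows):
    """Return {country: (cases per million, deaths per million)}, skipping the
    cruise ships and rows without a positive population."""
    result = {}
    for row in rows:
        pop = row["population"]
        if row["country"] in NON_COUNTRIES or pop <= 0:
            continue
        result[row["country"]] = (
            row["cases"] * 1_000_000 / pop,
            row["deaths"] * 1_000_000 / pop,
        )
    return result


# Hypothetical values for illustration only.
rows = [
    {"country": "Country A", "cases": 1_000, "deaths": 20, "population": 2_000_000},
    {"country": "Diamond Princess", "cases": 100, "deaths": 1, "population": 1_000},
]
print(per_capita(rows))  # {'Country A': (500.0, 10.0)}
```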
The people at www.worldometers.info who collect and maintain this site are doing really important work: https://www.worldometers.info/coronavirus/#countries
Data about the age structure of every country comes from Wikipedia.
This dataset can be used for various purposes and analyses. My goal is to use the additional data about the number of tests performed in each country to estimate the true death and infection rates of COVID-19.
UPDATED till 10/04/2020 23:59:59
Worldometer COVID-19 data is available as a CSV file. I am uploading it here for use in Kaggle kernels and to get insights from the broader data science community.
2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it's unclear how easily or sustainably this virus is spreading between people. (CDC)
Country - name of a country affected by COVID-19
Total Cases - cumulative number of confirmed cases to date
New Cases - new confirmed cases each day
Total Deaths - cumulative number of deaths to date
New Deaths - new deaths each day
Total Recovered - cumulative number of recovered cases to date
Active Cases - number of currently active cases (confirmed cases that have neither died nor recovered)
Serious, Critical - cumulative number of serious/critical cases to date
Tot Cases/1M pop - cumulative number of confirmed cases to date per million population
Deaths/1M pop - cumulative number of deaths to date per million population
Total Tests - cumulative number of tests to date
Tests/1M pop - cumulative number of tests to date per million population
As the world fights this invisible enemy, many data-driven students like me want to study it as well as we can. An enormous number of COVID-19 datasets are available today, but as a beginner in this field I wanted something simpler. So I put together this COVID-19 dataset, which I scraped from https://www.worldometers.info/coronavirus. It is my way of learning by doing. The data runs through 5/17/2020, and I will keep updating it.
The dataset contains 194 rows and 12 columns, described below:
Country: name of the country.
Total_Cases: total number of cases in the country as of 5/17/2020.
Total_Deaths: total number of deaths in the country as of 5/17/2020.
Total_Recovered: total number of individuals who have recovered from COVID-19.
Active_Cases: total active cases in the country on 5/17/2020.
Critical_Cases: number of patients in critical condition.
Cases/Million_Population: number of cases per million population of the country.
Deaths/Million_Population: number of deaths per million population of the country.
Total_Tests: total number of tests performed as of 5/17/2020.
Tests/Million_Population: number of tests performed per million population.
Population: population of the country.
Continent: continent in which the country lies.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Covid19 in World Countries-Latest Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/anandhuh/covid19-in-world-countrieslatest-data on 12 November 2021.
--- Dataset description provided by original source is as follows ---
This dataset contains COVID-19 data for the countries of the world as of November 10, 2021.
Link: https://www.worldometers.info/coronavirus/#countries
Link: https://www.kaggle.com/anandhuh/datasets
Upvote if you find it useful 🙏
--- Original source retains full ownership of the source dataset ---
Based on a comparison of coronavirus deaths in 210 countries relative to their population, Peru had the highest COVID-19 death toll per capita as of July 13, 2022. As of the same date, the virus had infected over 557.8 million people worldwide, and the number of deaths had totaled more than 6.3 million. Note, however, that COVID-19 testing rates vary by country. Additionally, large differences between countries appear when deaths are compared with confirmed COVID-19 cases. The source seemingly does not differentiate between "the Wuhan strain" (2019-nCoV) of COVID-19, "the Kent mutation" (B.1.1.7) that appeared in the UK in late 2020, the 2021 Delta variant (B.1.617.2) from India, or the Omicron variant (B.1.1.529) from South Africa.
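The two ratios mentioned above, deaths relative to population and deaths relative to confirmed cases, are simple to compute. As a rough illustration with the rounded worldwide totals quoted above, the case fatality ratio comes to about 1.1 percent; because confirmed cases depend heavily on testing, this is an approximation rather than a true fatality rate. The function names are hypothetical.

```python
def case_fatality_ratio_pct(deaths: int, confirmed_cases: int) -> float:
    """Deaths as a percentage of confirmed cases."""
    return deaths * 100 / confirmed_cases


def deaths_per_100k(deaths: int, population: int) -> float:
    """Deaths per 100,000 people, a common basis for per-capita comparisons."""
    return deaths * 100_000 / population


# Worldwide totals as of July 13, 2022, as quoted in the text (rounded).
cfr = case_fatality_ratio_pct(6_300_000, 557_800_000)
print(f"{cfr:.1f}%")  # 1.1%
```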
The difficulties of death figures
This table aims to provide a complete picture of the topic, but it relies heavily on data that have become more difficult to compare. As the pandemic developed, countries already used different methods to count fatalities, and some changed them during the course of the pandemic. On April 16, for example, the Chinese city of Wuhan raised its death figures by 50 percent to account for community deaths, which had occurred outside of hospitals and had gone uncounted until then. The state of New York did something similar two days earlier, revising its figures with 3,700 new deaths as it started to include “assumed” coronavirus victims. The United Kingdom started counting deaths in care homes and private households on April 29, adding about 5,000 deaths (a figure that was lowered again by the same amount on August 18). This makes an already difficult comparison even harder. Belgium, for example, counts suspected coronavirus deaths in its figures, whereas other countries have not done so (yet). This means two things. First, it could have a big impact on both current and future figures. As early as April 16, UK health experts stated that if their numbers were corrected for community deaths as in Wuhan, the UK number would change from 205 to “above 300”. This is exactly what happened two weeks later. Second, it is difficult to pinpoint exactly which countries already have “revised” numbers (like Belgium, Wuhan or New York) and which do not. One workaround is to look at (freely accessible) timelines that track the reported daily increase in deaths in certain countries. Several of these are available on our platform, such as for Belgium, Italy and Sweden. A sudden large increase might indicate that the domestic sources changed their methodology.
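The workaround of spotting a sudden large increase in a daily death timeline can be roughly automated. The sketch below flags days whose reported increase exceeds a multiple of the median increase over the preceding week; the function name, the 7-day window and the factor of 5 are arbitrary choices for illustration, and a flagged day only suggests, rather than proves, a change in methodology.

```python
from statistics import median


def flag_jumps(cumulative, window=7, factor=5.0):
    """Return indices into `cumulative` where the daily increase exceeds
    `factor` times the median increase of the previous `window` days."""
    daily = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]
    flagged = []
    for i in range(window, len(daily)):
        baseline = median(daily[i - window:i])
        # Skip flat baselines (median 0) so that any nonzero day is not flagged.
        if baseline > 0 and daily[i] > factor * baseline:
            flagged.append(i + 1)  # daily[i] is the change from day i to day i + 1
    return flagged


# Hypothetical cumulative deaths: steady +10 per day, then a one-off revision.
deaths = [0, 10, 20, 30, 40, 50, 60, 70, 80, 500, 510]
print(flag_jumps(deaths))  # [9]
```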
Where are these numbers coming from?
The numbers shown here were collected by Johns Hopkins University, a source that manually checks the data with domestic health authorities. For the majority of countries, this is from national authorities. In some cases, like China, the United States, Canada or Australia, city reports or other various state authorities were consulted. In this statistic, these separately reported numbers were put together. For more information or other freely accessible content, please visit our dedicated Facts and Figures page.
JHU Coronavirus COVID-19 Global Cases, by country
PHS updates the Coronavirus Global Cases dataset three times a week (Monday, Wednesday and Friday) from Cloud Marketplace.
This data comes from the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). This database was created in response to the Coronavirus public health emergency to track reported cases in real-time. The data include the location and number of confirmed COVID-19 cases, deaths, and recoveries for all affected countries, aggregated at the appropriate province or state. It was developed to enable researchers, public health authorities and the general public to track the outbreak as it unfolds. Additional information is available in the blog post.
Visual Dashboard (desktop): https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6
Included Data Sources are:
Terms of Use:
This GitHub repo and its contents herein, including all data, mapping, and analysis, copyright 2020 Johns Hopkins University, all rights reserved, is provided to the public strictly for educational and academic research purposes. The Website relies upon publicly available data from multiple sources, that do not always agree. The Johns Hopkins University hereby disclaims any and all representations and warranties with respect to the Website, including accuracy, fitness for use, and merchantability. Reliance on the Website for medical guidance or use of the Website in commerce is strictly prohibited.
U.S. county-level characteristics relevant to COVID-19
Chin, Kahn, Krieger, Buckee, Balsari and Kiang (forthcoming) show that counties differ significantly in biological, demographic and socioeconomic factors that are associated with COVID-19 vulnerability. A range of publicly available county-specific data identifying these key factors, guided by international experiences and consideration of epidemiological parameters of importance, have been combined by the authors and are available for use:
As of May 2, 2023, the outbreak of the coronavirus disease (COVID-19) had been confirmed in almost every country in the world. The virus had infected over 687 million people worldwide, and the number of deaths had reached almost 6.87 million. The most severely affected countries include the U.S., India, and Brazil.
COVID-19: background information
COVID-19 is caused by a novel coronavirus that had not previously been identified in humans. The first case was detected in the Hubei province of China at the end of December 2019. The virus is highly transmissible, and coughing and sneezing are the most common forms of transmission. This is similar to the outbreak of the SARS coronavirus that began in 2002, which was thought to have spread via cough and sneeze droplets expelled into the air by infected persons.
Naming the coronavirus disease
Coronaviruses are a group of viruses that can be transmitted between animals and people, causing illnesses that may range from the common cold to more severe respiratory syndromes. In February 2020, the International Committee on Taxonomy of Viruses and the World Health Organization announced official names for both the virus and the disease it causes: SARS-CoV-2 and COVID-19, respectively. The name of the disease is derived from the words corona, virus, and disease, while the number 19 represents the year that it emerged.