9 datasets found

‘Population by Country - 2020’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2020
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Population by Country - 2020’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-population-by-country-2020-c8b7/latest
Dataset updated
Feb 13, 2020
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Population by Country - 2020’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/tanuprabhu/population-by-country-2020 on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

I had always wanted a dataset of the world’s population by country, but I could not find a properly documented one, so I created one myself.

Content

I knew I wanted to create a dataset, but I did not know how. I started by searching the internet for population figures by country. Wikipedia was my first stop, but its results were not acceptable to me, and it listed only around 190 countries. After searching for some time, I came across a website you have probably heard of: Worldometer. It was exactly what I was looking for, with more detail than Wikipedia and more rows, that is, more countries with their populations.

Once I had found the data, my next hard task was downloading it. The data was not available in raw form, and I did not email the site to ask for it. Instead, I learned a skill that is very important for a data scientist, one that I had read is needed to obtain data from websites. Any guesses? Keep reading and you will find out in the next paragraph.


You are right: it's web scraping. I learned it so that I could convert the data into CSV format. I also found a way to convert the pandas DataFrame directly to a CSV (comma-separated values) file and store it on my computer. Go through my code and you will see what I mean.

Below is the code I used to scrape the data from the website:

Screenshot of the scraper code: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3200273%2Fe814c2739b99d221de328c72a0b2571e%2FCapture.PNG?generation=1581314967227445&alt=media
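The scraper itself survives only as a screenshot. As a hedged illustration of the general technique, here is a minimal sketch that parses an HTML table and writes it out as CSV. It is not the author's code: it uses only the Python standard library (`html.parser`, `csv`, `urllib.request`) instead of pandas, it assumes the population figures sit in the first `<table>` on the page, and the `fetch_html` helper is hypothetical and needs network access. Worldometer's actual markup may differ.

```python
import csv
import io
import urllib.request
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every cell in the first <table> of a document."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_table = False
        self._done = False      # set once the first table has closed
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._in_table = True
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._in_table and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if self._done or not self._in_table:
            return
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self._in_table = False
            self._done = True

    def handle_data(self, data):
        # Text inside nested tags such as <a> still lands in the current cell.
        if self._cell is not None:
            self._cell.append(data)


def parse_first_table(html):
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)  # quotes fields such as "1,000"
    return buf.getvalue()


def fetch_html(url):
    # Hypothetical helper: downloads a page; requires network access.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

To save the result, write `rows_to_csv(...)` to a file opened with `newline=""`. If pandas is installed, `pandas.read_html(io.StringIO(html))[0].to_csv("population.csv", index=False)` is a shorter route to the same DataFrame-to-CSV step the author describes; `read_html` needs lxml, or BeautifulSoup with html5lib.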

Acknowledgements

I could not have obtained the data without Worldometer, so special thanks to the website.

Inspiration

As far as I know, I don't have any questions to ask. Find your own ways to use the data, and let me know via a kernel if you find something interesting.

--- Original source retains full ownership of the source dataset ---
Distribution of first name and last name frequencies by country
figshare.com
xlsx
Updated Feb 2, 2023
Cite
Mike Thelwall (2023). Distribution of first name and last name frequencies by country [Dataset]. http://doi.org/10.6084/m9.figshare.21956795.v2
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.21956795.v2
Dataset updated
Feb 2, 2023
Dataset provided by
figshare
Authors
Mike Thelwall
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Distribution of first and last name frequencies of academic authors by country.

Spreadsheet 1 contains 50 countries, with names based on affiliations in Scopus journal articles 2001-2021.

Spreadsheet 2 contains 200 countries, with names based on affiliations in Scopus journal articles 2001-2021, using a slightly updated last-name extraction algorithm that is essentially unchanged except for its handling of Dutch/Flemish names.

From the paper: Can national researcher mobility be tracked by first or last name uniqueness?

For example, the distribution for the UK shows a single peak for international names and no national names, Belgium has both a national peak and an international peak, and China has mainly a national peak. The 50 countries are:

No	Code	Country
1	SB	Serbia
2	IE	Ireland
3	HU	Hungary
4	CL	Chile
5	CO	Colombia
6	NG	Nigeria
7	HK	Hong Kong
8	AR	Argentina
9	SG	Singapore
10	NZ	New Zealand
11	PK	Pakistan
12	TH	Thailand
13	UA	Ukraine
14	SA	Saudi Arabia
15	RO	Romania
16	ID	Indonesia
17	IL	Israel
18	MY	Malaysia
19	DK	Denmark
20	CZ	Czech Republic
21	ZA	South Africa
22	AT	Austria
23	FI	Finland
24	PT	Portugal
25	GR	Greece
26	NO	Norway
27	EG	Egypt
28	MX	Mexico
29	BE	Belgium
30	CH	Switzerland
31	SW	Sweden
32	PL	Poland
33	TW	Taiwan
34	NL	Netherlands
35	TK	Turkey
36	IR	Iran
37	RU	Russia
38	AU	Australia
39	BR	Brazil
40	KR	South Korea
41	ES	Spain
42	CA	Canada
43	IT	Italy
44	FR	France
45	IN	India
46	DE	Germany
47	US	USA
48	UK	UK
49	JP	Japan
50	CN	China
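The description does not say how the distributions are computed. One plausible reading, offered only as an assumption and not as the paper's method, is to score each name seen in a country by the share of its worldwide occurrences that fall in that country: a share near 1 marks a "national" name, a share near 0 an "international" one, and a histogram of those shares then shows one or two peaks. The input layout `name_counts` (name mapped to per-country counts), the function names, and the bin count are all hypothetical.

```python
def national_shares(name_counts, country):
    """For each name that occurs in `country`, return the fraction of that
    name's worldwide occurrences found in `country` (hypothetical metric)."""
    shares = {}
    for name, by_country in name_counts.items():
        in_country = by_country.get(country, 0)
        if in_country:
            shares[name] = in_country / sum(by_country.values())
    return shares


def histogram(values, bins=10):
    """Count values in [0, 1] into `bins` equal-width bins."""
    counts = [0] * bins
    for v in values:
        # min() keeps a share of exactly 1.0 in the last bin
        counts[min(int(v * bins), bins - 1)] += 1
    return counts
```

Under this reading, a country like the UK would show most of its mass in the low bins, and China most of its mass near 1.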

COVID19 Additional Data

kaggle.com

Updated Apr 9, 2020

Cite

Orzhiang (2020). COVID19 Additional Data [Dataset]. https://www.kaggle.com/datasets/orzhiang/covid19-additional-data/versions/11

Dataset updated

Apr 9, 2020

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Orzhiang

Description

This is a collection of datasets that I personally find useful for analysing COVID19 data. All of the data comes from the internet, and the majority originates from the World Bank, so I am sure some Kaggle users have already uploaded similar data. However, compiling all of it in one place makes my life (and perhaps yours) easier.

Remarks on the datasets included:

Dataset Title	Description or Source
Other source of COVID19 Cases	https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset#time_series_covid_19_confirmed.csv
Mortality Table	https://www.kaggle.com/robikscube/world-health-organization-who-mortality-database
Economic Freedom Index	https://www.kaggle.com/lewisduncan93/the-economic-freedom-index
World Bank Development Indicators	https://www.kaggle.com/theworldbank/world-development-indicators
Weather Data	https://www.kaggle.com/hbfree/covid19formattedweatherjan22march24
Government Response	https://www.bsg.ox.ac.uk/research/research-projects/oxford-covid-19-government-response-tracker
Containment and Mitigation Measures	https://www.kaggle.com/paultimothymooney/covid-19-containment-and-mitigation-measures/
World Happiness Report	https://www.kaggle.com/londeen/world-happiness-report-2020
Weather Data 2	https://www.kaggle.com/noaa/gsod
US Data Prior to 2020-03-09	https://www.kaggle.com/johnjdavisiv/jhu-covid19-data-with-us-state-data-prior-to-mar-9
OECD Hospital Beds per 1000 Inhabitants	https://www.kaggle.com/cpmpml/oecd-hospital-beds-per-1000-inhabitant
Covid 19 data by the US States	https://www.kaggle.com/scirpus/covid-by-state
COVID 19 Demographic predictors	https://www.kaggle.com/nightranger77/covid19-demographic-predictors
Country Info	https://www.kaggle.com/koryto/countryinfo
Population by location	https://www.kaggle.com/dgrechka/covid19-global-forecasting-locations-population
00 COVID19 Country Mapping Table	A mapping table that links World Bank country names and country codes to the country names used in the COVID19 competition, making it much easier to join the COVID19 data with World Bank data.
01 Population_API_SP.POP.TOTL	https://data.worldbank.org/indicator/sp.pop.totl
01_1 China Demographic Data	Source: http://www.chamiji.com/2019chinaprovincepopulation http://www.stats.gov.cn/tjsj/ndsj/2017/indexeh.htm http://data.stats.gov.cn/english/easyquery.htm?cn=C01 http://www.gov.cn/test/2007-08/07/content_708271.htm
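The mapping table (item 00) acts as a join key between the two naming schemes. A minimal sketch of how it might be used, assuming hypothetical column names `covid_name`, `wb_name`, and `wb_code` (the file's real headers may differ) and an indicator keyed by World Bank country code, such as total population (SP.POP.TOTL):

```python
def build_mapping(rows):
    """Map COVID19-competition country names to World Bank country codes.
    `rows` are dicts with hypothetical keys covid_name, wb_name, wb_code."""
    return {r["covid_name"]: r["wb_code"] for r in rows}


def attach_indicator(cases, mapping, indicator_by_code):
    """Join per-country COVID19 rows to a World Bank indicator via the mapping.
    Returns (joined_rows, unmatched_country_names)."""
    joined, unmatched = [], []
    for row in cases:
        code = mapping.get(row["country"])
        if code is None or code not in indicator_by_code:
            unmatched.append(row["country"])
            continue
        joined.append({**row, "wb_code": code, "indicator": indicator_by_code[code]})
    return joined, unmatched
```

Returning the unmatched names, rather than silently dropping those rows, makes gaps in the mapping table easy to spot.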

Book Publishing Dataset from 1600s to 2016
kaggle.com
Updated Apr 12, 2025
Cite
Orvile (2025). Book Publishing Dataset from 1600s to 2016 [Dataset]. https://www.kaggle.com/datasets/orvile/book-publishing-dataset-from-1600s-to-2016/discussion
Dataset updated
Apr 12, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Orvile
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset features various attributes of books from nine different publishers, with publishing years ranging from the 1600s to 2016. The attributes cover sales, ratings, and book identities. The data was originally published by Josh Murrey on data.world under the name "Books".

More information can be found in my CS 573 Project Proposal.

Visit: https://gist.github.com/apietrick24#book-publishing-dataset
Most popular database management systems worldwide 2024
statista.com
Updated Jun 30, 2025
Cite
Statista (2025). Most popular database management systems worldwide 2024 [Dataset]. https://www.statista.com/statistics/809750/worldwide-popularity-ranking-database-management-systems/
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jun 2024
Area covered
Worldwide
Description
As of June 2024, the most popular database management system (DBMS) worldwide was Oracle, with a ranking score of *******; MySQL and Microsoft SQL Server rounded out the top three. Although the database management industry includes some of the largest companies in tech, such as Microsoft, Oracle, and IBM, a number of free and open-source DBMSs, such as PostgreSQL and MariaDB, remain competitive.

Database Management Systems

As the name implies, DBMSs provide a platform through which developers can organize, update, and control large databases. Given the business world’s growing focus on big data and data analytics, knowledge of SQL has become an important asset for software developers around the world, and database management skills are highly sought after. Besides giving developers the tools to operate databases, DBMSs are integral to the way consumers access information through applications, which further underlines the importance of the software.
Customer Shopping Trends Dataset
kaggle.com
Updated Oct 5, 2023
Cite
Sourav Banerjee (2023). Customer Shopping Trends Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset
Dataset updated
Oct 5, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sourav Banerjee
Description
Context

The Customer Shopping Preferences Dataset offers insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses tailoring their products, marketing strategies, and overall customer experience. The dataset captures a wide range of customer attributes, including age, gender, purchase history, preferred payment methods, and frequency of purchases. Analyzing it can help businesses make informed decisions, optimize product offerings, and improve customer satisfaction. Note that this is a synthetic dataset, created for beginners to learn data analysis and machine learning.

Content

This dataset encompasses various features related to customer shopping preferences, gathering essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies.

Dataset Glossary (Column-wise)

Customer ID - Unique identifier for each customer

Age - Age of the customer

Gender - Gender of the customer (Male/Female)

Item Purchased - The item purchased by the customer

Category - Category of the item purchased

Purchase Amount (USD) - The amount of the purchase in USD

Location - Location where the purchase was made

Size - Size of the purchased item

Color - Color of the purchased item

Season - Season during which the purchase was made

Review Rating - Rating given by the customer for the purchased item

Subscription Status - Indicates if the customer has a subscription (Yes/No)

Shipping Type - Type of shipping chosen by the customer

Discount Applied - Indicates if a discount was applied to the purchase (Yes/No)

Promo Code Used - Indicates if a promo code was used for the purchase (Yes/No)

Previous Purchases - The total count of transactions concluded by the customer at the store, excluding the ongoing transaction

Payment Method - Customer's most preferred payment method

Frequency of Purchases - Frequency at which the customer makes purchases (e.g., Weekly, Fortnightly, Monthly)
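As a short illustration of working with the columns above, here is a sketch that averages Purchase Amount (USD) per Category and derives each customer's total transaction count. It assumes the CSV header uses the glossary names verbatim, which may not hold for the actual file. Because Previous Purchases excludes the ongoing transaction, the total adds 1.

```python
import csv
import io
from collections import defaultdict


def summarize(csv_text):
    """Return (average purchase amount per Category,
    total transactions per Customer ID including the current one)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    lifetime = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        category = row["Category"]
        totals[category] += float(row["Purchase Amount (USD)"])
        counts[category] += 1
        # Previous Purchases excludes the ongoing transaction, so add it back.
        lifetime[row["Customer ID"]] = int(row["Previous Purchases"]) + 1
    averages = {c: totals[c] / counts[c] for c in totals}
    return averages, lifetime
```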

Structure of the Dataset

Image: https://i.imgur.com/6UEqejq.png

Acknowledgement

This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.

Cover Photo by: Freepik

Thumbnail by: Clothing icons created by Flat Icons - Flaticon
US Colleges and Universities
public.opendatasoft.com
data.smartidf.services
Updated Jun 6, 2025
Cite
(2025). US Colleges and Universities [Dataset]. https://public.opendatasoft.com/explore/dataset/us-colleges-and-universities/
Available download formats: json, excel, geojson, csv
Dataset updated
Jun 6, 2025
License
Public domain: https://en.wikipedia.org/wiki/Public_domain
Area covered
United States
Description
The Colleges and Universities feature class/shapefile is composed of all postsecondary education facilities as defined by the Integrated Postsecondary Education Data System (IPEDS, http://nces.ed.gov/ipeds/) of the National Center for Education Statistics (NCES, https://nces.ed.gov/), US Department of Education, for the 2018-2019 school year. Included are doctoral/research universities, master's colleges and universities, baccalaureate colleges, associate's colleges, theological seminaries, medical schools and other health care professions, schools of engineering and technology, business and management, art, music, and design, law schools, teachers colleges, tribal colleges, and other specialized institutions. Overall, this data layer covers all 50 states, as well as Puerto Rico and other assorted U.S. territories. This feature class contains all MEDS/MEDS+ as approved by the National Geospatial-Intelligence Agency (NGA) Homeland Security Infrastructure Program (HSIP) Team. Complete field and attribute information is available in the "Entities and Attributes" metadata section. Geographical coverage is depicted in the thumbnail above and detailed in the "Place Keyword" section of the metadata. This feature class does not have a relationship class but is related to Supplemental Colleges: colleges and universities that are not included in the NCES IPEDS data are added to the Supplemental Colleges feature class when found. This release adds 175 new records, removes 468 records no longer reported by NCES, and modifies the spatial location and/or attributes of 6,682 records.
List of all countries with their 2 digit codes (ISO 3166-1)
datahub.io
Updated Aug 29, 2017
Cite
(2017). List of all countries with their 2 digit codes (ISO 3166-1) [Dataset]. https://datahub.io/core/country-list
Dataset updated
Aug 29, 2017
License
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Description
ISO 3166-1-alpha-2 English country names and code elements. This list states the country names (official short names in English) in alphabetical order as given in ISO 3166-1 and the corresponding ISO 3166-1-alpha-2 code elements.
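A sketch for turning the list into lookups in both directions, assuming the file is a CSV with a `Name` and a `Code` column (check the actual header before relying on this). Each code is checked against the alpha-2 shape of exactly two uppercase letters.

```python
import csv
import io
import re

ALPHA2 = re.compile(r"[A-Z]{2}")


def load_country_codes(csv_text):
    """Build code -> name and casefolded name -> code lookups from a CSV
    with (assumed) columns Name and Code."""
    by_code, by_name = {}, {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = row["Code"].strip()
        if not ALPHA2.fullmatch(code):
            raise ValueError(f"not an alpha-2 code: {code!r}")
        by_code[code] = row["Name"]
        by_name[row["Name"].casefold()] = code
    return by_code, by_name
```

A lookup like this makes non-standard codes easy to catch; for example, the ISO 3166-1 alpha-2 element for the United Kingdom is GB, not UK.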
List of UK Health Workers Dead from COVID-19
kaggle.com
Updated Apr 21, 2020
Cite
V. Gates (2020). List of UK Health Workers Dead from COVID-19 [Dataset]. https://www.kaggle.com/vgates/list-of-uk-health-workers-dead-from-covid19/code
Dataset updated
Apr 21, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
V. Gates
Area covered
United Kingdom
Description
A List of UK Health Workers Who Have Died from COVID-19

Transcribed by hand into machine-readable form from the UK newspaper "The Guardian", specifically the article "Doctors, nurses, porters, volunteers: the UK health workers who have died from Covid-19": https://www.theguardian.com/world/2020/apr/16/doctors-nurses-porters-volunteers-the-uk-health-workers-who-have-died-from-covid-19

The Guardian is continuing to update the list day by day as the COVID-19 pandemic continues. I do not plan to update this dataset: because the data collection biases are unknown, I assume nobody else will find it very interesting. I am not a copyright lawyer and do not know whether this data is protected by copyright, and if so, in which parts of the world.

Caveat: creating this dataset from a newspaper article required a lot of manual work. I've done my best, but there may be mistakes.

Columns:

Name

age

institution

city - I filled this in myself; I am not familiar with UK geography, so there may well be mistakes

date_of_death

possible_ppe_issue - mostly blank, but I filled in "yes" where the article mentions a person who had doubts about the adequacy of PPE (personal protective equipment)

MED_SPEC - my attempt at a medical specialty, using the values from the Eurostat website's "Physicians by medical specialty" and "Nursing and caring professionals" tables. The idea is to be able to calculate the fraction of affected individuals by specialty.
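The stated goal of MED_SPEC is a fraction of affected individuals by specialty, that is, deaths recorded in a specialty divided by the number of people working in it. A minimal sketch, assuming the workforce counts come from the Eurostat tables mentioned above and are supplied as a hypothetical `workforce` mapping whose keys match the MED_SPEC values exactly. Given the caveats above (hand transcription, unknown collection biases), any such fraction is indicative at best.

```python
from collections import Counter


def fraction_by_specialty(med_specs, workforce):
    """Deaths per specialty divided by that specialty's headcount.

    med_specs: values of the MED_SPEC column (blank entries are skipped).
    workforce: hypothetical mapping specialty -> headcount; specialties with
    no (or zero) headcount are left out rather than divided by zero.
    """
    deaths = Counter(s.strip() for s in med_specs if s and s.strip())
    return {
        spec: n / workforce[spec]
        for spec, n in deaths.items()
        if workforce.get(spec)
    }
```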