23 datasets found

house prices data exploration
kaggle.com
Updated Sep 13, 2024
Cite
yvonne gatwiri (2024). house prices data exploration [Dataset]. https://www.kaggle.com/yvonnegatwiri/house-prices-data-exploration/discussion
Dataset updated
Sep 13, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
yvonne gatwiri
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by yvonne gatwiri

Released under Apache 2.0

house prices data exploration
kaggle.com
Updated Sep 13, 2024
Cite
yvonne gatwiri (2024). house prices data exploration [Dataset]. https://www.kaggle.com/datasets/yvonnegatwiri/house-prices-data-exploration/code
Dataset updated
Sep 13, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
yvonne gatwiri
Description
Dataset

This dataset was created by yvonne gatwiri

Released under Apache 2.0

Credit EDA Case Study
kaggle.com
Updated Aug 8, 2021
Cite
Nandish Jani (2021). Credit EDA Case Study [Dataset]. https://www.kaggle.com/datasets/nandishjani/credit-eda-case-study
Dataset updated
Aug 8, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Nandish Jani
Description
Dataset

This dataset was created by Nandish Jani

Employee Turnover Analytics Dataset
kaggle.com
Updated Jun 8, 2023
Cite
Akshay Hedau (2023). Employee Turnover Analytics Dataset [Dataset]. https://www.kaggle.com/datasets/akshayhedau/employee-turnover-analytics-dataset
Dataset updated
Jun 8, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Akshay Hedau
Description
Portobello Tech is an app innovator that has devised an intelligent way of predicting employee turnover within the company. It periodically evaluates employees' work details, including the number of projects they worked on, average monthly working hours, time spent in the company, promotions in the last 5 years, and salary level. Data from prior evaluations shows each employee's satisfaction at the workplace. The data could be used to identify patterns in work style and in employees' interest in continuing to work at the company. The HR Department owns the data and uses it to predict employee turnover. Employee turnover refers to the total number of workers who leave a company over a certain time period.
Spark Fund Investment Analysis
kaggle.com
zip
Updated Sep 5, 2019
Cite
Pranay Prabhat (2019). Spark Fund Investment Analysis [Dataset]. https://www.kaggle.com/pranay969/spark-fund-investment-analysis
Available download formats: zip (6260727 bytes)
Dataset updated
Sep 5, 2019
Authors
Pranay Prabhat
Description
Project Brief

You work for Spark Funds, an asset management company. Spark Funds wants to make investments in a few companies. The CEO of Spark Funds wants to understand the global trends in investments so that she can make investment decisions effectively.

Business and Data Understanding

Spark Funds has two minor constraints for investments:

It wants to invest between 5 and 15 million USD per round of investment.

It wants to invest only in English-speaking countries, because of the ease of communication with the companies it would invest in.

For your analysis, consider a country to be English-speaking only if English is one of its official languages.

You may use https://en.wikipedia.org/wiki/List_of_territorial_entities_where_English_is_an_official_language for a list of countries where English is an official language.

These conditions will give you sufficient information for your initial analysis. Before getting to specific questions, let’s understand the problem and the data first.
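
As a rough illustration of the two constraints above, the following sketch filters a handful of made-up funding rounds. The field names (company, country_code, raised_amount_usd), the sample values, the inclusive treatment of the 5 and 15 million bounds, and the short country set are all assumptions for illustration; they are not taken from the dataset, and the real country list should come from the Wikipedia page linked above.

```python
# Hypothetical funding rounds; field names and values are illustrative only.
rounds = [
    {"company": "A", "country_code": "IND", "raised_amount_usd": 10_000_000},
    {"company": "B", "country_code": "CHN", "raised_amount_usd": 8_000_000},
    {"company": "C", "country_code": "CAN", "raised_amount_usd": 20_000_000},
    {"company": "D", "country_code": "IRL", "raised_amount_usd": 5_000_000},
]

# Assumed, deliberately incomplete set of countries where English is an
# official language; replace it with the full list from the linked page.
ENGLISH_OFFICIAL = {"IND", "CAN", "IRL", "SGP"}

MIN_USD = 5_000_000   # lower bound per round (treated as inclusive here)
MAX_USD = 15_000_000  # upper bound per round (treated as inclusive here)


def eligible_rounds(all_rounds):
    """Keep rounds that satisfy both the amount and the language constraint."""
    return [
        r for r in all_rounds
        if MIN_USD <= r["raised_amount_usd"] <= MAX_USD
        and r["country_code"] in ENGLISH_OFFICIAL
    ]


selected = [r["company"] for r in eligible_rounds(rounds)]
print(selected)
```

With this sample, round B fails the language constraint and round C exceeds the amount limit, so only A and D remain.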
‘Census County Economically Distressed Areas 2018’ analyzed by Analyst-2
analyst-2.ai
Updated Mar 11, 2011
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2011). ‘Census County Economically Distressed Areas 2018’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-census-county-economically-distressed-areas-2018-760f/5de0d3ed/?iid=010-679&v=presentation
Dataset updated
Mar 11, 2011
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Census County Economically Distressed Areas 2018’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/0b289b5e-0507-424d-9f07-f8d2b11b9580 on 27 January 2022.

--- Dataset description provided by original source is as follows ---

This is a copy of the statewide Census County GIS Tiger file. It is used to determine if a county is EDA or not by adding ACS (American Community Survey) Median Household Income (MHI) and Population Density data at the county level. The IRWM web based DAC mapping tool uses this GIS layer. Every year this table gets updated after ACS publishes their updated estimates. Created by joining 2016 EDA table to 2010 block groups feature class. The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line File is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Block Groups (BGs) are defined before tabulation block delineation and numbering, but are clusters of blocks within the same census tract that have the same first digit of their 4-digit census block number from the same decennial census. For example, Census 2000 tabulation blocks 3001, 3002, 3003,.., 3999 within Census 2000 tract 1210.02 are also within BG 3 within that census tract. Census 2000 BGs generally contained between 600 and 3,000 people, with an optimum size of 1,500 people. Most BGs were delineated by local participants in the Census Bureau's Participant Statistical Areas Program (PSAP). The Census Bureau delineated BGs only where the PSAP participant declined to delineate BGs or where the Census Bureau could not identify any local PSAP participant. A BG usually covers a contiguous area. Each census tract contains at least one BG, and BGs are uniquely numbered within census tract. 
Within the standard census geographic hierarchy, BGs never cross county or census tract boundaries, but may cross the boundaries of other geographic entities like county subdivisions, places, urban areas, voting districts, congressional districts, and American Indian / Alaska Native / Native Hawaiian areas. BGs have a valid code range of 0 through 9. BGs coded 0 were intended to only include water area, no land area, and they are generally in territorial seas, coastal water, and Great Lakes water areas. For Census 2000, rather than extending a census tract boundary into the Great Lakes or out to the U.S. nautical three-mile limit, the Census Bureau delineated some census tract boundaries along the shoreline or just offshore. The Census Bureau assigned a default census tract number of 0 and BG of 0 to these offshore, water-only areas not included in regularly numbered census tract areas.

--- Original source retains full ownership of the source dataset ---
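
The block-group rule quoted in the description above (a block group is identified by the first digit of the 4-digit census block number) can be sketched as follows. The function name and the input validation are assumptions added for illustration; they are not part of any Census Bureau tool.

```python
def block_group(block_number: str) -> int:
    """Return the block group (first digit) of a 4-digit census block number."""
    # Hypothetical validation: expect exactly four ASCII digits, e.g. "3001".
    if len(block_number) != 4 or not (block_number.isascii() and block_number.isdigit()):
        raise ValueError(f"expected a 4-digit block number, got {block_number!r}")
    return int(block_number[0])


# Blocks 3001 through 3999 all belong to block group 3, as in the example above.
print(block_group("3001"), block_group("3999"))
```

Block numbers are kept as strings here so that a leading zero, which marks the water-only block group 0, is not lost.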
HR-attrition-EDA
kaggle.com
Updated Aug 10, 2020
Cite
Sagar Shee (2020). HR-attrition-EDA [Dataset]. https://www.kaggle.com/winterbreeze/hrattritioneda/tasks
Dataset updated
Aug 10, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sagar Shee
Description
Context

This dataset is cleaned and ready to deploy for model building.

Content

This dataset is intended for learning purposes, so it is simplified and has no null values or major skewness.

Inspiration

I learned much from Kaggle and the data community, and this is my contribution so that the flow of knowledge never stops.
Univariate and multivariate cox regression models testing associations...
plos.figshare.com
xls
Updated Jun 3, 2023
Cite
Barbara Scotti; Giulio Disanto; Rosaria Sacco; Marilu’ Guigli; Chiara Zecca; Claudio Gobbi (2023). Univariate and multivariate cox regression models testing associations between baseline characteristics and risk of EDA during RTX treatment. [Dataset]. http://doi.org/10.1371/journal.pone.0197415.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0197415.t002
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Barbara Scotti; Giulio Disanto; Rosaria Sacco; Marilu’ Guigli; Chiara Zecca; Claudio Gobbi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Univariate and multivariate cox regression models testing associations between baseline characteristics and risk of EDA during RTX treatment.
‘Census Block Group Economically Distressed Areas 2018’ analyzed by...
analyst-2.ai
Updated Mar 11, 2011
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2011). ‘Census Block Group Economically Distressed Areas 2018’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-census-block-group-economically-distressed-areas-2018-62f0/latest
Dataset updated
Mar 11, 2011
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Census Block Group Economically Distressed Areas 2018’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/ac57065c-1179-421b-968f-e8010700189c on 12 February 2022.

--- Dataset description provided by original source is as follows ---

This is a copy of the statewide Census Block Group GIS Tiger file. It is used to determine if a block group (BG) is EDA or not by adding ACS (American Community Survey) Median Household Income (MHI) and Population Density data at the BG level. The IRWM web based DAC mapping tool uses this GIS layer. Every year this table gets updated after ACS publishes their updated estimates. Created by joining 2016 EDA table to 2010 block groups feature class. The TIGER/Line Files are shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line File is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Block Groups (BGs) are defined before tabulation block delineation and numbering, but are clusters of blocks within the same census tract that have the same first digit of their 4-digit census block number from the same decennial census. For example, Census 2000 tabulation blocks 3001, 3002, 3003,.., 3999 within Census 2000 tract 1210.02 are also within BG 3 within that census tract. Census 2000 BGs generally contained between 600 and 3,000 people, with an optimum size of 1,500 people. Most BGs were delineated by local participants in the Census Bureau's Participant Statistical Areas Program (PSAP). The Census Bureau delineated BGs only where the PSAP participant declined to delineate BGs or where the Census Bureau could not identify any local PSAP participant. A BG usually covers a contiguous area. Each census tract contains at least one BG, and BGs are uniquely numbered within census tract. 
Within the standard census geographic hierarchy, BGs never cross county or census tract boundaries, but may cross the boundaries of other geographic entities like county subdivisions, places, urban areas, voting districts, congressional districts, and American Indian / Alaska Native / Native Hawaiian areas. BGs have a valid code range of 0 through 9. BGs coded 0 were intended to only include water area, no land area, and they are generally in territorial seas, coastal water, and Great Lakes water areas. For Census 2000, rather than extending a census tract boundary into the Great Lakes or out to the U.S. nautical three-mile limit, the Census Bureau delineated some census tract boundaries along the shoreline or just offshore. The Census Bureau assigned a default census tract number of 0 and BG of 0 to these offshore, water-only areas not included in regularly numbered census tract areas.

--- Original source retains full ownership of the source dataset ---
‘COVID-19 dataset in Japan’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘COVID-19 dataset in Japan’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-covid-19-dataset-in-japan-2665/latest
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Japan
Description
Analysis of ‘COVID-19 dataset in Japan’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/lisphilar/covid19-dataset-in-japan on 28 January 2022.

--- Dataset description provided by original source is as follows ---

1. Context

This is a COVID-19 dataset in Japan. It does not include the cases on the Diamond Princess cruise ship (Yokohama city, Kanagawa prefecture) or the Costa Atlantica cruise ship (Nagasaki city, Nagasaki prefecture).
- Total number of cases in Japan
- The number of vaccinated people (New/experimental)
- The number of cases at prefecture level
- Metadata of each prefecture

Note: Lisphilar (author) uploads the same files to https://github.com/lisphilar/covid19-sir/tree/master/data

This dataset can be retrieved with CovsirPhy (Python library).

    pip install covsirphy --upgrade

    import covsirphy as cs
    data_loader = cs.DataLoader()
    japan_data = data_loader.japan()
    # The number of cases (Total/each province)
    clean_df = japan_data.cleaned()
    # Metadata
    meta_df = japan_data.meta()

Please refer to CovsirPhy Documentation: Japan-specific dataset.

Note: Before analysing the data, please refer to Kaggle notebook: EDA of Japan dataset and COVID-19: Government/JHU data in Japan. The detailed explanation of the build process is discussed in Steps to build the dataset in Japan. If you find errors or have any questions, feel free to create a discussion topic.

1.1 Total number of cases in Japan

covid_jpn_total.csv
Cumulative number of cases:
- PCR-tested / PCR-tested and positive
- with symptoms (to 08May2020) / without symptoms (to 08May2020) / unknown (to 08May2020)
- discharged
- fatal

The number of cases:
- requiring hospitalization (from 09May2020)
- hospitalized with mild symptoms (to 08May2020) / severe symptoms / unknown (to 08May2020)
- requiring hospitalization, but waiting in hotels or at home (to 08May2020)

In the primary source, some variables were removed on 09May2020. Their values are NA in this dataset from 09May2020.

The data was collected manually from the Ministry of Health, Labour and Welfare HP:
厚生労働省 HP (in Japanese)
Ministry of Health, Labour and Welfare HP (in English)

The number of vaccinated people:
- Vaccinated_1st: the number of persons vaccinated for the first time on the date
- Vaccinated_2nd: the number of persons vaccinated with the second dose on the date
- Vaccinated_3rd: the number of persons vaccinated with the third dose on the date

Data sources for vaccination:
- To 09Apr2021: 厚生労働省 HP 新型コロナワクチンの接種実績 [Ministry of Health, Labour and Welfare HP: COVID-19 vaccination results] (in Japanese)
- 首相官邸 新型コロナワクチンについて [Prime Minister's Office: About the COVID-19 vaccine]
- From 10Apr2021: Twitter: 首相官邸（新型コロナワクチン情報） [Prime Minister's Office (COVID-19 vaccine information)]

1.2 The number of cases at prefecture level

covid_jpn_prefecture.csv
Cumulative number of cases:
- PCR-tested / PCR-tested and positive
- discharged
- fatal

The number of cases:
- requiring hospitalization (from 09May2020)
- hospitalized with severe symptoms (from 09May2020)

Using pdf-excel converter, manually collected the data from Ministry of Health, Labour and Welfare HP:
厚生労働省 HP (in Japanese)
Ministry of Health, Labour and Welfare HP (in English)

Note: covid_jpn_prefecture.groupby("Date").sum() does not match covid_jpn_total. When you analyse total data in Japan, please use covid_jpn_total data.
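
The mismatch mentioned in the note above can be checked with a per-date comparison. The sketch below uses only made-up numbers and assumed column names (Date, Prefecture, Positive); it does not read the real files and says nothing about how large the actual gaps are.

```python
from collections import defaultdict

# Made-up prefecture-level rows; column names are assumptions for illustration.
prefecture_rows = [
    {"Date": "2020-05-09", "Prefecture": "Tokyo", "Positive": 100},
    {"Date": "2020-05-09", "Prefecture": "Osaka", "Positive": 40},
    {"Date": "2020-05-10", "Prefecture": "Tokyo", "Positive": 110},
    {"Date": "2020-05-10", "Prefecture": "Osaka", "Positive": 45},
]

# Made-up national totals for the same dates.
national_totals = {"2020-05-09": 150, "2020-05-10": 155}


def sum_by_date(rows, column):
    """Sum one numeric column over all prefectures for each date."""
    sums = defaultdict(int)
    for row in rows:
        sums[row["Date"]] += row[column]
    return dict(sums)


prefecture_sums = sum_by_date(prefecture_rows, "Positive")

# Difference between the national figure and the prefecture sum, per date.
gaps = {
    date: national_totals[date] - prefecture_sums.get(date, 0)
    for date in national_totals
}
print(gaps)
```

A non-zero gap on any date is a reminder to take nationwide figures from covid_jpn_total rather than from the summed prefecture data.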

1.3 Metadata of each prefecture

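covid_jpn_metadata.csv
- Population (Total, Male, Female): 厚生労働省厚生統計要覧（2017年度）第１－５表 [Ministry of Health, Labour and Welfare, Handbook of Health and Welfare Statistics (FY2017), Table 1-5]
- Area (Total, Habitable): Wikipedia 都道府県の面積一覧 [List of prefectures by area] (2015)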

Hospital_bed: With the primary data of 厚生労働省感染症指定医療機関の指定状況（平成31年4月1日現在）, 厚生労働省第二種感染症指定医療機関の指定状況（平成31年4月1日現在）, 厚生労働省医療施設動態調査（令和２年１月末概数）, 厚生労働省感染症指定医療機関について and secondary data of COVID-19 Japan 都道府県別感染症病床数,

Specific: Hospital beds of medical institutions designated for specific infectious diseases

Type-I: Hospital beds of medical institutions designated for type I infectious diseases

Type-II: Hospital beds of medical institutions designated for type II infectious diseases

Tuberculosis: Hospital beds of medical institutions designated for tuberculosis (outpatient care)

Care: long term care bed of hospitals

Total: Beds of all hospitals

Clinic_bed: With the primary data of 医療施設動態調査（令和２年１月末概数） ,

Care: long term care beds of clinics

Total: Beds of all clinics

Location: Data is from LinkData 都道府県庁所在地 (Public Domain) (secondary data).

Latitude

Longitude

Admin

Capital: Prefectural capital city. Data is from LinkData 都道府県庁所在地 (Public Domain) (secondary data).

Region: Region name. Data is from Wikipedia (secondary data). "Kyushu-Okinawa region" was separated into "Kyushu" and "Okinawa" by this dataset's author.

Num: Prefecture code (JIS X 0401: Hokkaido=1,...Okinawa=47). Data is from 国土交通省 GIS HP Pref code. cf. (not source) Japan Visitor: Japan Prefectures Map.

2. Acknowledgements

To create this dataset, edited and transformed data of the following sites was used.

厚生労働省 Ministry of Health, Labour and Welfare, Japan:
厚生労働省 HP (in Japanese)
Ministry of Health, Labour and Welfare HP (in English)
厚生労働省 HP 利用規約・リンク・著作権等 CC BY 4.0 (in Japanese)

国土交通省 Ministry of Land, Infrastructure, Transport and Tourism, Japan:
国土交通省 HP (in Japanese)
国土交通省 HP (in English)
国土交通省 HP 利用規約・リンク・著作権等 CC BY 4.0 (in Japanese)

Code for Japan / COVID-19 Japan:
Code for Japan COVID-19 Japan Dashboard (CC BY 4.0)
COVID-19 Japan 都道府県別感染症病床数 (CC BY)

Wikipedia: Wikipedia

LinkData: LinkData (Public Domain)

Inspiration

Changes in number of cases over time

Percentage of patients without symptoms / mild or severe symptoms

What to do next to prevent outbreak

License and how to cite

Kindly cite this dataset under CC BY-4.0 license as follows.
- Hirokazu Takaya (2020-2022), COVID-19 dataset in Japan, GitHub repository, https://github.com/lisphilar/covid19-sir/data/japan, or
- Hirokazu Takaya (2020-2022), COVID-19 dataset in Japan, Kaggle Dataset, https://www.kaggle.com/lisphilar/covid19-dataset-in-japan

--- Original source retains full ownership of the source dataset ---
Baseline characteristics at first RTX infusion in all MS, RRMS and PMS...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Barbara Scotti; Giulio Disanto; Rosaria Sacco; Marilu’ Guigli; Chiara Zecca; Claudio Gobbi (2023). Baseline characteristics at first RTX infusion in all MS, RRMS and PMS patients. [Dataset]. http://doi.org/10.1371/journal.pone.0197415.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0197415.t001
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Barbara Scotti; Giulio Disanto; Rosaria Sacco; Marilu’ Guigli; Chiara Zecca; Claudio Gobbi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Baseline characteristics at first RTX infusion in all MS, RRMS and PMS patients.
Detailed Analysis on campus recruitment
kaggle.com
Updated Oct 25, 2020
Cite
BANDI SAMUEL 2039426 (2020). Detailed Analysis on campus recruitment [Dataset]. https://www.kaggle.com/bandisamuel2039426/detailed-analysis-on-campus-recruitment/activity
Dataset updated
Oct 25, 2020
Dataset provided by
Kaggle
Authors
BANDI SAMUEL 2039426
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
This dataset consists of placement data for students at XYZ campus. It includes secondary and higher secondary school percentages and specialisations, as well as degree specialisation, degree type, work experience, and the salary offers made to placed students. We will analyse which factors play a major role in selecting a candidate for job recruitment.
Superstore Orders Analysis-Files
kaggle.com
Updated Feb 19, 2021
Cite
Pradyumna Reddy (2021). Superstore Orders Analysis-Files [Dataset]. https://www.kaggle.com/dommatap/analysisfiles/code
Dataset updated
Feb 19, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Pradyumna Reddy
Description
Dataset

This dataset was created by Pradyumna Reddy

Toy Dataset
kaggle.com
Updated Dec 10, 2018
Cite
Carlo Lepelaars (2018). Toy Dataset [Dataset]. https://www.kaggle.com/datasets/carlolepelaars/toy-dataset/discussion
Dataset updated
Dec 10, 2018
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Carlo Lepelaars
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

A fictional dataset for exploratory data analysis (EDA) and to test simple prediction models.

This toy dataset features 150000 rows and 6 columns.

Columns

Note: All data is fictional. The data has been generated so that their distributions are convenient for statistical analysis.

Number: A simple index number for each row

City: The location of a person (Dallas, New York City, Los Angeles, Mountain View, Boston, Washington D.C., San Diego and Austin)

Gender: Gender of a person (Male or Female)

Age: The age of a person (Ranging from 25 to 65 years)

Income: Annual income of a person (Ranging from -674 to 177175)

Illness: Is the person Ill? (Yes or No)
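
Because the documented Income range starts below zero, a first exploratory pass might flag negative incomes and compare mean income by city. The sketch below is a minimal example under stated assumptions: the four sample rows are made up, only the six column names come from the description above, and the real file is not read.

```python
import csv
import io
from statistics import mean

# Made-up sample rows that reuse the six documented column names.
SAMPLE = """Number,City,Gender,Age,Income,Illness
1,Dallas,Male,41,40367,No
2,Dallas,Female,54,-674,No
3,Boston,Male,30,91000,Yes
4,Boston,Female,65,177175,No
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    # csv yields strings, so convert the numeric columns explicitly.
    row["Age"] = int(row["Age"])
    row["Income"] = float(row["Income"])

# Row numbers whose income is negative and may need special handling.
negative_income = [row["Number"] for row in rows if row["Income"] < 0]

# Group incomes by city, then average each group.
incomes_by_city = {}
for row in rows:
    incomes_by_city.setdefault(row["City"], []).append(row["Income"])
mean_income = {city: mean(values) for city, values in incomes_by_city.items()}

print(negative_income, mean_income)
```

Whether negative incomes should be kept, clipped, or dropped depends on the analysis; the sketch only surfaces them.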

Acknowledgements

Stock photo by Mika Baumeister on Unsplash.
COVID 19 Dataset
kaggle.com
Updated Sep 23, 2020
Cite
Rahul Gupta (2020). COVID 19 Dataset [Dataset]. https://www.kaggle.com/rahulgupta21/datahub-covid19/kernels
Dataset updated
Sep 23, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Rahul Gupta
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Coronavirus disease 2019 (COVID-19) time series listing confirmed cases, reported deaths and reported recoveries. Data is disaggregated by country (and sometimes subregion). Coronavirus disease (COVID-19) is caused by the Severe acute respiratory syndrome Coronavirus 2 (SARS-CoV-2) and has had a worldwide effect. On March 11 2020, the World Health Organization (WHO) declared it a pandemic, pointing to the over 118,000 cases of the Coronavirus illness in over 110 countries and territories around the world at the time.

This dataset includes time series data tracking the number of people affected by COVID-19 worldwide, including:

- confirmed tested cases of Coronavirus infection
- the number of people who have reportedly died while sick with Coronavirus
- the number of people who have reportedly recovered from it

Content

Data is in CSV format and updated daily. It is sourced from this upstream repository maintained by the amazing team at Johns Hopkins University Center for Systems Science and Engineering (CSSE) who have been doing a great public service from an early point by collating data from around the world.

We have cleaned and normalized that data, for example tidying dates and consolidating several files into normalized time series. We have also added some metadata, such as column descriptions, and packaged the data.
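
One way to picture the tidying step is a reshape from a wide table with one column per date into long rows with ISO dates. The sketch below is not the maintainers' actual pipeline; the wide layout, the m/d/yy date headers, the column names, and the sample values are all assumptions made for illustration.

```python
import csv
import io
from datetime import datetime

# Made-up sample in an assumed wide layout: one column per date (m/d/yy).
WIDE = """Country/Region,1/22/20,1/23/20
Japan,2,2
Italy,0,3
"""


def to_long(wide_csv: str):
    """Reshape wide per-date columns into one row per (date, country)."""
    long_rows = []
    for row in csv.DictReader(io.StringIO(wide_csv)):
        country = row.pop("Country/Region")
        # Every remaining column header is assumed to be a date.
        for raw_date, value in row.items():
            iso_date = datetime.strptime(raw_date, "%m/%d/%y").date().isoformat()
            long_rows.append(
                {"Date": iso_date, "Country": country, "Confirmed": int(value)}
            )
    return long_rows


long_rows = to_long(WIDE)
for record in long_rows:
    print(record)
```

The long form keeps one observation per row, which makes it easier to filter by country or merge several measures on Date and Country.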
IMDB Dataset & Dictionary
kaggle.com
Updated Feb 14, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shivam Kapoor (2021). IMDB Dataset & Dictionary [Dataset]. https://www.kaggle.com/kapoorshivam/imdb-dataset-dictionary/tasks
Dataset updated
Feb 14, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Shivam Kapoor
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
We all love movies! I remember watching my first movie with my family when I was 5 and 3 years later, I still love movies. But have you ever wondered how some people rate movies as good or bad, awesome or mehh! That's correct. Different people have different perspectives on how they like or dislike movies. To help us select from the plethora of movie options out there, the IMDB platform provides us with honest reviews by the people, for the people.

Long story short, this assignment will take you through different aspects of how a movie is reviewed by different people from across the globe based on their star cast, genre, story length and many more aspects.

So here is what you need to do! A few points:
1. Download the dataset and the dictionary, which will help you learn the different columns in the dataset.
2. Start exploring the data by performing EDA (wiki what EDA is, if you are a dummy like I was initially).
3. Get back to this notebook to check what I did to explore the data, and then follow the subtasks and checkpoints!

Simple, isn't it? Do complete the exercise and let me know in the comments if you found it helpful. There's always scope for improvement. Tell me what more could have been added to this notebook! Hope you'll have a good time exploring the data.
Covid_19 Full Explo. Data Analysis by DaudKhan
kaggle.com
Updated Aug 16, 2022
Cite
Muhammad sardar daud khan (2022). Covid_19 Full Explo. Data Analysis by DaudKhan [Dataset]. https://www.kaggle.com/datasets/daudkhan2023/covid-19-full-eda/suggestions
Dataset updated
Aug 16, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Muhammad sardar daud khan
Description
Dataset

This dataset was created by Muhammad sardar daud khan

Titanic Dataset - EDA & Logistic Regression
kaggle.com
Updated Feb 19, 2025
Cite
RabbiTheAnalyst (2025). Titanic Dataset - EDA & Logistic Regression [Dataset]. https://www.kaggle.com/datasets/mdrabbiali/titanic-data-set/code
Dataset updated
Feb 19, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
RabbiTheAnalyst
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone on board, resulting in the death of 1502 out of 2224 passengers and crew. While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others. In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e. name, age, gender, socio-economic class, etc.).

Objective:

Survival Prediction: To build a logistic regression model that accurately predicts the survival of passengers based on features such as age, gender, passenger class, and number of siblings/spouses aboard.

Data Cleaning and Preprocessing: To perform data cleaning by handling missing values, removing unnecessary columns, and encoding categorical variables to prepare the dataset for analysis.

Exploratory Data Analysis (EDA): To conduct a thorough exploratory data analysis to visualize survival rates and identify patterns based on various factors like gender, passenger class, and embarked location.

Feature Importance Analysis: To analyze the correlation between different features and their impact on survival rates, identifying which factors are the most significant predictors of survival.

Model Evaluation: To evaluate the performance of the logistic regression model using accuracy scores and classification reports, ensuring that the model generalizes well to unseen data.

ROC Curve Analysis: To create a ROC curve to assess the trade-off between the true positive rate and false positive rate, providing insights into the model's ability to distinguish between survivors and non-survivors.

Insights and Recommendations: To derive insights from the analysis that could inform future safety measures or policies related to passenger safety in maritime travel.
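
To make the ROC curve objective above concrete, the following sketch computes ROC points and the area under the curve in plain Python. The labels and scores are made up and do not come from the Titanic data or from any fitted model; in a real analysis the scores would be the predicted survival probabilities of the logistic regression model.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs found by sweeping the decision threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up labels (1 = survived) and made-up predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(y_true, scores)
print(curve)
print(auc(curve))
```

An area close to 1 means survivors tend to receive higher scores than non-survivors, while 0.5 corresponds to random ranking.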

College Student Placement Factors Dataset

kaggle.com

Updated Jul 2, 2025

Cite

Sahil Islam007 (2025). College Student Placement Factors Dataset [Dataset]. https://www.kaggle.com/datasets/sahilislam007/college-student-placement-factors-dataset

Dataset updated

Jul 2, 2025

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Sahil Islam007

License

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically

Description

📘 College Student Placement Dataset

A realistic, large-scale synthetic dataset of 10,000 students designed to analyze factors affecting college placements.

📄 Dataset Description

This dataset simulates the academic and professional profiles of 10,000 college students, focusing on factors that influence placement outcomes. It includes features like IQ, academic performance, CGPA, internships, communication skills, and more.

The dataset is ideal for:

Predictive modeling of placement outcomes
Educational exercises in classification
Feature importance analysis
End-to-end machine learning projects

📊 Columns Description

Column Name	Description
College_ID	Unique ID of the college (e.g., CLG0001 to CLG0100)
IQ	Student’s IQ score (normally distributed around 100)
Prev_Sem_Result	GPA from the previous semester (range: 5.0 to 10.0)
CGPA	Cumulative Grade Point Average (range: ~5.0 to 10.0)
Academic_Performance	Annual academic rating (scale: 1 to 10)
Internship_Experience	Whether the student has completed any internship (Yes/No)
Extra_Curricular_Score	Involvement in extracurriculars (score from 0 to 10)
Communication_Skills	Soft skill rating (scale: 1 to 10)
Projects_Completed	Number of academic/technical projects completed (0 to 5)
Placement	Final placement result (Yes = Placed, No = Not Placed)

🎯 Target Variable

Placement: This is the binary classification target (Yes/No) that you can try to predict based on the other features.
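
As a rough sketch of how the Yes/No target could be prepared for modeling, the snippet below encodes `Placement` as 1/0 and computes the placement rate per group. The column names come from the table above. The five sample rows, the helper names `encode_yes_no` and `placement_rate_by`, and the choice of a 1/0 encoding are assumptions made for illustration and are not values from the actual file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows, invented for illustration. Only the column
# names are taken from the dataset description.
SAMPLE_CSV = """College_ID,IQ,Prev_Sem_Result,CGPA,Academic_Performance,Internship_Experience,Extra_Curricular_Score,Communication_Skills,Projects_Completed,Placement
CLG0001,105,7.2,7.5,8,Yes,6,7,3,Yes
CLG0002,98,6.1,6.0,5,No,2,4,1,No
CLG0003,112,8.4,8.1,9,Yes,7,8,4,Yes
CLG0004,101,6.9,7.0,6,No,5,6,2,Yes
CLG0005,93,5.5,5.8,4,Yes,3,5,0,No
"""


def encode_yes_no(value):
    """Map 'Yes'/'No' (any casing, surrounding spaces allowed) to 1/0."""
    normalized = value.strip().lower()
    if normalized == "yes":
        return 1
    if normalized == "no":
        return 0
    raise ValueError(f"expected Yes or No, got {value!r}")


def placement_rate_by(rows, column):
    """Return the share of placed students for each distinct value of `column`."""
    counts = defaultdict(lambda: [0, 0])  # group -> [placed, total]
    for row in rows:
        group = row[column]
        counts[group][0] += encode_yes_no(row["Placement"])
        counts[group][1] += 1
    return {group: placed / total for group, (placed, total) in counts.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = placement_rate_by(rows, "Internship_Experience")
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle; the grouping column can be any categorical feature.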

🧠 Use Cases

📈 Classification Modeling (Logistic Regression, Decision Trees, Random Forest, etc.)
🔍 Exploratory Data Analysis (EDA)
🎯 Feature Engineering and Selection
🧪 Model Evaluation Practice
👩‍🏫 Academic Projects & Capstone Use

📦 Dataset Size

Rows: 10,000
Columns: 10
File Format: .csv

📚 Context

This dataset was generated to resemble real-world data in academic institutions for research and machine learning use. While it is synthetic, the variables and relationships are crafted to mimic authentic trends observed in student placements.

📜 License

MIT

🔗 Source

Created using Python (NumPy, Pandas) with data logic designed for educational and ML experimentation purposes.

Zomato Dataset

kaggle.com

Updated Mar 8, 2025

Cite

Abu Awaish (2025). Zomato Dataset [Dataset]. https://www.kaggle.com/datasets/abuawaish/zomato-dataset/data

Explore at:

Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Mar 8, 2025

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Abu Awaish

License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Description

Zomato Dataset

Overview

This dataset contains information about various restaurants, including their ratings, cuisine types, pricing, and availability of services like online ordering and table booking.

Total Entries (Restaurants): 7,105
Total Columns (Features): 10

Column Descriptions

Column Name	Description
restaurant name	Name of the restaurant.
restaurant type	Type of restaurant (e.g., Quick Bites, Cafe, Casual Dining).
rate (out of 5)	Average rating of the restaurant (out of 5).
num of ratings	Number of people who have rated the restaurant.
avg cost (two people)	Average cost for two people in local currency.
online_order	Whether online ordering is available (`Yes`/`No`).
table booking	Whether table booking is available (`Yes`/`No`).
cuisines type	Types of cuisines served at the restaurant (e.g., Fast Food, Chinese, BBQ).
area	Location area of the restaurant.
local address	Specific address of the restaurant.

Potential Use Cases

Food Recommendation System – Suggest restaurants based on cuisine, ratings, or cost.
Customer Behavior Analysis – Identify trends in online orders, table bookings, and preferred cuisines.
Geographical Insights – Analyze restaurant distribution across different areas.
Price vs. Rating Analysis – Determine if higher prices correlate with better ratings.
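
One way to approach the Price vs. Rating use case is a Pearson correlation between the `avg cost (two people)` and `rate (out of 5)` columns. The sketch below is a plain-Python illustration; the function name `pearson` and the example values are hypothetical and are not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of products of deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical costs and ratings, invented for illustration.
example_costs = [300, 450, 600, 800, 1200]
example_ratings = [3.5, 3.7, 3.9, 4.0, 4.3]
example_r = pearson(example_costs, example_ratings)
```

A coefficient near +1 would suggest that pricier restaurants tend to be rated higher, a value near 0 would suggest no linear relationship, and neither outcome establishes causation.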

How to Use This Dataset

Data Cleaning – Handle missing values, remove duplicate entries.
Data Analysis – Perform exploratory data analysis (EDA) to identify trends.
Visualization – Create plots and graphs to understand restaurant trends.
Machine Learning – Use the data for predictive modeling, such as rating prediction.

Note: This dataset may contain missing values or inconsistencies that require preprocessing before analysis.
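
The Data Cleaning step can be sketched as follows, under stated assumptions. The description does not say how missing values are encoded, so the set `MISSING_MARKERS` is a guess, and the sample rows and the helper names `parse_float` and `clean_rows` are hypothetical. The sketch drops exact duplicate rows and converts numeric columns to floats, using `None` for anything missing or unparseable.

```python
import csv
import io

# Assumed placeholders for missing values; the real file may use others.
MISSING_MARKERS = {"", "-", "na", "n/a", "nan"}

# Hypothetical sample rows, invented for illustration. Only the column
# names are taken from the dataset description.
SAMPLE_CSV = """restaurant name,rate (out of 5),avg cost (two people)
Example Cafe,4.1,600
Example Cafe,4.1,600
Sample Diner,-,300
Demo Bites,3.8,
"""


def parse_float(text):
    """Return the value as a float, or None if it is missing or not numeric."""
    cleaned = text.strip()
    if cleaned.lower() in MISSING_MARKERS:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


def clean_rows(rows, numeric_columns):
    """Drop exact duplicate rows and convert the given columns to float or None."""
    seen = set()
    cleaned = []
    for row in rows:
        # Two rows count as duplicates when every field matches after trimming.
        key = tuple((name, value.strip()) for name, value in row.items())
        if key in seen:
            continue
        seen.add(key)
        fixed = dict(row)
        for column in numeric_columns:
            fixed[column] = parse_float(row[column])
        cleaned.append(fixed)
    return cleaned


raw_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned_rows = clean_rows(raw_rows, ["rate (out of 5)", "avg cost (two people)"])
```

Keeping missing values as `None` rather than filling them in leaves the choice of imputation (or row removal) to the later analysis step.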

