Titanic - Machine Learning from Disaster (Kaggle competition). This dataset contains the data files for the notebook "Titanic/Kaggle - Full analysis", by Fernando Meneses. It includes training and testing datasets, the solution file, leaderboard statistics, and pre-trained results.
Description
Ahoy, welcome to Kaggle! You're in the right place. This is the legendary Titanic ML competition, the best first challenge for you to dive into ML competitions and familiarize yourself with how the Kaggle platform works.
If you want to talk with other users about this competition, come join our Discord! We've got channels for competitions, job postings and career discussions, resources, and socializing with your fellow data scientists. Follow the link here: https://discord.gg/kaggle
The competition is simple: use machine learning to create a model that predicts which passengers survived the Titanic shipwreck.
Read on or watch the video below to explore more details. Once you're ready to start competing, click on the "Join Competition" button to create an account and gain access to the competition data. Then check out Alexis Cook's Titanic Tutorial, which walks you through making your first submission step by step!
The Challenge
The sinking of the Titanic is one of the most infamous shipwrecks in history.
On April 15, 1912, during her maiden voyage, the RMS Titanic, widely considered "unsinkable", sank after colliding with an iceberg. Unfortunately, there weren't enough lifeboats for everyone on board, resulting in the death of 1502 out of 2224 passengers and crew.
While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.
In this challenge, we ask you to build a predictive model that answers the question "what sorts of people were more likely to survive?" using passenger data (i.e. name, age, gender, socio-economic class, etc.).
Recommended Tutorial
We highly recommend Alexis Cook's Titanic Tutorial, which walks you through making your very first submission step by step, and this starter notebook to get started.
How Kaggle's Competitions Work
Join the Competition: Read about the challenge description, accept the Competition Rules and gain access to the competition dataset.
Get to Work: Download the data, build models on it locally or on Kaggle Notebooks (our no-setup, customizable Jupyter Notebooks environment with free GPUs) and generate a prediction file.
Make a Submission: Upload your prediction as a submission on Kaggle and receive an accuracy score.
Check the Leaderboard: See how your model ranks against other Kagglers on our leaderboard.
Improve Your Score: Check out the discussion forum to find lots of tutorials and insights from other competitors.
Kaggle Lingo Video
You may run into unfamiliar lingo as you dig into the Kaggle discussion forums and public notebooks. Check out Dr. Rachael Tatman's video on Kaggle Lingo to get up to speed!
What Data Will I Use in This Competition?
In this competition, you'll gain access to two similar datasets that include passenger information like name, age, gender, socio-economic class, etc. One dataset is titled train.csv and the other is titled test.csv.
The train.csv dataset contains the details of a subset of the passengers on board (891 to be exact) and, importantly, reveals whether they survived or not, also known as the "ground truth".
The test.csv dataset contains similar information but does not disclose the "ground truth" for each passenger. It's your job to predict these outcomes.
Using the patterns you find in the train.csv data, predict whether the other 418 passengers on board (found in test.csv) survived.
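As a rough illustration of "using the patterns you find in train.csv", the sketch below learns the majority outcome for each value of one column and applies it to unlabeled rows. It is a minimal baseline, not the method of any tutorial mentioned here. The column name `Sex`, the helper names, and the sample rows are assumptions made up for the example; real data would be read from train.csv and test.csv.

```python
from collections import defaultdict

def fit_group_baseline(train_rows, key="Sex"):
    """Learn the majority outcome (0 or 1) for each value of `key`."""
    counts = defaultdict(lambda: [0, 0])  # value -> [deceased, survived]
    for row in train_rows:
        counts[row[key]][int(row["Survived"])] += 1
    # Ties resolve to 0 (deceased).
    return {value: int(c[1] > c[0]) for value, c in counts.items()}

def predict(rule, test_rows, key="Sex", default=0):
    """Return (PassengerId, prediction) pairs for rows without labels."""
    return [(row["PassengerId"], rule.get(row[key], default)) for row in test_rows]

# Tiny made-up sample, not real passenger records.
train = [
    {"PassengerId": "1", "Sex": "male", "Survived": "0"},
    {"PassengerId": "2", "Sex": "female", "Survived": "1"},
    {"PassengerId": "3", "Sex": "female", "Survived": "1"},
    {"PassengerId": "4", "Sex": "male", "Survived": "0"},
    {"PassengerId": "5", "Sex": "male", "Survived": "1"},
]
test = [
    {"PassengerId": "6", "Sex": "female"},
    {"PassengerId": "7", "Sex": "male"},
]

rule = fit_group_baseline(train)
predictions = predict(rule, test)
```

A single-column rule like this is only a starting point; it gives you a score to beat once you add more features.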
Check out the "Data" tab to explore the datasets even further. Once you feel you've created a competitive model, submit it to Kaggle to see where your model stands on our leaderboard against other Kagglers.
How to Submit your Prediction to Kaggle
Once you're ready to make a submission and get on the leaderboard:
Click on the "Submit Predictions" button
Upload a CSV file in the submission file format. You can make 10 submissions a day.
Submission File Format: You should submit a CSV file with exactly 418 entries plus a header row. Your submission will show an error if it has extra columns (beyond PassengerId and Survived) or extra rows.
The file should have exactly 2 columns:
PassengerId (sorted in any order)
Survived (contains your binary predictions: 1 for survived, 0 for deceased)
Got it! I'm ready to get started. Where do I get help if I need it?
For Competition Help: Titanic Discussion Forum
Kaggle doesn't have a dedicated team to help troubleshoot your code, so you'll typically find that you receive a response more quickly by asking your question in the appropriate forum. The forums are full of useful information on the data, metric, and different approaches. We encourage you to use the forums often. If you share your knowledge, you'll find that others will share a lot in turn!
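The submission file format described above (a header row, exactly two columns, 418 entries, binary predictions) can be checked locally before uploading. The following is a minimal sketch that mirrors only the rules stated in this description; it is not Kaggle's own validator, and the function names `check_submission` and `build_submission` are hypothetical.

```python
import csv
import io

EXPECTED_ROWS = 418  # number of test passengers stated in the description
EXPECTED_HEADER = ["PassengerId", "Survived"]

def check_submission(csv_text, expected_rows=EXPECTED_ROWS):
    """Return a list of problems found; an empty list means the file looks valid."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    problems = []
    header, body = rows[0], rows[1:]
    if header != EXPECTED_HEADER:
        problems.append(f"unexpected header: {header}")
    if len(body) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(body)}")
    for line_no, row in enumerate(body, start=2):
        if len(row) != 2:
            problems.append(f"line {line_no}: expected 2 columns, found {len(row)}")
        elif row[1] not in ("0", "1"):
            problems.append(f"line {line_no}: Survived must be 0 or 1, got {row[1]!r}")
    return problems

def build_submission(predictions):
    """Serialize (passenger_id, survived) pairs into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(EXPECTED_HEADER)
    writer.writerows(predictions)
    return buffer.getvalue()

# Made-up IDs 1..418 with alternating predictions, purely for demonstration.
demo = build_submission((pid, pid % 2) for pid in range(1, 419))
problems = check_submission(demo)
```

An empty `problems` list means the text passes these checks; the real PassengerId values come from test.csv.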
A Last Word on Kaggle Notebooks
As we mentioned before, Kaggle Notebooks is our no-setup, customizable Jupyter Notebooks environment with free GPUs and a huge repository ...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of "Titanic: cleaned data" provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/jamesleslie/titanic-cleaned-data on 30 September 2021.
--- Dataset description provided by original source is as follows ---
This dataset was created in this notebook as part of a three-part series. The data is in machine-learning-ready format, with all missing values for the Age, Fare and Embarked columns having been imputed.
Age: this column was imputed by using the median age for the passenger's title (Mr, Mrs, Dr etc.).
Fare: the single missing value in this column was imputed using the median value for that passenger's class.
Embarked: the two missing values here were imputed using the Pandas backfill method.
This data is used in both the second and third parts of the series.
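The three imputation rules above can be sketched without any dependencies. This is an illustration of the technique, not the original notebook's code (which the description says used Pandas). The `Title` column, the helper names, and the sample records are assumptions made up for the example; `backfill` imitates the behavior of the Pandas backfill method by copying the next known value upward.

```python
from statistics import median

def impute_by_group_median(rows, column, group_key):
    """Fill None values in `column` with the median of that row's group."""
    groups = {}
    for row in rows:
        if row[column] is not None:
            groups.setdefault(row[group_key], []).append(row[column])
    medians = {group: median(values) for group, values in groups.items()}
    for row in rows:
        if row[column] is None:
            # Stays None if no row in the group has a known value.
            row[column] = medians.get(row[group_key])
    return rows

def backfill(rows, column):
    """Fill None values with the next known value further down the list."""
    next_known = None
    for row in reversed(rows):
        if row[column] is None:
            row[column] = next_known
        else:
            next_known = row[column]
    return rows

# Made-up records, not real passengers.
passengers = [
    {"Title": "Mr", "Pclass": 3, "Age": 20.0, "Fare": 8.0, "Embarked": "S"},
    {"Title": "Mr", "Pclass": 3, "Age": None, "Fare": None, "Embarked": None},
    {"Title": "Mr", "Pclass": 3, "Age": 40.0, "Fare": 10.0, "Embarked": "C"},
    {"Title": "Mrs", "Pclass": 1, "Age": 35.0, "Fare": 80.0, "Embarked": "Q"},
]

impute_by_group_median(passengers, "Age", "Title")    # median age per title
impute_by_group_median(passengers, "Fare", "Pclass")  # median fare per class
backfill(passengers, "Embarked")                      # next known port
```

Note that a backfill leaves trailing gaps unfilled when no later value exists, so the result depends on row order.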
--- Original source retains full ownership of the source dataset ---
This data set was created by preprocessing the Titanic - Machine Learning from Disaster data set (filling in missing data and extracting new features).
Machine learning models can be applied directly to this processed data set.
You can see the preprocessing steps in the notebook: https://www.kaggle.com/fethiye/titanic-predict-survival-prediction
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by MHouellemont
Released under CC0: Public Domain
https://creativecommons.org/publicdomain/zero/1.0/
Dataset used in my "Titanic - Top 1% with KNN [0.81818]" notebook. It contains all the leaderboard entries from the Titanic - Machine Learning from Disaster competition in March 2023.
This dataset was created by diogo22santos
Released under Other (specified in description)
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Patil
Released under CC0: Public Domain
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by karthik
Released under Database: Open Database, Contents: Database Contents
Titanic dataset for examining the chances of survival among passengers travelling at the same time.
How were priorities set for saving the people on board? What factors were responsible for the ship sinking?
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Inspired by Wang et al. 2021, we decided to use the top-voted and well-documented Kaggle notebooks to construct the notebookCDG dataset. We collected the top 10% most highly voted notebooks from the 20 most popular competitions on Kaggle (e.g. Titanic). We checked the data policy of each of the 20 competitions; none of them has copyright issues. We also contacted the Kaggle administrators to make sure our data collection complies with the platform's policy. In total, we collected 3,944 notebooks as raw data. After data preprocessing, the final dataset contains 2,476 of those 3,944 notebooks. It has 28,625 code-documentation pairs. The overall code-to-markdown ratio is 2.2195.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a pickle file of the model used in the notebook "Titanic: some sex, a bit of class, and a tree..." for use in the notebook "All in a pickle: Saving the Titanic".
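For readers unfamiliar with pickle files, the sketch below shows the general save-and-restore round trip with Python's standard `pickle` module. It does not reproduce the notebook's model: the dictionary here is a hypothetical stand-in with made-up values, and the file name `titanic_model.pkl` is an assumption.

```python
import os
import pickle
import tempfile

def save_model(model, path):
    """Serialize any picklable object to `path`."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)

def load_model(path):
    """Load an object saved by save_model. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)

# Hypothetical stand-in for a fitted model: a lookup keyed by (sex, class).
# The values are made up, not results from any notebook.
model = {("female", 1): 1, ("female", 3): 1, ("male", 1): 0, ("male", 3): 0}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "titanic_model.pkl")
    save_model(model, path)
    restored = load_model(path)
```

Unpickling can execute arbitrary code, so a downloaded pickle file should only be loaded when its source is trusted.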
This dataset was created by DavidS
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains cleaned Titanic passenger data for EDA and machine learning tasks. Includes features like age, sex, class, fare, and family details. Ideal for survival prediction and beginner ML projects.
Great for:
Feature engineering
Data visualization
Classification modeling
Both train and test sets included.
If you find this dataset helpful, please upvote and share your notebook!
This dataset was created by Sawan Kumar Yadav
This dataset was created by ASAD KAREL