25 datasets found

titanic
tensorflow.org
Updated Feb 12, 2023
Cite
(2023). titanic [Dataset]. https://www.tensorflow.org/datasets/catalog/titanic
Description
Dataset describing the survival status of individual passengers on the Titanic. Missing values in the original dataset are represented using ?. Float and int missing values are replaced with -1, string missing values are replaced with 'Unknown'.

To use this dataset:

    import tensorflow_datasets as tfds

    ds = tfds.load('titanic', split='train')
    for ex in ds.take(4):
        print(ex)

See the guide for more information on tensorflow_datasets.
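
The description says that missing numeric values are stored as -1 and missing strings as 'Unknown'. A minimal sketch of turning those sentinels back into explicit missing values is shown here, assuming each example has already been converted to a plain Python dict. The field names in the sample record are illustrative only, and treating every -1 as missing is an assumption that only suits features where -1 is never a real value.

```python
from typing import Any, Dict, Optional

NUMERIC_SENTINEL = -1        # from the description: float and int missing values
STRING_SENTINEL = "Unknown"  # from the description: string missing values


def restore_missing(record: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Return a copy of `record` with sentinel values replaced by None."""
    cleaned: Dict[str, Optional[Any]] = {}
    for key, value in record.items():
        # String features may arrive as bytes; decode them before comparing.
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        if isinstance(value, str) and value == STRING_SENTINEL:
            cleaned[key] = None
        elif (
            isinstance(value, (int, float))
            and not isinstance(value, bool)
            and value == NUMERIC_SENTINEL
        ):
            cleaned[key] = None
        else:
            cleaned[key] = value
    return cleaned


# Hypothetical record; the keys are not guaranteed to match the real feature names.
example = {"age": -1.0, "fare": 7.25, "cabin": "Unknown", "sex": "male"}
print(restore_missing(example))
```

Label columns that legitimately use small integers should be excluded before applying a rule like this.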
Titanic Dataset
paperswithcode.com
Updated Oct 27, 2024
Cite
(2024). Titanic Dataset [Dataset]. https://paperswithcode.com/dataset/titanic
Description
Titanic Dataset Description

Overview

The data is divided into two groups:
- Training set (train.csv): Used to build machine learning models. It includes the outcome (also called the "ground truth") for each passenger, allowing models to predict survival based on “features” like gender and class. Feature engineering can also be applied to create new features.
- Test set (test.csv): Used to evaluate model performance on unseen data. The ground truth is not provided; the task is to predict survival for each passenger in the test set using the trained model.

Additionally, gender_submission.csv is provided as an example submission file, containing predictions based on the assumption that all and only female passengers survive.
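
The baseline behind gender_submission.csv (all and only female passengers survive) can be sketched in a few lines. This is an illustration under assumptions rather than the competition's own script: the column names `PassengerId`, `Sex`, and `Survived` are assumed to match the CSV headers, and the two sample rows are made up for the demonstration.

```python
import csv
import io


def gender_baseline(test_csv_text: str) -> str:
    """Build submission CSV text that predicts survival for female passengers only."""
    reader = csv.DictReader(io.StringIO(test_csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for row in reader:
        survived = 1 if row["Sex"].strip().lower() == "female" else 0
        writer.writerow([row["PassengerId"], survived])
    return out.getvalue()


# Made-up rows in the assumed test.csv layout.
sample = "PassengerId,Pclass,Sex\n892,3,male\n893,3,female\n"
print(gender_baseline(sample))
```

A baseline like this gives a reference score that any trained model should beat.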

Data Dictionary

| Variable | Definition | Key |
|----------|------------|-----|
| survival | Survival | 0 = No, 1 = Yes |
| pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
| sex | Sex | |
| age | Age in years | |
| sibsp | # of siblings/spouses aboard the Titanic | |
| parch | # of parents/children aboard the Titanic | |
| ticket | Ticket number | |
| fare | Passenger fare | |
| cabin | Cabin number | |
| embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |

Variable Notes

pclass: Proxy for socio-economic status (SES):
- 1st = Upper
- 2nd = Middle
- 3rd = Lower

age:
- Fractional if less than 1 year.
- Estimated ages are represented in the form xx.5.

sibsp: Defines family relations as:
- Sibling: Brother, sister, stepbrother, stepsister.
- Spouse: Husband, wife (excluding mistresses and fiancés).

parch: Defines family relations as:
- Parent: Mother, father.
- Child: Daughter, son, stepdaughter, stepson.
- Some children traveled only with a nanny, so parch = 0 for them.
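
The variable notes lend themselves to simple engineered features. The sketch below is one possible reading, not part of the dataset: `family_size` is a hypothetical feature that adds the passenger to the sibsp and parch counts, and `age_is_estimated` assumes that an age of at least 1 ending in .5 is an estimate, while fractional ages below 1 are infants.

```python
from typing import Optional


def family_size(sibsp: int, parch: int) -> int:
    """Relatives aboard (siblings/spouses plus parents/children) plus the passenger."""
    return sibsp + parch + 1


def age_is_estimated(age: Optional[float]) -> bool:
    """Return True when the age follows the xx.5 convention for estimated ages."""
    if age is None or age < 1:
        # Ages under 1 are fractional by definition, so they are not flagged.
        return False
    return age % 1 == 0.5


print(family_size(1, 2))
print(age_is_estimated(28.5), age_is_estimated(0.75), age_is_estimated(30.0))
```

Children who traveled only with a nanny have parch = 0, so `family_size` would count them as traveling alone unless sibsp is non-zero.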
Survival Prediction with Titanic Dataset using R
kaggle.com
Updated Jan 26, 2018
Cite
Sivasuryanarayan Krishnamoorthy (2018). Survival Prediction with Titanic Dataset using R [Dataset]. https://www.kaggle.com/sivasuryak3/survival-prediction-with-titanic-dataset-using-r/metadata
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sivasuryanarayan Krishnamoorthy
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Sivasuryanarayan Krishnamoorthy

Released under CC0: Public Domain

Data from: Titanic Survival Prediction Dataset
opendatabay.com
Updated Jul 12, 2025
Cite
Datasimple (2025). Titanic Survival Prediction Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/fce916ad-b2f1-412d-95f1-e2a50859195e
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Not Specified
Description
This dataset pertains to the sinking of the RMS Titanic, one of the most infamous shipwrecks in history. On 15 April 1912, during its maiden voyage, the Titanic struck an iceberg and sank, leading to the deaths of 1,502 out of 2,224 passengers and crew due to an insufficient number of lifeboats. While luck played a role, certain groups of people demonstrated a higher likelihood of survival. The primary goal for users of this dataset is to construct a predictive model that identifies the types of individuals who were more likely to survive, utilising passenger details such as name, age, gender, and socio-economic class. Additionally, the objective involves understanding and preparing the dataset, building robust classification models, fine-tuning their hyperparameters, and comparing various algorithm evaluation metrics.

Columns

The dataset contains the following columns:
* PassengerId: A unique identifier for each passenger.
* Survived: Indicates whether the passenger survived (1) or not (0).
* Pclass: The passenger's ticket class (1st, 2nd, or 3rd class).
* Name: The full name of the passenger.
* Sex: The gender of the passenger (male or female).
* Age: The age of the passenger in years.
* SibSp: The number of siblings or spouses aboard the Titanic with the passenger.
* Parch: The number of parents or children aboard the Titanic with the passenger.
* Ticket: The ticket number.
* Fare: The passenger's fare.
* Cabin: The cabin number.
* Embarked: The port from which the passenger embarked (Cherbourg, Queenstown, or Southampton).

Distribution

The dataset is provided as a CSV file named Titanic-Dataset.csv, with a size of 61.19 kB. It has 12 columns and 891 rows, one per passenger in the file. Most columns are complete, but the 'Age' column has 177 missing values (20%), 'Cabin' has 687 missing values (77%), and 'Embarked' has 2 missing values.
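
The quoted percentages follow directly from the counts. As a quick check, the sketch below recomputes them from the figures given above; the `count_missing` helper is a hypothetical addition that assumes missing cells appear as empty strings when the CSV is read with `csv.DictReader`.

```python
from typing import Dict, Iterable

TOTAL_ROWS = 891
MISSING_COUNTS = {"Age": 177, "Cabin": 687, "Embarked": 2}  # figures from the description


def missing_share(count: int, total: int = TOTAL_ROWS) -> float:
    """Percentage of rows in which a column is missing."""
    return 100.0 * count / total


def count_missing(rows: Iterable[Dict[str, str]], column: str) -> int:
    """Count rows whose value for `column` is absent or blank."""
    return sum(1 for row in rows if (row.get(column) or "").strip() == "")


for column, count in MISSING_COUNTS.items():
    print(f"{column}: {count} missing ({missing_share(count):.1f}%)")
# Age: 177 missing (19.9%)
# Cabin: 687 missing (77.1%)
# Embarked: 2 missing (0.2%)
```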

Usage

This dataset is ideally suited for:
* Developing classification models to predict passenger survival.
* Conducting data clean-up and exploratory data analysis.
* Experimenting with hyperparameter tuning for machine learning algorithms.
* Comparing the performance of various classification algorithms to determine the most effective predictive approach.

Coverage

The dataset covers passengers and crew involved in the RMS Titanic's maiden voyage, which ended with the sinking on 15 April 1912. The demographic scope includes individuals across different ages, genders, socio-economic classes, and family structures. Geographic relevance is tied to the ports of embarkation: Cherbourg, Queenstown, and Southampton. Note that there are significant gaps in the data for passenger age (20% missing) and cabin numbers (77% missing).

License

This dataset is under a CC0: Public Domain license.

Who Can Use It

This dataset is highly valuable for:
* Machine Learning Engineers: To build, train, and evaluate predictive models.
* Data Scientists: For in-depth statistical analysis and feature engineering.
* Students and Beginners in Data Science: It is classified as a "Beginner" dataset, making it an excellent resource for learning classification tasks and data pre-processing.
* Researchers: Interested in historical data analysis and factors influencing survival in disaster scenarios.

Dataset Name Suggestions

Titanic Survival Prediction Dataset

Titanic Passenger Survival Data

RMS Titanic Survival Analytics

Historical Titanic Survival Factors

Attributes

Original Data Source: Titanic Survival Prediction Dataset
‘Titanic Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Titanic Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-dataset-bec7/bfa18318/?iid=006-936&v=presentation
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Titanic Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yasserh/titanic-dataset on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Description:

The sinking of the Titanic is one of the most infamous shipwrecks in history.

On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone on board, resulting in the death of 1502 out of 2224 passengers and crew.

While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).

Acknowledgements:

This dataset is sourced from Kaggle: https://www.kaggle.com/c/titanic/data.

Objective:

Understand the Dataset & cleanup (if required).

Build a strong classification model to predict whether the passenger survives or not.

Also fine-tune the hyperparameters & compare the evaluation metrics of various classification algorithms.

--- Original source retains full ownership of the source dataset ---
Titanic Dataset - cleaned
kaggle.com
Updated Aug 9, 2019
Cite
WinstonSDodson (2019). Titanic Dataset - cleaned [Dataset]. https://www.kaggle.com/datasets/winstonsdodson/titanic-dataset-cleaned/code
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
WinstonSDodson
Description
This is the classic Titanic dataset provided in the Kaggle competition, cleaned in one of the most popular Kernels there. Please see the Kernel titled "A Data Science Framework: To Achieve 99% Accuracy" for a great lesson in data science. This Kernel gives a great explanation of the thinking behind this data cleaning, as well as a very professional demonstration of the technologies and skills needed to do it. It then provides an overview of many ML techniques, and it is copiously and meticulously documented with many useful citations.

Of course, data cleaning is an essential skill in data science but I wanted to use this data for a study of other machine learning techniques. So, I found and used this set of data that is well known and cleaned to a benchmark accepted by many.
‘Titanic Solution for Beginner's Guide’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 14, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Titanic Solution for Beginner's Guide’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-solution-for-beginner-s-guide-03a8/ae3641d4/?iid=014-162&v=presentation
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Titanic Solution for Beginner's Guide’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harunshimanto/titanic-solution-for-beginners-guide on 14 February 2022.

--- Dataset description provided by original source is as follows ---

Overview

The data has been split into two groups:

training set (train.csv)
test set (test.csv)

The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the “ground truth”) for each passenger. Your model will be based on “features” like passengers’ gender and class. You can also use feature engineering to create new features.

The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.

We also include gender_submission.csv, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.

Data Dictionary

| Variable | Definition | Key |
|----------|------------|-----|
| survival | Survival | 0 = No, 1 = Yes |
| pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
| sex | Sex | |
| Age | Age in years | |
| sibsp | # of siblings / spouses aboard the Titanic | |
| parch | # of parents / children aboard the Titanic | |
| ticket | Ticket number | |
| fare | Passenger fare | |
| cabin | Cabin number | |
| embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |

Variable Notes

pclass: A proxy for socio-economic status (SES)
1st = Upper
2nd = Middle
3rd = Lower

age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5

sibsp: The dataset defines family relations in this way...
Sibling = brother, sister, stepbrother, stepsister
Spouse = husband, wife (mistresses and fiancés were ignored)

parch: The dataset defines family relations in this way...
Parent = mother, father
Child = daughter, son, stepdaughter, stepson
Some children travelled only with a nanny, therefore parch=0 for them.

--- Original source retains full ownership of the source dataset ---
Titanic Dataset
kaggle.com
Updated Dec 24, 2021
Cite
M Yasser H (2021). Titanic Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/titanic-dataset/discussion
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
M Yasser H
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Description:

The sinking of the Titanic is one of the most infamous shipwrecks in history.

On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone on board, resulting in the death of 1502 out of 2224 passengers and crew.

While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).

Acknowledgements:

This dataset is sourced from Kaggle: https://www.kaggle.com/c/titanic/data.

Objective:

Understand the Dataset & cleanup (if required).

Build a strong classification model to predict whether the passenger survives or not.

Also fine-tune the hyperparameters & compare the evaluation metrics of various classification algorithms.
A well Cleaned dataset using Excel
kaggle.com
Updated Feb 16, 2020
Cite
MICADEE (2020). A well Cleaned dataset using Excel [Dataset]. https://www.kaggle.com/adegladius/a-well-cleaned-dataset-using-excel/code
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
MICADEE
Description
Dataset

This dataset was created by MICADEE

‘Titanic Dataset Analysis’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Titanic Dataset Analysis’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-dataset-analysis-c0ba/latest
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Titanic Dataset Analysis’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/cities/titanic123 on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

There's a story behind every dataset and here's your opportunity to share yours.

Content

What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too.

Acknowledgements

We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

Inspiration

Your data will be in front of the world's largest data science community. What questions do you want to see answered?

--- Original source retains full ownership of the source dataset ---
‘Titanic: cleaned data’ analyzed by Analyst-2
analyst-2.ai
Updated Sep 30, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Titanic: cleaned data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-cleaned-data-cbf4/dc9cd7ff/?iid=055-046&v=presentation
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Titanic: cleaned data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/jamesleslie/titanic-cleaned-data on 30 September 2021.

--- Dataset description provided by original source is as follows ---

Introduction

This dataset was created in this notebook as part of a three-part series. The data is in machine-learning-ready format, with all missing values for the Age, Fare and Embarked columns having been imputed.

Data imputation

Age: this column was imputed by using the median age for the passenger's title (Mr, Mrs, Dr etc).

Fare: the single missing value in this column was imputed using the median value for that passenger's class.

Embarked: the two missing values here were imputed using the Pandas backfill method.
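
The three imputation rules described above can be illustrated without any third-party library. This is a sketch under assumptions, not the notebook's actual code: it assumes names follow a "Surname, Title. Given names" pattern, the sample passengers are invented, and `backfill` only mimics the idea of filling a gap with the next valid value (a trailing gap stays empty).

```python
import re
from statistics import median
from typing import Callable, Dict, Hashable, List, Optional

# Assumed name layout: "Surname, Title. Given names".
TITLE_PATTERN = re.compile(r",\s*([^.]+)\.")


def extract_title(name: str) -> str:
    """Pull the title (Mr, Mrs, Dr, ...) out of a passenger name."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip() if match else "Unknown"


def impute_group_median(
    rows: List[Dict], column: str, group_of: Callable[[Dict], Hashable]
) -> None:
    """Fill missing `column` values in place with the median of the row's group."""
    known: Dict[Hashable, List[float]] = {}
    for row in rows:
        if row[column] is not None:
            known.setdefault(group_of(row), []).append(row[column])
    medians = {group: median(values) for group, values in known.items()}
    for row in rows:
        if row[column] is None:
            # Stays None when no row in the group has a known value.
            row[column] = medians.get(group_of(row))


def backfill(values: List[Optional[str]]) -> List[Optional[str]]:
    """Replace each missing entry with the next valid entry after it."""
    result = list(values)
    next_valid = None
    for index in range(len(result) - 1, -1, -1):
        if result[index] is None:
            result[index] = next_valid
        else:
            next_valid = result[index]
    return result


# Invented passengers for illustration only.
passengers = [
    {"Name": "Doe, Mr. John", "Pclass": 3, "Age": 30.0, "Fare": 8.0},
    {"Name": "Doe, Mr. James", "Pclass": 3, "Age": 40.0, "Fare": 10.0},
    {"Name": "Roe, Mrs. Jane", "Pclass": 1, "Age": 28.0, "Fare": 80.0},
    {"Name": "Poe, Mr. Jack", "Pclass": 3, "Age": None, "Fare": None},
]
impute_group_median(passengers, "Age", lambda row: extract_title(row["Name"]))
impute_group_median(passengers, "Fare", lambda row: row["Pclass"])
print(passengers[3]["Age"], passengers[3]["Fare"])  # 35.0 9.0
print(backfill(["S", None, "C", None]))  # ['S', 'C', 'C', None]
```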

Usage

This data is used in both the second and third parts of the series.

--- Original source retains full ownership of the source dataset ---
titanic5 Dataset Dataset
paperswithcode.com
Cite
titanic5 Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/titanic5-dataset
Description
titanic5 Dataset Created by David Beltran del Rio March 2016.

Notes This is the final (for now) version of my update to the Titanic data. I think it’s finally ready for publishing if you’d like. What I did was to strip all the passenger and crew data from the Encyclopedia Titanica (ET) web pages (excluding channel crossing passengers), create a unique ID for each passenger and crew member (Name_ID), then (painstakingly and hopefully 100% correctly) match to your earlier titanic3 dataset, in order to compare the two and to get your sibsp and parch variables. Since the ET is updated occasionally the work put into the ID and matching can be reused and refined later. I did eventually hear back from the ET people, they are willing to make the underlying database available in the future, I have not yet taken them up on it.

The two datasets line up nicely; most of the differences in the newer titanic5 dataset are in the age variable, as I had mentioned before: the new set has fewer missing ages, 51 missing (vs 263) out of 1309.

I am in the process of refining my analysis of the data as well, based on your comments below and your Regression Modeling Strategies example.

titanic3_wID data can be matched to titanic5 using the Name_ID variable. Tab titanic5 Metadata has the variable descriptions and allowable values for Class and Class/Dept.

A note about the ages - instead of using the add 0.5 trick to indicate estimated birth day / date I have a flag that indicates how the “final” age (Age_F) was arrived at. It’s the Age_F_Code variable - the allowable values are in the Titanic5_metadata tab in the attached excel. The reason for this is that I already had some fractional ages for infants where I had age in months instead of years and I wanted to avoid confusion for 6 month old infants, although I don’t think there are any in the data! Also, I was thinking to make fractional ages or age in days for all passengers for whom I have DoB, but I have not yet done so.

Here’s what the tabs are:

Titanic5_all - all (mostly cleaned) Titanic passenger and crew records
Titanic5_work - working dataset, crew removed, unnecessary variables removed - this is the one I import into SAS / R to work on
Titanic5_metadata - Variable descriptions and allowable values
titanic3_wID - Original Titanic3 dataset with Name_ID added for merging to Titanic5

I have a csv, R dataset, and SAS dataset, but the variable names are an older version, so I won’t send those along for now to avoid confusion.

If it helps send my contact info along to your student in case any questions arise. Gmail address probably best, on weekends for sure: davebdr@gmail.com

The tabs in titanic5.xls are

Titanic5_all
Titanic5_passenger (the one to be used for analysis)
Titanic5_metadata (used during analysis file creation)
Titanic3_wID
Titanic Dataset - EDA & Logistic Regression
kaggle.com
Updated Feb 19, 2025
Cite
RabbiTheAnalyst (2025). Titanic Dataset - EDA & Logistic Regression [Dataset]. https://www.kaggle.com/datasets/mdrabbiali/titanic-data-set/code
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
RabbiTheAnalyst
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Description

The sinking of the Titanic is one of the most infamous shipwrecks in history.

On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone on board, resulting in the death of 1502 out of 2224 passengers and crew.

While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).

Objective:

Survival Prediction: To build a logistic regression model that accurately predicts the survival of passengers based on features such as age, gender, passenger class, and number of siblings/spouses aboard.

Data Cleaning and Preprocessing: To perform data cleaning by handling missing values, removing unnecessary columns, and encoding categorical variables to prepare the dataset for analysis.

Exploratory Data Analysis (EDA): To conduct a thorough exploratory data analysis to visualize survival rates and identify patterns based on various factors like gender, passenger class, and embarked location.

Feature Importance Analysis: To analyze the correlation between different features and their impact on survival rates, identifying which factors are the most significant predictors of survival.

Model Evaluation: To evaluate the performance of the logistic regression model using accuracy scores and classification reports, ensuring that the model generalizes well to unseen data.

ROC Curve Analysis: To create a ROC curve to assess the trade-off between the true positive rate and false positive rate, providing insights into the model's ability to distinguish between survivors and non-survivors.

Insights and Recommendations: To derive insights from the analysis that could inform future safety measures or policies related to passenger safety in maritime travel.
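
The ROC curve analysis listed among the objectives can be illustrated with a small pure-Python sketch. It is not the author's code: the labels and scores below are made up, and the functions simply apply the definitions of true positive rate and false positive rate at each score threshold, then integrate with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ROC needs at least one positive and one negative label")
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a score >= threshold counts as "survived".
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up survival labels and predicted probabilities.
labels = [1, 0, 1, 0]
predicted = [0.9, 0.8, 0.7, 0.1]
curve = roc_points(labels, predicted)
print(curve)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(curve))  # 0.75
```

An area of 0.5 corresponds to random ranking and 1.0 to perfect separation of survivors from non-survivors.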
Dataset of books called Titanic science : where the story of Titanic meets...
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called Titanic science : where the story of Titanic meets science enquiry [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Titanic+science+%3A+where+the+story+of+Titanic+meets+science+enquiry
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 1 row and is filtered where the book is Titanic science : where the story of Titanic meets science enquiry. It features 7 columns including author, publication date, language, and book publisher.
TITANIC DATASET USING DECISION TREE CLASSIFIER
kaggle.com
Updated Oct 14, 2023
Cite
nikhil reddy vantari (2023). TITANIC DATASET USING DECISION TREE CLASSIFIER [Dataset]. https://www.kaggle.com/nikhilreddyvantari/titanic-dataset-using-decision-tree-classifier/discussion
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
nikhil reddy vantari
Description
Dataset

This dataset was created by nikhil reddy vantari

Titanic Data analysis using LR,DT,SVM,RF
kaggle.com
Updated Jun 15, 2020
Cite
amitkhan (2020). Titanic Data analysis using LR,DT,SVM,RF [Dataset]. https://www.kaggle.com/datasets/amojas00/titanic-data-analysis-using-lrdtsvmrf/data
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
amitkhan
Description
Dataset

This dataset was created by amitkhan

The Complete Titanic Dataset
kaggle.com
Updated Jan 4, 2020
Cite
Vinicius Barbosa Paiva (2020). The Complete Titanic Dataset [Dataset]. https://www.kaggle.com/vinicius150987/titanic3/code
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Vinicius Barbosa Paiva
Description
The sinking of the Titanic is one of the most infamous shipwrecks in history.

On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.

While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).
Titanic Solution for Beginner's Guide
kaggle.com
Updated Mar 12, 2018
Cite
Harun-Ur-Rashid (2018). Titanic Solution for Beginner's Guide [Dataset]. https://www.kaggle.com/harunshimanto/titanic-solution-for-beginners-guide/activity
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Harun-Ur-Rashid
Description
Overview

The data has been split into two groups:

training set (train.csv)
test set (test.csv)

The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the “ground truth”) for each passenger. Your model will be based on “features” like passengers’ gender and class. You can also use feature engineering to create new features.

The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.

We also include gender_submission.csv, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.

Data Dictionary

| Variable | Definition | Key |
|----------|------------|-----|
| survival | Survival | 0 = No, 1 = Yes |
| pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
| sex | Sex | |
| Age | Age in years | |
| sibsp | # of siblings / spouses aboard the Titanic | |
| parch | # of parents / children aboard the Titanic | |
| ticket | Ticket number | |
| fare | Passenger fare | |
| cabin | Cabin number | |
| embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |

Variable Notes

pclass: A proxy for socio-economic status (SES)
1st = Upper
2nd = Middle
3rd = Lower

age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5

sibsp: The dataset defines family relations in this way...
Sibling = brother, sister, stepbrother, stepsister
Spouse = husband, wife (mistresses and fiancés were ignored)

parch: The dataset defines family relations in this way...
Parent = mother, father
Child = daughter, son, stepdaughter, stepson
Some children travelled only with a nanny, therefore parch=0 for them.
Titanic Dataset Competition
kaggle.com
Updated Dec 19, 2022
Cite
Cynthia Barasa (2022). Titanic Dataset Competition [Dataset]. https://www.kaggle.com/datasets/cynthycynthy/titanicdataset/data
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Cynthia Barasa
Description
The Titanic dataset is a well-known dataset that provides information on the passengers who were onboard the fateful voyage of the RMS Titanic. The data includes details such as the passenger's name, age, gender, ticket class, fare paid, and information on their family members. The dataset also includes a column called "Survived" which indicates whether a passenger survived the disaster or not.

There are a total of 891 rows in the dataset, with 12 columns. Some of the key columns in the dataset include:

• PassengerId: a unique identifier for each passenger
• Survived: a binary variable that indicates whether the passenger survived (1) or did not survive (0) the disaster
• Pclass: the ticket class of the passenger (1 = first class, 2 = second class, 3 = third class)
• Name: the name of the passenger
• Sex: the gender of the passenger (male or female)
• Age: the age of the passenger (some values are missing)
• SibSp: the number of siblings or spouses the passenger had on board
• Parch: the number of parents or children the passenger had on board
• Ticket: the ticket number of the passenger
• Fare: the fare paid by the passenger
• Cabin: the cabin number of the passenger (some values are missing)
• Embarked: the port at which the passenger embarked (C = Cherbourg, Q = Queenstown, S = Southampton)

Overall, the key challenges I encountered when working on the Titanic dataset were: how to handle missing values and imbalanced classes, encode categorical variables, reduce the dimensionality of the dataset, and identify and handle noise in the data.
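
Two of the challenges named above, missing values and categorical encoding, can be sketched with the Python standard library alone. This is a minimal illustration, not the author's method: the three rows are made-up placeholders shaped like the columns listed above, and median imputation and one-hot encoding are assumed choices among several reasonable ones.

```python
from statistics import median

# Hypothetical rows shaped like the columns described above (not real records).
# An empty string stands for a missing value, as it would after reading a CSV.
rows = [
    {"Sex": "male", "Age": "22", "Embarked": "S"},
    {"Sex": "female", "Age": "", "Embarked": "C"},
    {"Sex": "female", "Age": "30", "Embarked": ""},
]

def impute_age(rows):
    """Fill missing Age values with the median of the observed ages."""
    observed = [float(r["Age"]) for r in rows if r["Age"]]
    fill = median(observed)
    return [float(r["Age"]) if r["Age"] else fill for r in rows]

def one_hot(values, categories):
    """Encode each value as a 0/1 list; missing or unknown values become all zeros."""
    return [[1 if v == c else 0 for c in categories] for v in values]

ages = impute_age(rows)
sex = one_hot([r["Sex"] for r in rows], ["female", "male"])
embarked = one_hot([r["Embarked"] for r in rows], ["C", "Q", "S"])
print(ages, sex, embarked)
```

Encoding a missing port as all zeros keeps the row instead of dropping it; whether that is preferable to imputing the most common port depends on the model being used.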

Here are a few tips that I found helpful when getting started in the Titanic dataset competition:
1. Get familiar with the dataset
2. Pre-process the data
3. Split the data into training and test sets
4. Try out a few different algorithms
5. Tune the hyperparameters
6. Evaluate the model
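
The split and evaluate steps in this workflow might look roughly like the sketch below, written with only the standard library. It rests on stated assumptions: the rows are hypothetical placeholders rather than real records, the helper names (`train_test_split`, `predict_by_sex`, `accuracy`) are invented for illustration, and the rule "predict survival for female passengers" is only a stand-in baseline for whichever trained algorithm is being compared. Hyperparameter tuning is not covered.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def predict_by_sex(row):
    """Baseline rule: predict survival (1) for female passengers, 0 otherwise."""
    return 1 if row["Sex"] == "female" else 0

def accuracy(rows, predict):
    """Fraction of rows whose predicted label matches the Survived column."""
    correct = sum(1 for r in rows if predict(r) == r["Survived"])
    return correct / len(rows)

# Hypothetical labelled rows for illustration only (not real records).
rows = [
    {"Sex": "female", "Survived": 1},
    {"Sex": "male", "Survived": 0},
    {"Sex": "female", "Survived": 0},
    {"Sex": "male", "Survived": 0},
    {"Sex": "male", "Survived": 1},
    {"Sex": "female", "Survived": 1},
    {"Sex": "male", "Survived": 0},
    {"Sex": "female", "Survived": 1},
]

train, test = train_test_split(rows, test_fraction=0.25, seed=0)
print(f"train={len(train)} test={len(test)}")
print(f"held-out accuracy={accuracy(test, predict_by_sex):.2f}")
```

Fixing the seed makes the split reproducible, which matters when several algorithms are compared on the same held-out rows.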

Here are a few resources that I found helpful as I started working on the competition:
• Kaggle's Titanic tutorial
• scikit-learn documentation
• pandas documentation
Titanic survive model
kaggle.com
Updated Jan 27, 2025
Cite
Aly El-badry (2025). Titanic survive model [Dataset]. https://www.kaggle.com/datasets/alyelbadry/titanic-survive-model/data
Dataset updated
Jan 27, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Aly El-badry
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Titanic Passenger Data

This dataset contains information about the passengers aboard the RMS Titanic, which tragically sank during its maiden voyage in April 1912. It provides detailed data points for survival analysis and predictive modeling, including demographic details, ticket class, fare, and survival outcomes.

Dataset Highlights:

Passenger Details: Information such as Name, Age, Gender, and Embarked Port.

Socioeconomic Status: Passenger Class (1st, 2nd, or 3rd) and Fare Price.

Survival Information: Whether the passenger survived or perished.

Family Relationships: Number of Siblings/Spouses and Parents/Children aboard.

This dataset is ideal for exploring patterns of survival, understanding social dynamics aboard the Titanic, and testing machine learning models for classification problems.

Suggested Use Cases:

Survival rate analysis by age, gender, or class.

Building classification models for survival prediction.

Statistical tests and hypothesis exploration on historical data.
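
The first use case, survival rate by group, can be sketched as below using only the standard library. The column names (`Pclass`, `Sex`, `Survived`) are an assumption borrowed from the common Kaggle naming, since this dataset's description does not list them, and the rows are hypothetical placeholders rather than real records.

```python
from collections import defaultdict

def survival_rate_by(rows, key):
    """Return {group: fraction survived} for the given column name."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for r in rows:
        totals[r[key]] += 1
        survived[r[key]] += r["Survived"]
    return {group: survived[group] / totals[group] for group in totals}

# Hypothetical rows for illustration only (not real records).
rows = [
    {"Pclass": 1, "Sex": "female", "Survived": 1},
    {"Pclass": 3, "Sex": "male", "Survived": 0},
    {"Pclass": 3, "Sex": "female", "Survived": 1},
    {"Pclass": 1, "Sex": "male", "Survived": 0},
    {"Pclass": 3, "Sex": "male", "Survived": 1},
]

by_sex = survival_rate_by(rows, "Sex")
by_class = survival_rate_by(rows, "Pclass")
print(by_sex)
print(by_class)
```

Grouping by age would first need the ages binned into ranges, since raw ages produce too many small groups to compare.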

Dive in to analyze one of the most famous shipwrecks in history!
