25 datasets found
  1. titanic

    • tensorflow.org
    Updated Feb 12, 2023
    Cite
    (2023). titanic [Dataset]. https://www.tensorflow.org/datasets/catalog/titanic
    Description

    Dataset describing the survival status of individual passengers on the Titanic. Missing values in the original dataset are represented using ?. Float and int missing values are replaced with -1, string missing values are replaced with 'Unknown'.

    To use this dataset:

    import tensorflow_datasets as tfds

    ds = tfds.load('titanic', split='train')
    for ex in ds.take(4):
        print(ex)


    See the guide for more information on tensorflow_datasets.
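    Because this build replaces missing values with sentinels (-1 for numbers, 'Unknown' for strings), you may want real missing values back before analysis. A minimal sketch, assuming the features of one example are already available as a flat Python dict (for instance after tfds.as_numpy) and that no field legitimately holds -1 or the string 'Unknown'; the function name is our own. If your examples nest the features under a sub-key, apply it to that inner dict.

```python
import numbers
from typing import Any, Dict, Optional

NUMERIC_SENTINEL = -1        # stand-in for a missing int or float (per the description above)
STRING_SENTINEL = "Unknown"  # stand-in for a missing string


def restore_missing(record: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Return a copy of `record` with the sentinel values replaced by None."""
    restored: Dict[str, Optional[Any]] = {}
    for key, value in record.items():
        if isinstance(value, bytes):
            value = value.decode("utf-8")  # string tensors arrive as bytes in NumPy form
        if isinstance(value, numbers.Real) and value == NUMERIC_SENTINEL:
            restored[key] = None
        elif isinstance(value, str) and value == STRING_SENTINEL:
            restored[key] = None
        else:
            restored[key] = value
    return restored


example = {"age": -1.0, "cabin": b"Unknown", "fare": 7.25, "name": b"Doe, Mr. John"}
print(restore_missing(example))
# {'age': None, 'cabin': None, 'fare': 7.25, 'name': 'Doe, Mr. John'}
```

    numbers.Real is used instead of float so that NumPy scalar types, which register with the numbers ABCs, are also recognised.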

  2. Titanic Dataset

    • paperswithcode.com
    Updated Oct 27, 2024
    Cite
    (2024). Titanic Dataset [Dataset]. https://paperswithcode.com/dataset/titanic
    Description

    Titanic Dataset Description

    Overview

    The data is divided into two groups:

    - Training set (train.csv): Used to build machine learning models. It includes the outcome (also called the "ground truth") for each passenger, allowing models to predict survival based on “features” like gender and class. Feature engineering can also be applied to create new features.
    - Test set (test.csv): Used to evaluate model performance on unseen data. The ground truth is not provided; the task is to predict survival for each passenger in the test set using the trained model.

    Additionally, gender_submission.csv is provided as an example submission file, containing predictions based on the assumption that all and only female passengers survive.
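    The gender_submission.csv baseline is simple enough to regenerate yourself. A minimal sketch using the standard-library csv module, assuming test.csv has PassengerId and Sex columns named as in the data dictionary below; the function names are our own:

```python
import csv
import io
from typing import Dict, Iterable, List


def gender_baseline(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Predict Survived = 1 for every female passenger and 0 for everyone else."""
    return [
        {"PassengerId": row["PassengerId"],
         "Survived": "1" if row["Sex"].strip().lower() == "female" else "0"}
        for row in rows
    ]


def write_submission(predictions: List[Dict[str, str]]) -> str:
    """Serialise predictions in the two-column PassengerId,Survived layout."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["PassengerId", "Survived"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(predictions)
    return buffer.getvalue()


# Made-up rows standing in for test.csv.
test_csv = "PassengerId,Pclass,Sex\n892,3,male\n893,3,female\n"
print(write_submission(gender_baseline(csv.DictReader(io.StringIO(test_csv)))), end="")
# PassengerId,Survived
# 892,0
# 893,1
```

    With the real files, open test.csv with newline="" and pass the resulting DictReader in the same way.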

    Data Dictionary

    | Variable | Definition                               | Key                                            |
    |----------|------------------------------------------|------------------------------------------------|
    | survival | Survival                                 | 0 = No, 1 = Yes                                |
    | pclass   | Ticket class                             | 1 = 1st, 2 = 2nd, 3 = 3rd                      |
    | sex      | Sex                                      |                                                |
    | age      | Age in years                             |                                                |
    | sibsp    | # of siblings/spouses aboard the Titanic |                                                |
    | parch    | # of parents/children aboard the Titanic |                                                |
    | ticket   | Ticket number                            |                                                |
    | fare     | Passenger fare                           |                                                |
    | cabin    | Cabin number                             |                                                |
    | embarked | Port of Embarkation                      | C = Cherbourg, Q = Queenstown, S = Southampton |

    Variable Notes

    pclass: Proxy for socio-economic status (SES): 1st = Upper, 2nd = Middle, 3rd = Lower.

    age: Fractional if less than 1 year. Estimated ages are represented in the form xx.5.

    sibsp: Defines family relations as: Sibling = brother, sister, stepbrother, stepsister; Spouse = husband, wife (excluding mistresses and fiancés).

    parch: Defines family relations as: Parent = mother, father; Child = daughter, son, stepdaughter, stepson. Some children traveled only with a nanny, so parch = 0 for them.
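    The coding rules in these notes translate directly into small helpers. A sketch, assuming age has already been parsed to a float (or None when missing); the helper names are our own. It takes the xx.5 rule at face value, so a true age such as 40.5 cannot be told apart from an estimate, and ages below 1 are treated as infant fractions rather than estimates.

```python
from typing import Optional

SES_BY_PCLASS = {1: "Upper", 2: "Middle", 3: "Lower"}  # pclass as an SES proxy
PORT_BY_CODE = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}


def age_is_estimated(age: Optional[float]) -> bool:
    """True when an age of 1 or more follows the xx.5 convention for estimates."""
    if age is None or age < 1:
        return False  # missing, or an infant's age given as a fraction of a year
    return age % 1 == 0.5


def describe(pclass: int, embarked: str, age: Optional[float]) -> str:
    """Human-readable summary built from the keys in the data dictionary."""
    ses = SES_BY_PCLASS.get(pclass, "Unknown")
    port = PORT_BY_CODE.get(embarked, "an unknown port")
    estimated = " (estimated)" if age_is_estimated(age) else ""
    return f"{ses} class from {port}, age {age}{estimated}"


print(describe(3, "S", 28.5))  # Lower class from Southampton, age 28.5 (estimated)
print(describe(1, "C", 0.92))  # Upper class from Cherbourg, age 0.92
```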

  3. Survival Prediction with Titanic Dataset using R

    • kaggle.com
    Updated Jan 26, 2018
    Cite
    Sivasuryanarayan Krishnamoorthy (2018). Survival Prediction with Titanic Dataset using R [Dataset]. https://www.kaggle.com/sivasuryak3/survival-prediction-with-titanic-dataset-using-r/metadata
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sivasuryanarayan Krishnamoorthy
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Sivasuryanarayan Krishnamoorthy

    Released under CC0: Public Domain

    Contents

  4. Data from: Titanic Survival Prediction Dataset

    • opendatabay.com
    Updated Jul 12, 2025
    Cite
    Datasimple (2025). Titanic Survival Prediction Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/fce916ad-b2f1-412d-95f1-e2a50859195e
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Not Specified
    Description

    This dataset pertains to the sinking of the RMS Titanic, one of the most infamous shipwrecks in history. On 15 April 1912, during its maiden voyage, the Titanic struck an iceberg and sank, leading to the deaths of 1,502 out of 2,224 passengers and crew due to an insufficient number of lifeboats. While luck played a role, certain groups of people demonstrated a higher likelihood of survival. The primary goal for users of this dataset is to construct a predictive model that identifies the types of individuals who were more likely to survive, utilising passenger details such as name, age, gender, and socio-economic class. Additionally, the objective involves understanding and preparing the dataset, building robust classification models, fine-tuning their hyperparameters, and comparing various algorithm evaluation metrics.

    Columns

    The dataset contains the following columns:

    * PassengerId: A unique identifier for each passenger.
    * Survived: Indicates whether the passenger survived (1) or not (0).
    * Pclass: The passenger's ticket class (1st, 2nd, or 3rd class).
    * Name: The full name of the passenger.
    * Sex: The gender of the passenger (male or female).
    * Age: The age of the passenger in years.
    * SibSp: The number of siblings or spouses aboard the Titanic with the passenger.
    * Parch: The number of parents or children aboard the Titanic with the passenger.
    * Ticket: The ticket number.
    * Fare: The passenger's fare.
    * Cabin: The cabin number.
    * Embarked: The port from which the passenger embarked (Cherbourg, Queenstown, or Southampton).
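    For the column list above, a typed record makes the missing-value handling explicit. A minimal sketch, assuming the CSV uses these exact column headers and leaves missing values as empty fields (as the Kaggle train.csv does); the Passenger class, the parse_passenger function and the sample row are illustrative, not part of the dataset:

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Passenger:
    passenger_id: int
    survived: int
    pclass: int
    name: str
    sex: str
    age: Optional[float]   # missing for some passengers
    sibsp: int
    parch: int
    ticket: str
    fare: Optional[float]
    cabin: Optional[str]   # missing for most passengers
    embarked: Optional[str]


def _opt_float(value: str) -> Optional[float]:
    return float(value) if value.strip() else None


def _opt_str(value: str) -> Optional[str]:
    return value if value.strip() else None


def parse_passenger(row: Dict[str, str]) -> Passenger:
    """Convert one csv.DictReader row into a Passenger, mapping empty fields to None."""
    return Passenger(
        passenger_id=int(row["PassengerId"]),
        survived=int(row["Survived"]),
        pclass=int(row["Pclass"]),
        name=row["Name"],
        sex=row["Sex"],
        age=_opt_float(row["Age"]),
        sibsp=int(row["SibSp"]),
        parch=int(row["Parch"]),
        ticket=row["Ticket"],
        fare=_opt_float(row["Fare"]),
        cabin=_opt_str(row["Cabin"]),
        embarked=_opt_str(row["Embarked"]),
    )


sample = (  # one made-up row with Age and Cabin missing
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    '1,0,3,"Doe, Mr. John",male,,1,0,X123,7.25,,S\n'
)
passenger = parse_passenger(next(csv.DictReader(io.StringIO(sample))))
print(passenger.age, passenger.cabin, passenger.embarked)  # None None S
```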

    Distribution

    The dataset is provided as a CSV file named Titanic-Dataset.csv, with a size of 61.19 kB. It features 12 columns. Most columns contain 891 valid records, representing the total number of passengers. However, the 'Age' column has 177 missing values (20%), 'Cabin' has 687 missing values (77%), and 'Embarked' has 2 missing values.
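    The missing-value counts quoted above can be rechecked directly from the file. A stdlib-only sketch, assuming missing entries appear as empty fields (pandas' isna().sum() is the more common route); the function name is our own. On Titanic-Dataset.csv it should report roughly 20% for Age and 77% for Cabin, in line with the figures above.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def missing_report(rows: Iterable[Dict[str, Optional[str]]]) -> Dict[str, Tuple[int, float]]:
    """Map each column that has gaps to (missing count, percentage of all rows)."""
    missing: Counter = Counter()
    total = 0
    for row in rows:
        total += 1
        for column, value in row.items():
            if value is None or value.strip() == "":  # None covers short rows
                missing[column] += 1
    return {col: (n, round(100 * n / total, 1)) for col, n in missing.items()}


# Usage with the real file:
# with open("Titanic-Dataset.csv", newline="") as f:
#     print(missing_report(csv.DictReader(f)))
demo = "Age,Cabin,Embarked\n22,,S\n,C85,C\n,,\n38,,S\n"
print(missing_report(csv.DictReader(io.StringIO(demo))))
```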

    Usage

    This dataset is ideally suited for:

    * Developing classification models to predict passenger survival.
    * Conducting data clean-up and exploratory data analysis.
    * Experimenting with hyperparameter tuning for machine learning algorithms.
    * Comparing the performance of various classification algorithms to determine the most effective predictive approach.

    Coverage

    The dataset covers passengers and crew involved in the RMS Titanic's maiden voyage on 15 April 1912. The demographic scope includes individuals across different ages, genders, socio-economic classes, and family structures. Geographic relevance is tied to the ports of embarkation: Cherbourg, Queenstown, and Southampton. It should be noted that there are significant gaps in data availability for passenger age (20% missing) and cabin numbers (77% missing).

    License

    This dataset is under a CC0: Public Domain license.

    Who Can Use It

    This dataset is highly valuable for:

    * Machine Learning Engineers: To build, train, and evaluate predictive models.
    * Data Scientists: For in-depth statistical analysis and feature engineering.
    * Students and Beginners in Data Science: It is classified as a "Beginner" dataset, making it an excellent resource for learning classification tasks and data pre-processing.
    * Researchers: Interested in historical data analysis and factors influencing survival in disaster scenarios.

    Dataset Name Suggestions

    • Titanic Survival Prediction Dataset
    • Titanic Passenger Survival Data
    • RMS Titanic Survival Analytics
    • Historical Titanic Survival Factors

    Attributes

    Original Data Source: Titanic Survival Prediction Dataset

  5. ‘Titanic Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Titanic Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-dataset-bec7/bfa18318/?iid=006-936&v=presentation
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Titanic Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yasserh/titanic-dataset on 28 January 2022.

    --- Dataset description provided by original source is as follows ---


    Description:

    The sinking of the Titanic is one of the most infamous shipwrecks in history.

    On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone on board, resulting in the death of 1502 out of 2224 passengers and crew.

    While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

    In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).

    Acknowledgements:

    This dataset has been referred from Kaggle: https://www.kaggle.com/c/titanic/data.

    Objective:

    • Understand the Dataset & cleanup (if required).
    • Build a strong classification model to predict whether the passenger survives or not.
    • Also fine-tune the hyperparameters & compare the evaluation metrics of various classification algorithms.

    --- Original source retains full ownership of the source dataset ---

  6. Titanic Dataset - cleaned

    • kaggle.com
    Updated Aug 9, 2019
    Cite
    WinstonSDodson (2019). Titanic Dataset - cleaned [Dataset]. https://www.kaggle.com/datasets/winstonsdodson/titanic-dataset-cleaned/code
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    WinstonSDodson
    Description

    This is the classic Titanic Dataset provided in the Kaggle competition and then cleaned in one of the most popular Kernels there. Please see the Kernel titled "A Data Science Framework: To Achieve 99% Accuracy" for a great lesson in data science. This Kernel gives a great explanation of the thinking behind this data cleaning, as well as a very professional demonstration of the technologies and skills needed to do it. It then goes on to provide an overview of many ML techniques, and it is copiously and meticulously documented with many useful citations.

    Of course, data cleaning is an essential skill in data science, but I wanted to use this data to study other machine learning techniques. So I found and used this set of data, which is well known and cleaned to a benchmark accepted by many.

  7. ‘Titanic Solution for Beginner's Guide’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 14, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Titanic Solution for Beginner's Guide’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-solution-for-beginner-s-guide-03a8/ae3641d4/?iid=014-162&v=presentation
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Titanic Solution for Beginner's Guide’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harunshimanto/titanic-solution-for-beginners-guide on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    Overview

    The data has been split into two groups:

    training set (train.csv)
    test set (test.csv)
    

    The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the “ground truth”) for each passenger. Your model will be based on “features” like passengers’ gender and class. You can also use feature engineering to create new features.

    The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.

    We also include gender_submission.csv, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.

    Data Dictionary

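    | Variable | Definition                                 | Key                                            |
    |----------|--------------------------------------------|------------------------------------------------|
    | survival | Survival                                   | 0 = No, 1 = Yes                                |
    | pclass   | Ticket class                               | 1 = 1st, 2 = 2nd, 3 = 3rd                      |
    | sex      | Sex                                        |                                                |
    | age      | Age in years                               |                                                |
    | sibsp    | # of siblings / spouses aboard the Titanic |                                                |
    | parch    | # of parents / children aboard the Titanic |                                                |
    | ticket   | Ticket number                              |                                                |
    | fare     | Passenger fare                             |                                                |
    | cabin    | Cabin number                               |                                                |
    | embarked | Port of Embarkation                        | C = Cherbourg, Q = Queenstown, S = Southampton |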

    Variable Notes

    pclass: A proxy for socio-economic status (SES) 1st = Upper 2nd = Middle 3rd = Lower

    age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5.

    sibsp: The dataset defines family relations in this way... Sibling = brother, sister, stepbrother, stepsister Spouse = husband, wife (mistresses and fiancés were ignored)

    parch: The dataset defines family relations in this way... Parent = mother, father Child = daughter, son, stepdaughter, stepson Some children travelled only with a nanny, therefore parch=0 for them.

    --- Original source retains full ownership of the source dataset ---

  8. Titanic Dataset

    • kaggle.com
    Updated Dec 24, 2021
    Cite
    M Yasser H (2021). Titanic Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/titanic-dataset/discussion
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    M Yasser H
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description


    Description:

    The sinking of the Titanic is one of the most infamous shipwrecks in history.

    On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone on board, resulting in the death of 1502 out of 2224 passengers and crew.

    While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

    In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).

    Acknowledgements:

    This dataset has been referred from Kaggle: https://www.kaggle.com/c/titanic/data.

    Objective:

    • Understand the Dataset & cleanup (if required).
    • Build a strong classification model to predict whether the passenger survives or not.
    • Also fine-tune the hyperparameters & compare the evaluation metrics of various classification algorithms.
  9. A well Cleaned dataset using Excel

    • kaggle.com
    Updated Feb 16, 2020
    Cite
    MICADEE (2020). A well Cleaned dataset using Excel [Dataset]. https://www.kaggle.com/adegladius/a-well-cleaned-dataset-using-excel/code
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    MICADEE
    Description

    Dataset

    This dataset was created by MICADEE

    Contents

  10. ‘Titanic Dataset Analysis’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Titanic Dataset Analysis’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-dataset-analysis-c0ba/latest
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Titanic Dataset Analysis’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/cities/titanic123 on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    There's a story behind every dataset and here's your opportunity to share yours.

    Content

    What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too.

    Acknowledgements

    We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

    --- Original source retains full ownership of the source dataset ---

  11. ‘Titanic: cleaned data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Titanic: cleaned data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-cleaned-data-cbf4/dc9cd7ff/?iid=055-046&v=presentation
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Titanic: cleaned data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/jamesleslie/titanic-cleaned-data on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    Introduction

    This dataset was created in this notebook as part of a three-part series. The data is in machine-learning-ready format, with all missing values for the Age, Fare and Embarked columns having been imputed.

    Data imputation

    • Age: this column was imputed by using the median age for the passenger's title (Mr, Mrs, Dr etc).
    • Fare: the single missing value in this column was imputed using the median value for that passenger's class.
    • Embarked: the two missing values here were imputed using the Pandas backfill method.

    Usage

    This data is used in both the second and third parts of the series.

    --- Original source retains full ownership of the source dataset ---
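    The three imputation rules described above can be sketched without pandas. This is a re-implementation under stated assumptions, not the notebook's own code: records are dicts with Name, Pclass, Age, Fare and Embarked keys, missing values are None, the title is taken to be the text between the surname comma and the next period of Name, and "backfill" means copying the next non-missing value up into each gap (which is what pandas' bfill does). Ages for titles with no known age stay missing here.

```python
import re
import statistics
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional

TITLE_RE = re.compile(r",\s*([^.]+)\.")  # "Doe, Mr. A" -> "Mr"


def title_of(name: str) -> Optional[str]:
    """Extract the honorific between the surname comma and the next period."""
    match = TITLE_RE.search(name)
    return match.group(1).strip() if match else None


def median_by(records: List[Dict[str, Any]], key_fn: Callable[[Dict[str, Any]], Any],
              field: str) -> Dict[Any, float]:
    """Median of `field` within each group given by key_fn, skipping missing values."""
    groups: Dict[Any, List[float]] = defaultdict(list)
    for rec in records:
        if rec[field] is not None:
            groups[key_fn(rec)].append(rec[field])
    return {key: statistics.median(values) for key, values in groups.items()}


def impute(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    out = [dict(rec) for rec in records]  # leave the caller's records untouched
    age_medians = median_by(out, lambda r: title_of(r["Name"]), "Age")
    fare_medians = median_by(out, lambda r: r["Pclass"], "Fare")
    for rec in out:
        if rec["Age"] is None:
            rec["Age"] = age_medians.get(title_of(rec["Name"]))  # None if title unseen
        if rec["Fare"] is None:
            rec["Fare"] = fare_medians.get(rec["Pclass"])
    # Backfill Embarked: walk backwards so each gap takes the next known value.
    next_port: Optional[str] = None
    for rec in reversed(out):
        if rec["Embarked"] is None:
            rec["Embarked"] = next_port
        else:
            next_port = rec["Embarked"]
    return out


people = [  # made-up rows for illustration
    {"Name": "Doe, Mr. A", "Pclass": 3, "Age": 30.0, "Fare": 8.0, "Embarked": "S"},
    {"Name": "Roe, Mr. B", "Pclass": 3, "Age": None, "Fare": None, "Embarked": None},
    {"Name": "Poe, Mrs. C", "Pclass": 1, "Age": 40.0, "Fare": 80.0, "Embarked": "C"},
    {"Name": "Loe, Mr. D", "Pclass": 3, "Age": 20.0, "Fare": 10.0, "Embarked": "Q"},
]
for row in impute(people):
    print(row["Name"], row["Age"], row["Fare"], row["Embarked"])
```

    Backfilling depends on row order, so results will differ if the file is sorted differently. In pandas, the same ideas map onto groupby(...).transform("median") and Series.bfill().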

  12. titanic5 Dataset Dataset

    • paperswithcode.com
    Cite
    titanic5 Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/titanic5-dataset
    Description

    titanic5 Dataset, created by David Beltran del Rio, March 2016.

    Notes: This is the final (for now) version of my update to the Titanic data. I think it’s finally ready for publishing if you’d like. What I did was to strip all the passenger and crew data from the Encyclopedia Titanica (ET) web pages (excluding channel-crossing passengers), create a unique ID for each passenger and crew member (Name_ID), then (painstakingly and hopefully 100% correctly) match it to your earlier titanic3 dataset, in order to compare the two and to get your sibsp and parch variables. Since the ET is updated occasionally, the work put into the ID and matching can be reused and refined later. I did eventually hear back from the ET people; they are willing to make the underlying database available in the future, but I have not yet taken them up on it.

    The two datasets line up nicely. Most of the differences in the newer titanic5 dataset are in the age variable, as I mentioned before: the new set has fewer missing ages (51 missing vs. 263, out of 1309).

    I am in the process of refining my analysis of the data as well, based on your comments below and your Regression Modeling Strategies example.

    titanic3_wID data can be matched to titanic5 using the Name_ID variable. Tab titanic5 Metadata has the variable descriptions and allowable values for Class and Class/Dept.
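    Matching titanic3_wID to titanic5 on Name_ID is an ordinary key join. A minimal sketch, assuming both tabs have been exported to lists of dicts and that Name_ID is unique within titanic5; apart from Name_ID and Age_F, the field names and values below are placeholders, and the function name is our own:

```python
from typing import Dict, List, Tuple


def join_on_name_id(left: List[Dict], right: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Inner-join rows on Name_ID; also return the left rows that found no match."""
    index: Dict[str, Dict] = {}
    for row in right:
        key = row["Name_ID"]
        if key in index:
            raise ValueError(f"duplicate Name_ID in right table: {key}")
        index[key] = row
    matched, unmatched = [], []
    for row in left:
        partner = index.get(row["Name_ID"])
        if partner is None:
            unmatched.append(row)
        else:
            # Keys present in both tables take the right-hand (titanic5) value.
            matched.append({**row, **partner})
    return matched, unmatched


titanic3 = [{"Name_ID": "N1", "age": None, "sibsp": 1}, {"Name_ID": "N2", "age": 30, "sibsp": 0}]
titanic5 = [{"Name_ID": "N1", "Age_F": 24.0}]
matched, unmatched = join_on_name_id(titanic3, titanic5)
print(matched)    # [{'Name_ID': 'N1', 'age': None, 'sibsp': 1, 'Age_F': 24.0}]
print(unmatched)  # [{'Name_ID': 'N2', 'age': 30, 'sibsp': 0}]
```

    Returning the unmatched rows separately makes it easy to audit records that the Name_ID matching missed.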

    A note about the ages: instead of using the add-0.5 trick to indicate an estimated birthday/date, I have a flag that indicates how the “final” age (Age_F) was arrived at. It’s the Age_F_Code variable; the allowable values are in the Titanic5_metadata tab in the attached Excel file. The reason for this is that I already had some fractional ages for infants where I had age in months instead of years, and I wanted to avoid confusion for 6-month-old infants, although I don’t think there are any in the data! Also, I was thinking of computing fractional ages, or age in days, for all passengers for whom I have a DoB, but I have not yet done so.

    Here’s what the tabs are:

    • Titanic5_all: all (mostly cleaned) Titanic passenger and crew records
    • Titanic5_work: working dataset, crew removed, unnecessary variables removed; this is the one I import into SAS / R to work on
    • Titanic5_metadata: variable descriptions and allowable values
    • titanic3_wID: original Titanic3 dataset with Name_ID added for merging to Titanic5

    I have a csv, R dataset, and SAS dataset, but the variable names are an older version, so I won’t send those along for now to avoid confusion.

    If it helps, send my contact info along to your student in case any questions arise. My Gmail address is probably best, on weekends for sure: davebdr@gmail.com

    The tabs in titanic5.xls are:

    • Titanic5_all
    • Titanic5_passenger (the one to be used for analysis)
    • Titanic5_metadata (used during analysis file creation)
    • Titanic3_wID

  13. Titanic Dataset - EDA & Logistic Regression

    • kaggle.com
    Updated Feb 19, 2025
    Cite
    RabbiTheAnalyst (2025). Titanic Dataset - EDA & Logistic Regression [Dataset]. https://www.kaggle.com/datasets/mdrabbiali/titanic-data-set/code
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    RabbiTheAnalyst
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone on board, resulting in the death of 1502 out of 2224 passengers and crew. While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others. In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).

    Objective:

    1. Survival Prediction: To build a logistic regression model that accurately predicts the survival of passengers based on features such as age, gender, passenger class, and number of siblings/spouses aboard.

    2. Data Cleaning and Preprocessing: To perform data cleaning by handling missing values, removing unnecessary columns, and encoding categorical variables to prepare the dataset for analysis.

    3. Exploratory Data Analysis (EDA): To conduct a thorough exploratory data analysis to visualize survival rates and identify patterns based on various factors like gender, passenger class, and embarked location.

    4. Feature Importance Analysis: To analyze the correlation between different features and their impact on survival rates, identifying which factors are the most significant predictors of survival.

    5. Model Evaluation: To evaluate the performance of the logistic regression model using accuracy scores and classification reports, ensuring that the model generalizes well to unseen data.

    6. ROC Curve Analysis: To create a ROC curve to assess the trade-off between the true positive rate and false positive rate, providing insights into the model's ability to distinguish between survivors and non-survivors.

    7. Insights and Recommendations: To derive insights from the analysis that could inform future safety measures or policies related to passenger safety in maritime travel.
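    Objective 6 above calls for a ROC curve. Given predicted survival probabilities, the curve comes from sweeping a decision threshold from strict to loose and recording the false and true positive rates. A dependency-free sketch (scikit-learn's roc_curve and roc_auc_score are the usual tools); labels are assumed to be 1 for survivors and 0 otherwise, tied scores are grouped into a single threshold step, and the area is computed with the trapezoidal rule. The example scores are made up.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs, from strictest to loosest threshold."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one survivor and one non-survivor")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Count every example sharing this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
pts = roc_points(y_true, y_score)
print(pts)
print(round(auc(pts), 3))  # 0.778
```

    An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of survivors from non-survivors.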

  14. Dataset of books called Titanic science : where the story of Titanic meets...

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called Titanic science : where the story of Titanic meets science enquiry [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Titanic+science+%3A+where+the+story+of+Titanic+meets+science+enquiry
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is Titanic science : where the story of Titanic meets science enquiry. It features 7 columns including author, publication date, language, and book publisher.

  15. TITANIC DATASET USING DECISION TREE CLASSIFIER

    • kaggle.com
    Updated Oct 14, 2023
    Cite
    nikhil reddy vantari (2023). TITANIC DATASET USING DECISION TREE CLASSIFIER [Dataset]. https://www.kaggle.com/nikhilreddyvantari/titanic-dataset-using-decision-tree-classifier/discussion
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    nikhil reddy vantari
    Description

    Dataset

    This dataset was created by nikhil reddy vantari

    Contents

  16. Titanic Data analysis using LR,DT,SVM,RF

    • kaggle.com
    Updated Jun 15, 2020
    Cite
    amitkhan (2020). Titanic Data analysis using LR,DT,SVM,RF [Dataset]. https://www.kaggle.com/datasets/amojas00/titanic-data-analysis-using-lrdtsvmrf/data
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    amitkhan
    Description

    Dataset

    This dataset was created by amitkhan

    Contents

  17. The Complete Titanic Dataset

    • kaggle.com
    Updated Jan 4, 2020
    Cite
    Vinicius Barbosa Paiva (2020). The Complete Titanic Dataset [Dataset]. https://www.kaggle.com/vinicius150987/titanic3/code
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vinicius Barbosa Paiva
    Description

    The sinking of the Titanic is one of the most infamous shipwrecks in history.

    On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.

    While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

    In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).

  18. Titanic Solution for Beginner's Guide

    • kaggle.com
    Updated Mar 12, 2018
    Cite
    Harun-Ur-Rashid (2018). Titanic Solution for Beginner's Guide [Dataset]. https://www.kaggle.com/harunshimanto/titanic-solution-for-beginners-guide/activity
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Harun-Ur-Rashid
    Description

    Overview

    The data has been split into two groups:

    training set (train.csv)
    test set (test.csv)
    

    The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the “ground truth”) for each passenger. Your model will be based on “features” like passengers’ gender and class. You can also use feature engineering to create new features.

    The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.

    We also include gender_submission.csv, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.
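    The gender_submission.csv baseline described above can be reproduced in a few lines. This is a sketch under assumptions: the test file is taken to have `PassengerId` and `Sex` columns, the submission is assumed to use a two-column `PassengerId,Survived` layout, and `gender_baseline` and `write_submission` are hypothetical helper names. The sample rows are made up.

```python
import csv
import io


def gender_baseline(test_rows):
    """Predict Survived = 1 for every female passenger and 0 for everyone else."""
    return [
        {"PassengerId": row["PassengerId"],
         "Survived": 1 if row["Sex"] == "female" else 0}
        for row in test_rows
    ]


def write_submission(predictions, out):
    """Write predictions as a two-column CSV with a PassengerId,Survived header."""
    writer = csv.DictWriter(out, fieldnames=["PassengerId", "Survived"])
    writer.writeheader()
    writer.writerows(predictions)


# Made-up rows in the test.csv layout (no Survived column, since the ground truth is withheld).
TEST_SAMPLE = """PassengerId,Pclass,Sex
892,3,male
893,3,female
"""

test_rows = list(csv.DictReader(io.StringIO(TEST_SAMPLE)))
buffer = io.StringIO()
write_submission(gender_baseline(test_rows), buffer)
print(buffer.getvalue())
```

    When writing to a real file, open it with `open("submission.csv", "w", newline="")` so the csv module controls the line endings.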

    Data Dictionary

    | Variable | Definition | Key |
    |----------|------------|-----|
    | survival | Survival | 0 = No, 1 = Yes |
    | pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
    | sex | Sex | |
    | age | Age in years | |
    | sibsp | # of siblings / spouses aboard the Titanic | |
    | parch | # of parents / children aboard the Titanic | |
    | ticket | Ticket number | |
    | fare | Passenger fare | |
    | cabin | Cabin number | |
    | embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |
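    The "Key" column of the data dictionary can be turned into lookup tables for decoding coded values. A minimal sketch, assuming records use the dictionary's lowercase variable names (the CSV headers may be capitalized differently, e.g. `Pclass`, `Embarked`); `decode` is a hypothetical helper.

```python
# Lookup tables taken from the "Key" column of the data dictionary.
SURVIVAL = {0: "No", 1: "Yes"}
PCLASS = {1: "1st", 2: "2nd", 3: "3rd"}
EMBARKED = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}


def decode(record):
    """Return a copy of a record with coded fields replaced by readable labels.

    Missing or unknown codes (e.g. an empty embarked value) map to None
    instead of raising a KeyError.
    """
    decoded = dict(record)
    decoded["survival"] = SURVIVAL.get(record.get("survival"))
    decoded["pclass"] = PCLASS.get(record.get("pclass"))
    decoded["embarked"] = EMBARKED.get(record.get("embarked"))
    return decoded


example = {"survival": 1, "pclass": 2, "sex": "female", "embarked": "Q"}
print(decode(example))
```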

    Variable Notes

    pclass: A proxy for socio-economic status (SES): 1st = Upper, 2nd = Middle, 3rd = Lower

    age: Age is fractional if less than 1. If the age is estimated, it is in the form xx.5.

    sibsp: The dataset defines family relations as follows: Sibling = brother, sister, stepbrother, stepsister; Spouse = husband, wife (mistresses and fiancés were ignored).

    parch: The dataset defines family relations as follows: Parent = mother, father; Child = daughter, son, stepdaughter, stepson. Some children travelled only with a nanny; therefore, parch = 0 for them.
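    The variable notes suggest a few derived features. The sketch below is one possible interpretation, not the dataset authors' method: `family_size` (a hypothetical feature) adds `sibsp` and `parch` plus the passenger, `age_is_estimated` flags ages of the form xx.5 while leaving fractional ages under 1 alone, and `SES` maps `pclass` to the socio-economic labels given in the notes.

```python
import math

# pclass as a proxy for socio-economic status, per the variable notes.
SES = {1: "Upper", 2: "Middle", 3: "Lower"}


def family_size(sibsp, parch):
    """Relatives aboard (siblings/spouses + parents/children) plus the passenger.

    Children who travelled only with a nanny have parch = 0, so this
    undercounts their travelling party.
    """
    return sibsp + parch + 1


def age_is_estimated(age):
    """True when an age of 1 or more has the xx.5 form used for estimates.

    Ages below 1 are fractional by definition, so they are not treated as
    estimates. Missing ages (None or NaN) return False.
    """
    if age is None or math.isnan(age):
        return False
    return age >= 1 and age % 1 == 0.5


print(SES[3], family_size(1, 2), age_is_estimated(28.5), age_is_estimated(0.5))  # Lower 4 True False
```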

  19. Titanic Dataset Competition

    • kaggle.com
    Updated Dec 19, 2022
    Cite
    Cynthia Barasa (2022). Titanic Dataset Competition [Dataset]. https://www.kaggle.com/datasets/cynthycynthy/titanicdataset/data
    Explore at:
    Dataset updated
    Dec 19, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Cynthia Barasa
    Description

    The Titanic dataset is a well-known dataset that provides information on the passengers who were onboard the fateful voyage of the RMS Titanic. The data includes details such as the passenger's name, age, gender, ticket class, fare paid, and information on their family members. The dataset also includes a column called "Survived" which indicates whether a passenger survived the disaster or not.

    There are a total of 891 rows in the dataset, with 12 columns. Some of the key columns in the dataset include:

    • PassengerId: a unique identifier for each passenger
    • Survived: a binary variable that indicates whether the passenger survived (1) or did not survive (0) the disaster
    • Pclass: the ticket class of the passenger (1 = first class, 2 = second class, 3 = third class)
    • Name: the name of the passenger
    • Sex: the gender of the passenger (male or female)
    • Age: the age of the passenger (some values are missing)
    • SibSp: the number of siblings or spouses the passenger had on board
    • Parch: the number of parents or children the passenger had on board
    • Ticket: the ticket number of the passenger
    • Fare: the fare paid by the passenger
    • Cabin: the cabin number of the passenger (some values are missing)
    • Embarked: the port at which the passenger embarked (C = Cherbourg, Q = Queenstown, S = Southampton)
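    Because some columns contain missing values, it helps to parse each row into typed fields and count the gaps before modeling. A minimal sketch with the standard-library csv module, assuming the column names listed above match the train.csv header; `parse_row`, `count_missing` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Column names as listed above (assumed to match the train.csv header).
COLUMNS = ["PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
           "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"]
# Numeric columns and the type each one is parsed as.
NUMERIC = {"PassengerId": int, "Survived": int, "Pclass": int, "Age": float,
           "SibSp": int, "Parch": int, "Fare": float}


def parse_row(row):
    """Convert numeric fields to numbers; empty cells (missing values) become None."""
    parsed = {}
    for col in COLUMNS:
        value = row.get(col, "")
        if value == "":
            parsed[col] = None
        elif col in NUMERIC:
            parsed[col] = NUMERIC[col](value)
        else:
            parsed[col] = value
    return parsed


def count_missing(records):
    """Count None values per column across parsed records."""
    return Counter(col for r in records for col in COLUMNS if r[col] is None)


# Made-up rows for illustration; names contain commas, so they are quoted.
SAMPLE = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,22,1,0,T1,7.25,,S
2,1,1,"Roe, Mrs. Jane",female,38,1,0,T2,71.28,C85,C
3,1,3,"Poe, Miss. Ann",female,,0,0,T3,7.92,,S
"""

records = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
print(count_missing(records))
```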

    Overall, the key challenges I encountered when working on the Titanic dataset were handling missing values and imbalanced classes, encoding categorical variables, reducing the dimensionality of the dataset, and identifying and handling noise in the data.
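    Three of these challenges (missing values, categorical encoding and class imbalance) can be sketched without any third-party library. The choices below are common defaults rather than the author's method: median imputation for `Age`, most-frequent-value imputation for `Embarked`, one-hot encoding for `Embarked`, and a simple label count to check balance. All helper names and records are hypothetical.

```python
from collections import Counter
from statistics import median

# Made-up, already-parsed records (None = missing value).
records = [
    {"Survived": 0, "Sex": "male", "Age": 22.0, "Embarked": "S"},
    {"Survived": 1, "Sex": "female", "Age": 38.0, "Embarked": "C"},
    {"Survived": 1, "Sex": "female", "Age": None, "Embarked": "S"},
    {"Survived": 0, "Sex": "male", "Age": 35.0, "Embarked": None},
]


def impute(records):
    """Fill missing Age with the median age and missing Embarked with the most common port."""
    ages = [r["Age"] for r in records if r["Age"] is not None]
    ports = Counter(r["Embarked"] for r in records if r["Embarked"] is not None)
    age_fill = median(ages)
    port_fill = ports.most_common(1)[0][0]
    return [
        {**r,
         "Age": r["Age"] if r["Age"] is not None else age_fill,
         "Embarked": r["Embarked"] if r["Embarked"] is not None else port_fill}
        for r in records
    ]


def one_hot(records, column, categories):
    """Replace a categorical column with one 0/1 indicator column per category."""
    out = []
    for r in records:
        encoded = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            encoded[f"{column}_{cat}"] = int(r[column] == cat)
        out.append(encoded)
    return out


clean = one_hot(impute(records), "Embarked", ["C", "Q", "S"])
balance = Counter(r["Survived"] for r in records)
print(clean[2])  # Age filled with the median of 22, 38 and 35
print(balance)   # label counts; a strongly skewed ratio signals class imbalance
```

    In a real pipeline, compute the fill values on the training split only and reuse them on the test split, so no information leaks from the evaluation data.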

    Here are a few tips that I found helpful when getting started in the Titanic dataset competition:

    1. Get familiar with the dataset.
    2. Pre-process the data.
    3. Split the data into training and test sets.
    4. Try out a few different algorithms.
    5. Tune the hyperparameters.
    6. Evaluate the model.
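    Steps 3, 4 and 6 can be illustrated with a hold-out split and two deliberately simple rules standing in for "different algorithms". This is a standard-library sketch, not scikit-learn: `holdout_split`, `fit_majority` and `fit_by_sex` are hypothetical helpers, and the toy data is invented and deterministic.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows with a fixed seed and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, rows):
    """Fraction of rows whose predicted Survived matches the true label."""
    correct = sum(predict(r) == r["Survived"] for r in rows)
    return correct / len(rows)


def fit_majority(train):
    """Rule 1: always predict the most frequent label in the training set."""
    ones = sum(r["Survived"] for r in train)
    label = 1 if ones * 2 > len(train) else 0
    return lambda r: label


def fit_by_sex(train):
    """Rule 2: predict the majority label within each Sex group."""
    counts = {}
    for r in train:
        total, ones = counts.get(r["Sex"], (0, 0))
        counts[r["Sex"]] = (total + 1, ones + r["Survived"])
    rule = {sex: int(ones * 2 > total) for sex, (total, ones) in counts.items()}
    return lambda r: rule.get(r["Sex"], 0)


# Toy data: 60 female passengers (45 survived) and 90 male passengers (15 survived).
rows = ([{"Sex": "female", "Survived": int(i < 45)} for i in range(60)]
        + [{"Sex": "male", "Survived": int(i < 15)} for i in range(90)])

train, test = holdout_split(rows)
for name, fit in [("majority", fit_majority), ("by sex", fit_by_sex)]:
    print(name, round(accuracy(fit(train), test), 3))
```

    Step 5 (hyperparameter tuning) would loop over candidate settings and keep the one with the best validation accuracy; it is omitted here because these rules have no parameters.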

    Here are a few resources that I found helpful as I started working on the competition:

    • Kaggle's Titanic tutorial
    • scikit-learn documentation
    • pandas documentation

  20. Titanic survive model

    • kaggle.com
    Updated Jan 27, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aly El-badry (2025). Titanic survive model [Dataset]. https://www.kaggle.com/datasets/alyelbadry/titanic-survive-model/data
    Explore at:
    Dataset updated
    Jan 27, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aly El-badry
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Titanic Passenger Data

    This dataset contains information about the passengers aboard the RMS Titanic, which tragically sank during its maiden voyage in April 1912. It provides detailed data points for survival analysis and predictive modeling, including demographic details, ticket class, fare, and survival outcomes.

    Dataset Highlights:

    • Passenger Details: Information such as Name, Age, Gender, and Embarked Port.
    • Socioeconomic Status: Passenger Class (1st, 2nd, or 3rd) and Fare Price.
    • Survival Information: Whether the passenger survived or perished.
    • Family Relationships: Number of Siblings/Spouses and Parents/Children aboard.

    This dataset is ideal for exploring patterns of survival, understanding social dynamics aboard the Titanic, and testing machine learning models for classification problems.

    Suggested Use Cases:

    • Survival rate analysis by age, gender, or class.
    • Building classification models for survival prediction.
    • Statistical tests and hypothesis exploration on historical data.
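    For the "statistical tests" use case, one standard choice is Pearson's chi-square test of independence, for example sex versus survival in a 2x2 table. The sketch below computes the statistic from observed and expected counts and, because a 2x2 table has one degree of freedom, takes the p-value as erfc(sqrt(statistic / 2)). The counts are made up, not taken from this dataset, and no continuity correction is applied; `chi_square_2x2` is a hypothetical helper.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 contingency table.

    table[i][j] holds observed counts (e.g. rows = sex, columns = died/survived).
    Returns (statistic, p_value). With 1 degree of freedom the statistic is the
    square of a standard normal variable, so p = erfc(sqrt(statistic / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


# Made-up counts: [[female died, female survived], [male died, male survived]]
counts = [[20, 80], [60, 40]]
stat, p = chi_square_2x2(counts)
print(round(stat, 3), p)
```

    A small p-value suggests survival and sex are not independent in the table; for real analysis, a library routine such as SciPy's contingency-table test would also handle continuity correction and larger tables.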

    Dive in to analyze one of the most famous shipwrecks in history!

