100+ datasets found
  1. P

    Titanic Dataset

    • paperswithcode.com
    Updated Oct 27, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). Titanic Dataset [Dataset]. https://paperswithcode.com/dataset/titanic
    Explore at:
    Dataset updated
    Oct 27, 2024
    Description

    Titanic Dataset Description Overview The data is divided into two groups: - Training set (train.csv): Used to build machine learning models. It includes the outcome (also called the "ground truth") for each passenger, allowing models to predict survival based on “features” like gender and class. Feature engineering can also be applied to create new features. - Test set (test.csv): Used to evaluate model performance on unseen data. The ground truth is not provided; the task is to predict survival for each passenger in the test set using the trained model.

    Additionally, gender_submission.csv is provided as an example submission file, containing predictions based on the assumption that all and only female passengers survive.

    Data Dictionary | Variable | Definition | Key | |------------|------------------------------------------|-------------------------------------------------| | survival | Survival | 0 = No, 1 = Yes | | pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd | | sex | Sex | | | age | Age in years | | | sibsp | # of siblings/spouses aboard the Titanic | | | parch | # of parents/children aboard the Titanic | | | ticket | Ticket number | | | fare | Passenger fare | | | cabin | Cabin number | | | embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |

    Variable Notes

    pclass: Proxy for socio-economic status (SES): 1st = Upper 2nd = Middle 3rd = Lower age:
    Fractional if less than 1 year.
    Estimated ages are represented in the form xx.5. sibsp: Defines family relations as: Sibling: Brother, sister, stepbrother, stepsister. Spouse: Husband, wife (excluding mistresses and fiancés). parch: Defines family relations as: Parent: Mother, father. Child: Daughter, son, stepdaughter, stepson. Some children traveled only with a nanny, so parch = 0 for them.

  2. c

    Titanic Dataset

    • cubig.ai
    Updated May 29, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CUBIG (2025). Titanic Dataset [Dataset]. https://cubig.ai/store/products/393/titanic-dataset
    Explore at:
    Dataset updated
    May 29, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction • Based on passenger information from the Titanic, which sank in 1912, the Titanic Dataset is a representative binary classification data that includes various demographics and boarding information such as Survived, Passengers Class, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, and Embarked.

    2) Data Utilization (1) Titanic Dataset has characteristics that: • It consists of a total of 891 training samples and 12 to 15 columns (numerical and categorical mix) and also includes variables such as Age, Cabin, and Embarked with some missing values, making it suitable for preprocessing and feature engineering practice. (2) Titanic Dataset can be used to: • Development of survival prediction models: Key characteristics such as passenger rating, gender, age, and fare can be used to predict survival with different machine learning classification models such as logistic regression, random forest, and SVM. • Analysis of survival influencing factors: By analyzing the correlation between variables such as gender, age, socioeconomic status, and survival rates, you can statistically and visually explore which groups have a higher survival probability.

  3. h

    titanic-dataset

    • huggingface.co
    Updated Sep 25, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Beijing Institute of Technology (2024). titanic-dataset [Dataset]. https://huggingface.co/datasets/BIT/titanic-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 25, 2024
    Dataset authored and provided by
    Beijing Institute of Technology
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    BIT/titanic-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. T

    titanic

    • tensorflow.org
    Updated Feb 12, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2023). titanic [Dataset]. https://www.tensorflow.org/datasets/catalog/titanic
    Explore at:
    Dataset updated
    Feb 12, 2023
    Description

    Dataset describing the survival status of individual passengers on the Titanic. Missing values in the original dataset are represented using ?. Float and int missing values are replaced with -1, string missing values are replaced with 'Unknown'.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('titanic', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more informations on tensorflow_datasets.

  5. Titanic Dataset

    • kaggle.com
    Updated Dec 24, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    M Yasser H (2021). Titanic Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/titanic-dataset/discussion
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 24, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    M Yasser H
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    https://raw.githubusercontent.com/Masterx-AI/Project_Titanic_Survival_Prediction_/main/titanic.jpg" alt="">

    Description:

    The sinking of the Titanic is one of the most infamous shipwrecks in history.

    On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone on board, resulting in the death of 1502 out of 2224 passengers and crew.

    While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

    In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (ie name, age, gender, socio-economic class, etc).

    Acknowledgements:

    This dataset has been referred from Kaggle: https://www.kaggle.com/c/titanic/data.

    Objective:

    • Understand the Dataset & cleanup (if required).
    • Build a strong classification model to predict whether the passenger survives or not.
    • Also fine-tune the hyperparameters & compare the evaluation metrics of various classification algorithms.
  6. h

    titanic

    • huggingface.co
    Updated Jun 1, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nikhil Kumar (2024). titanic [Dataset]. https://huggingface.co/datasets/nik20004/titanic
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 1, 2024
    Authors
    Nikhil Kumar
    License

    https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/

    Description

    nik20004/titanic dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. Titanic Dataset Competition

    • kaggle.com
    Updated Dec 19, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Cynthia Barasa (2022). Titanic Dataset Competition [Dataset]. https://www.kaggle.com/datasets/cynthycynthy/titanicdataset/data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 19, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Cynthia Barasa
    Description

    The Titanic dataset is a well-known dataset that provides information on the passengers who were onboard the fateful voyage of the RMS Titanic. The data includes details such as the passenger's name, age, gender, ticket class, fare paid, and information on their family members. The dataset also includes a column called "Survived" which indicates whether a passenger survived the disaster or not.

    There are a total of 891 rows in the dataset, with 12 columns. Some of the key columns in the dataset include:

    PassengerId: a unique identifier for each passenger • Survived: a binary variable that indicates whether the passenger survived (1) or did not survive (0) the disaster • Pclass: the ticket class of the passenger (1 = first class, 2 = second class, 3 = third class) • Name: the name of the passenger • Sex: the gender of the passenger (male or female) • Age: the age of the passenger (some values are missing) • SibSp: the number of siblings or spouses the passenger had on board • Parch: the number of parents or children the passenger had on board • Ticket: the ticket number of the passenger • Fare: the fare paid by the passenger • Cabin: the cabin number of the passenger (some values are missing) • Embarked: the port at which the passenger embarked (C = Cherbourg, Q = Queenstown, S = Southampton)

    Overall, the key challenges I encountered when working on the Titanic dataset were: how to handle missing values and imbalanced classes, encode categorical variables, reduce the dimensionality of the dataset, and identify and handle noise in the data.

    Here are a few tips and resources that I found helpful when getting started in the Titanic dataset competition: 1. Get familiar with the dataset 2. Pre-process the data 3. Split the data into training and test sets 4. Try out a few different algorithms 5. Tune the hyper parameters 6. Evaluate the model

    Here are a few resources that I found helpful as I started Working on the competition: • Kaggle's Titanic tutorial • scikit-learn documentation. • Pandas documentation

  8. A

    ‘Titanic Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Titanic Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-dataset-bec7/bfa18318/?iid=006-936&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Titanic Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yasserh/titanic-dataset on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    https://raw.githubusercontent.com/Masterx-AI/Project_Titanic_Survival_Prediction_/main/titanic.jpg" alt="">

    Description:

    The sinking of the Titanic is one of the most infamous shipwrecks in history.

    On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone on board, resulting in the death of 1502 out of 2224 passengers and crew.

    While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

    In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (ie name, age, gender, socio-economic class, etc).

    Acknowledgements:

    This dataset has been referred from Kaggle: https://www.kaggle.com/c/titanic/data.

    Objective:

    • Understand the Dataset & cleanup (if required).
    • Build a strong classification model to predict whether the passenger survives or not.
    • Also fine-tune the hyperparameters & compare the evaluation metrics of various classification algorithms.

    --- Original source retains full ownership of the source dataset ---

  9. A

    ‘Titanic dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Titanic dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-dataset-7d6b/0f1d826e/?iid=009-782&v=presentation
    Explore at:
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Titanic dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ibrahimelsayed182/titanic-dataset on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Overview

    This is Titanic dataset

    Data Dictionary

    AttributesDefinitionKey
    sexSex/Gendermale/female
    ageAge
    sibspsiblings of the passenger0/1 /2 ...
    parchparents / children aboard the Titanic0/1/2 ...
    farePassenger fare
    embarkedPort of EmbarkationC : Cherbourg, Q : Queenstown, S : Southampton
    classTicket classFirst / Second / Third
    whocategories to passengersmale, female, child
    alonehe was alone in ship or no0/1
    survived0/1

    --- Original source retains full ownership of the source dataset ---

  10. A

    ‘Titanic-Dataset (train.csv)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Titanic-Dataset (train.csv)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-dataset-train-csv-1d8d/f4271729/?iid=006-918&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Titanic-Dataset (train.csv)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hesh97/titanicdataset-traincsv on 28 January 2022.

    --- No further description of dataset provided by original source ---

    --- Original source retains full ownership of the source dataset ---

  11. A

    ‘Titanic Dataset Analysis’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Titanic Dataset Analysis’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-dataset-analysis-c0ba/latest
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Titanic Dataset Analysis’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/cities/titanic123 on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    There's a story behind every dataset and here's your opportunity to share yours.

    Content

    What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too.

    Acknowledgements

    We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

    --- Original source retains full ownership of the source dataset ---

  12. h

    titanic

    • huggingface.co
    Updated Dec 11, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    K Kovacs (2024). titanic [Dataset]. https://huggingface.co/datasets/kkovacs/titanic
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 11, 2024
    Authors
    K Kovacs
    License

    https://choosealicense.com/licenses/unknown/https://choosealicense.com/licenses/unknown/

    Description

    kkovacs/titanic dataset hosted on Hugging Face and contributed by the HF Datasets community

  13. Titanic classification

    • figshare.com
    txt
    Updated Sep 19, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alvaro Rioboo (2020). Titanic classification [Dataset]. http://doi.org/10.6084/m9.figshare.12979220.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Sep 19, 2020
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Alvaro Rioboo
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Titanic dataset for classification training.

  14. P

    TITANIC-FGS Dataset

    • paperswithcode.com
    Updated Aug 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mingning Guo; Mengwei Wu; Yuxiang Shen; Haifeng Li; Chao Tao (2024). TITANIC-FGS Dataset [Dataset]. https://paperswithcode.com/dataset/titanic-fgs
    Explore at:
    Dataset updated
    Aug 12, 2024
    Authors
    Mingning Guo; Mengwei Wu; Yuxiang Shen; Haifeng Li; Chao Tao
    Description

    TITANIC-FGS is the first domain knowledge-enhanced, instruction-following dataset specifically designed for the Remote Sensing Fine-Grained Ship Classification (RS-FGSC) task. It simulates human-like step-by-step decision-making to train vision-language models (VLMs) for interpretable and accurate ship classification.

  15. Titanic Dataset

    • kaggle.com
    Updated Sep 6, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    VVignesh Kumar (2021). Titanic Dataset [Dataset]. https://www.kaggle.com/datasets/vvkumar/titanic-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 6, 2021
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    VVignesh Kumar
    Description

    Dataset

    This dataset was created by VVignesh Kumar

    Contents

  16. RMS Titanic Passengers and Crew Complete List

    • encyclopedia-titanica.org
    json
    Updated Jan 27, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    RMS Titanic Passengers and Crew Complete List [Dataset]. https://www.encyclopedia-titanica.org/titanic-passengers-and-crew/
    Explore at:
    jsonAvailable download formats
    Dataset updated
    Jan 27, 2025
    Dataset authored and provided by
    Encyclopedia Titanicahttp://www.encyclopedia-titanica.org/
    License

    https://www.encyclopedia-titanica.org/copyright-and-permissions.htmlhttps://www.encyclopedia-titanica.org/copyright-and-permissions.html

    Description

    Who travelled on the Titanic? When she reached the open Atlantic on 11 April 1912, the Titanic carried 2,208 people however many more travelled on her: on the delivery trip from Belfast to Southampton, and on the short journeys to Cherbourg and Queenstown. This dataset includes everyone that travelled on the maiden voyage but also the delivery and passengers who were fortunate enough to disembark.

  17. titanic

    • zenodo.org
    • huggingface.co
    csv
    Updated Feb 7, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Miguel Arbea Gomez; Miguel Arbea Gomez (2022). titanic [Dataset]. http://doi.org/10.5281/zenodo.5987761
    Explore at:
    csvAvailable download formats
    Dataset updated
    Feb 7, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Miguel Arbea Gomez; Miguel Arbea Gomez
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    titanic dataset

  18. titanic-dataset

    • kaggle.com
    zip
    Updated Aug 27, 2019
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Vijesh (2019). titanic-dataset [Dataset]. https://www.kaggle.com/vijesh08/titanicdataset
    Explore at:
    zip(30855 bytes)Available download formats
    Dataset updated
    Aug 27, 2019
    Authors
    Vijesh
    Description

    Dataset

    This dataset was created by Vijesh

    Contents

  19. A

    ‘Titanic survival’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Titanic survival’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-survival-936d/c8663674/?iid=014-063&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Titanic survival’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/dushyantkhinchi/titanic-survival on 28 January 2022.

    --- No further description of dataset provided by original source ---

    --- Original source retains full ownership of the source dataset ---

  20. A

    ‘Titanic csv’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Titanic csv’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-csv-b28e/latest
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Titanic csv’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/fossouodonald/titaniccsv on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    this dataset is the result of titanic csv

    --- Original source retains full ownership of the source dataset ---

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
(2024). Titanic Dataset [Dataset]. https://paperswithcode.com/dataset/titanic

Titanic Dataset

Titanic - Machine Learning from Disaster

Explore at:
Dataset updated
Oct 27, 2024
Description

Titanic Dataset Description Overview The data is divided into two groups: - Training set (train.csv): Used to build machine learning models. It includes the outcome (also called the "ground truth") for each passenger, allowing models to predict survival based on “features” like gender and class. Feature engineering can also be applied to create new features. - Test set (test.csv): Used to evaluate model performance on unseen data. The ground truth is not provided; the task is to predict survival for each passenger in the test set using the trained model.

Additionally, gender_submission.csv is provided as an example submission file, containing predictions based on the assumption that all and only female passengers survive.

Data Dictionary | Variable | Definition | Key | |------------|------------------------------------------|-------------------------------------------------| | survival | Survival | 0 = No, 1 = Yes | | pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd | | sex | Sex | | | age | Age in years | | | sibsp | # of siblings/spouses aboard the Titanic | | | parch | # of parents/children aboard the Titanic | | | ticket | Ticket number | | | fare | Passenger fare | | | cabin | Cabin number | | | embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |

Variable Notes

pclass: Proxy for socio-economic status (SES): 1st = Upper 2nd = Middle 3rd = Lower age:
Fractional if less than 1 year.
Estimated ages are represented in the form xx.5. sibsp: Defines family relations as: Sibling: Brother, sister, stepbrother, stepsister. Spouse: Husband, wife (excluding mistresses and fiancés). parch: Defines family relations as: Parent: Mother, father. Child: Daughter, son, stepdaughter, stepson. Some children traveled only with a nanny, so parch = 0 for them.

Search
Clear search
Close search
Google apps
Main menu