25 datasets found
  1. Titanic dataset

    • kaggle.com
    Updated Feb 29, 2024
    Cite
    Sidra Kousar (2024). Titanic dataset [Dataset]. https://www.kaggle.com/datasets/sidrakousar/titanic-dataset/code
    Dataset updated
    Feb 29, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sidra Kousar
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    The Titanic dataset is a popular dataset used for data analysis and machine learning tasks. It contains various information about passengers aboard the Titanic, including whether they survived or not. Here's a brief description of each of the columns:

    • PassengerId: A unique identifier for each passenger.
    • Survived: Indicates whether the passenger survived or not (0 = No, 1 = Yes).
    • Pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd).
    • Name: Name of the passenger.
    • Sex: Gender of the passenger.
    • Age: Age of the passenger in years (fractional if less than 1).
    • SibSp: Number of siblings or spouses aboard the Titanic.
    • Parch: Number of parents or children aboard the Titanic.
    • Ticket: Ticket number.
    • Fare: Fare paid for the ticket.
    • Cabin: Cabin number.
    • Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).

    This dataset is often used for tasks such as predicting survival based on various factors or analyzing demographics of passengers aboard the Titanic.
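The coded columns described above (Survived, Pclass, Embarked) can be turned into readable labels with a few lookup tables. The following is a minimal sketch using only the Python standard library. The two CSV rows, the `decode_row` helper, and the "Unknown" fallback are hypothetical and exist purely for illustration; they are not real passenger records or part of the dataset.

```python
import csv
import io

# Code-to-label mappings taken from the column description above.
SURVIVED = {"0": "No", "1": "Yes"}
PCLASS = {"1": "1st", "2": "2nd", "3": "3rd"}
EMBARKED = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}

# Hypothetical rows for illustration only; these are NOT real passenger records.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
9001,1,1,"Doe, Miss. Jane",female,29,0,0,X1,80.0,B1,C
9002,0,3,"Roe, Mr. Richard",male,,1,0,X2,7.5,,S
"""


def decode_row(row):
    """Return a copy of a CSV row with coded columns replaced by labels."""
    decoded = dict(row)
    decoded["Survived"] = SURVIVED.get(row["Survived"], "Unknown")
    decoded["Pclass"] = PCLASS.get(row["Pclass"], "Unknown")
    decoded["Embarked"] = EMBARKED.get(row["Embarked"], "Unknown")
    # Age may be blank; keep it as None rather than guessing a value.
    decoded["Age"] = float(row["Age"]) if row["Age"] else None
    return decoded


rows = [decode_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

Keeping a blank Age as `None` rather than 0 avoids silently treating a missing value as an infant.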

  2. Titanic

    • rochester.figshare.com
    application/csv
    Updated Aug 12, 2024
    Cite
    Aabha Pandit; Alois Romanowski; Heather Owen (2024). Titanic [Dataset]. http://doi.org/10.60593/ur.d.26462215.v1
    Available download formats: application/csv
    Dataset updated
    Aug 12, 2024
    Dataset provided by
    University of Rochester
    Authors
    Aabha Pandit; Alois Romanowski; Heather Owen
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Titanic Dataset (for Machine Learning)

    The Titanic dataset is a classic and widely used dataset for machine learning and data analysis. It contains information about the passengers of the RMS Titanic, which tragically sank on its maiden voyage on April 15, 1912. The dataset provides details about each passenger, including their demographics, ticket information, and survival status. It is often used to demonstrate and practice machine learning techniques, particularly classification.

    The dataset is divided into two parts: a training set and a testing set.

    Dataset Variables:

    • PassengerId: count for each passenger
    • Survived: 0 = No; 1 = Yes
    • Name: name of passenger
    • Sex: passenger's sex
    • Age: passenger's age
    • SibSp: number of siblings/spouses aboard the Titanic
    • Parch: number of parents/children aboard the Titanic
    • Ticket: ticket number
    • Fare: passenger fare
    • Cabin: cabin number
    • Embarked: port where passenger embarked (C = Cherbourg; Q = Queenstown; S = Southampton)

  3. Titanic- Machine Learning from Disaster

    • kaggle.com
    Updated Jan 7, 2025
    + more versions
    Cite
    ManishaPrajapati (2025). Titanic- Machine Learning from Disaster [Dataset]. https://www.kaggle.com/datasets/nitu1234444/titanic-machine-learning-from-disaster/suggestions
    Dataset updated
    Jan 7, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    ManishaPrajapati
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by ManishaPrajapati

    Released under MIT

    Contents

  4. Titanic Dataset

    • cubig.ai
    Updated May 29, 2025
    Cite
    CUBIG (2025). Titanic Dataset [Dataset]. https://cubig.ai/store/products/393/titanic-dataset
    Dataset updated
    May 29, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction
    • Based on passenger information from the Titanic, which sank in 1912, the Titanic Dataset is a representative binary classification dataset that includes demographic and boarding information such as Survived, Passenger Class, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, and Embarked.

    2) Data Utilization
    (1) Characteristics of the Titanic Dataset:
    • It consists of 891 training samples and 12 to 15 columns (a mix of numerical and categorical), and includes variables such as Age, Cabin, and Embarked with some missing values, making it suitable for practicing preprocessing and feature engineering.
    (2) Uses of the Titanic Dataset:
    • Development of survival prediction models: key characteristics such as passenger class, gender, age, and fare can be used to predict survival with machine learning classification models such as logistic regression, random forest, and SVM.
    • Analysis of factors influencing survival: by analyzing the correlation between survival rates and variables such as gender, age, and socioeconomic status, you can explore statistically and visually which groups have a higher survival probability.
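The two uses described above, inspecting missing values and comparing survival rates between groups, can be sketched with the standard library alone. The records below are hypothetical and only illustrate the shape of the data; `missing_counts` and `survival_rate_by` are illustrative helper names, not part of any library.

```python
from collections import Counter, defaultdict

# Hypothetical records for illustration only (not real passenger data).
# None marks a missing value.
records = [
    {"Survived": 1, "Sex": "female", "Age": 30.0, "Cabin": "C1", "Embarked": "S"},
    {"Survived": 0, "Sex": "male", "Age": None, "Cabin": None, "Embarked": "S"},
    {"Survived": 1, "Sex": "female", "Age": None, "Cabin": None, "Embarked": None},
    {"Survived": 0, "Sex": "male", "Age": 40.0, "Cabin": None, "Embarked": "Q"},
]


def missing_counts(rows):
    """Count missing (None) values per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                counts[column] += 1
    return dict(counts)


def survival_rate_by(rows, key):
    """Return the share of survivors within each group of `key`."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        survived[row[key]] += row["Survived"]
    return {group: survived[group] / totals[group] for group in totals}


missing = missing_counts(records)
rates = survival_rate_by(records, "Sex")
```

A grouped rate like this is descriptive only. A classifier such as logistic regression would additionally need the missing values handled first.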

  5. ‘Titanic: Machine Learning from Disaster’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Titanic: Machine Learning from Disaster’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-machine-learning-from-disaster-235d/latest
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Titanic: Machine Learning from Disaster’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shuofxz/titanic-machine-learning-from-disaster on 28 January 2022.

    --- No further description of dataset provided by original source ---

    --- Original source retains full ownership of the source dataset ---

  6. Titanic classification

    • figshare.com
    txt
    Updated Sep 19, 2020
    Cite
    Alvaro Rioboo (2020). Titanic classification [Dataset]. http://doi.org/10.6084/m9.figshare.12979220.v1
    Available download formats: txt
    Dataset updated
    Sep 19, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Alvaro Rioboo
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Titanic dataset for classification training.

  7. titanic

    • tensorflow.org
    Updated Feb 12, 2023
    Cite
    (2023). titanic [Dataset]. https://www.tensorflow.org/datasets/catalog/titanic
    Dataset updated
    Feb 12, 2023
    Description

    Dataset describing the survival status of individual passengers on the Titanic. Missing values in the original dataset are represented using ?. Float and int missing values are replaced with -1, string missing values are replaced with 'Unknown'.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('titanic', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.
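Because this catalog entry encodes missing values as -1 (numeric) and 'Unknown' (string), downstream code may want to turn those sentinels back into an explicit missing marker. The following is a minimal sketch that assumes each example has already been converted to a plain Python dict of `int`, `float`, `str`, or `bytes` values. The `restore_missing` helper and the feature names in the usage lines are hypothetical and do not describe the tensorflow_datasets API.

```python
def restore_missing(example):
    """Map the sentinels described above (-1 and 'Unknown') back to None."""
    cleaned = {}
    for name, value in example.items():
        # String features may arrive as bytes; decode them first.
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and value == -1:
            cleaned[name] = None
        elif value == "Unknown":
            cleaned[name] = None
        else:
            cleaned[name] = value
    return cleaned


# Hypothetical feature names and values, for illustration only.
example = {"age": -1.0, "cabin": b"Unknown", "fare": 7.25, "sex": "male"}
cleaned = restore_missing(example)
```

This treats every -1 as missing. That is reasonable here only under the assumption that ages, fares, and counts are never legitimately negative.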

  8. ‘Titanic Solution for Beginner's Guide’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 14, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Titanic Solution for Beginner's Guide’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-solution-for-beginner-s-guide-03a8/ae3641d4/?iid=014-163&v=presentation
    Dataset updated
    Feb 14, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Titanic Solution for Beginner's Guide’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harunshimanto/titanic-solution-for-beginners-guide on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    Overview

    The data has been split into two groups:

    training set (train.csv)
    test set (test.csv)
    

    The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the “ground truth”) for each passenger. Your model will be based on “features” like passengers’ gender and class. You can also use feature engineering to create new features.

    The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.

    We also include gender_submission.csv, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.
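The rule behind gender_submission.csv (all and only female passengers survive) can be written down in a few lines. This is a sketch, not the competition's own code: the `PassengerId` and `Survived` output columns and the lowercase `"female"` value in the `Sex` field are assumptions about the file layout, and the input rows are hypothetical.

```python
import csv
import io


def gender_baseline(test_rows):
    """Predict survival for female passengers only."""
    return [
        {
            "PassengerId": row["PassengerId"],
            "Survived": 1 if row["Sex"] == "female" else 0,
        }
        for row in test_rows
    ]


def to_csv(predictions):
    """Serialize predictions as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["PassengerId", "Survived"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(predictions)
    return buffer.getvalue()


# Hypothetical test rows, for illustration only.
sample = [
    {"PassengerId": "1", "Sex": "female"},
    {"PassengerId": "2", "Sex": "male"},
]
submission_text = to_csv(gender_baseline(sample))
```

A baseline like this gives a reference score that a trained model should be able to beat.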

    Data Dictionary

    Variable   Definition                                    Key
    survival   Survival                                      0 = No, 1 = Yes
    pclass     Ticket class                                  1 = 1st, 2 = 2nd, 3 = 3rd
    sex        Sex
    Age        Age in years
    sibsp      # of siblings / spouses aboard the Titanic
    parch      # of parents / children aboard the Titanic
    ticket     Ticket number
    fare       Passenger fare
    cabin      Cabin number
    embarked   Port of Embarkation                           C = Cherbourg, Q = Queenstown, S = Southampton

    Variable Notes

    pclass: A proxy for socio-economic status (SES): 1st = Upper, 2nd = Middle, 3rd = Lower.

    age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5.

    sibsp: The dataset defines family relations in this way: Sibling = brother, sister, stepbrother, stepsister; Spouse = husband, wife (mistresses and fiancés were ignored).

    parch: The dataset defines family relations in this way: Parent = mother, father; Child = daughter, son, stepdaughter, stepson. Some children travelled only with a nanny, therefore parch=0 for them.
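The age note above implies that estimated ages can be flagged from the value alone. A minimal sketch follows, assuming that any age of 1 or more ending in .5 is an estimate and that ages below 1 are genuine fractions of a year. `is_estimated_age` is a hypothetical helper name.

```python
def is_estimated_age(age):
    """Return True when an age matches the 'xx.5' estimated-age convention."""
    if age is None or age < 1:
        # Missing ages carry no information; ages under 1 are real fractions.
        return False
    return age % 1 == 0.5
```

For instance, `is_estimated_age(28.5)` is true, while `is_estimated_age(28.0)` and `is_estimated_age(0.42)` are false.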

    --- Original source retains full ownership of the source dataset ---

  9. ‘Titanic: cleaned data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Titanic: cleaned data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-titanic-cleaned-data-cbf4/dc9cd7ff/?iid=055-046&v=presentation
    Dataset updated
    Sep 30, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Titanic: cleaned data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/jamesleslie/titanic-cleaned-data on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    Introduction

    This dataset was created in this notebook as part of a three-part series. The data is in machine-learning-ready format, with all missing values for the Age, Fare and Embarked columns having been imputed.

    Data imputation

    • Age: this column was imputed by using the median age for the passenger's title (Mr, Mrs, Dr etc).
    • Fare: the single missing value in this column was imputed using the median value for that passenger's class.
    • Embarked: the two missing values here were imputed using the Pandas backfill method.
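The three imputation rules listed above can be sketched as follows. This is not the original notebook's code: it re-implements the ideas with the Python standard library (including a hand-written stand-in for the Pandas backfill method), the title-extraction regular expression is an assumption about the name format, and the rows are hypothetical.

```python
import re
from collections import defaultdict
from statistics import median


def title_of(name):
    """Extract the honorific between the comma and the period, e.g. 'Mr'."""
    match = re.search(r",\s*([^.]+)\.", name)
    return match.group(1).strip() if match else "Unknown"


def impute_by_group_median(rows, column, group_key):
    """Fill None in `column` with the median of the row's group."""
    groups = defaultdict(list)
    for row in rows:
        if row[column] is not None:
            groups[group_key(row)].append(row[column])
    for row in rows:
        if row[column] is None and groups[group_key(row)]:
            row[column] = median(groups[group_key(row)])


def backfill(rows, column):
    """Fill None with the next non-missing value further down the list."""
    next_value = None
    for row in reversed(rows):
        if row[column] is None:
            row[column] = next_value
        else:
            next_value = row[column]


# Hypothetical rows for illustration only (not real passenger data).
rows = [
    {"Name": "Doe, Mr. John", "Pclass": 3, "Age": 20.0, "Fare": 8.0, "Embarked": None},
    {"Name": "Doe, Mr. James", "Pclass": 3, "Age": None, "Fare": None, "Embarked": "S"},
    {"Name": "Roe, Mr. Jack", "Pclass": 3, "Age": 40.0, "Fare": 10.0, "Embarked": "C"},
]

impute_by_group_median(rows, "Age", lambda r: title_of(r["Name"]))
impute_by_group_median(rows, "Fare", lambda r: r["Pclass"])
backfill(rows, "Embarked")
```

Note that a backfill leaves a trailing missing value unfilled when no later row has a value, so the result depends on row order.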

    Usage

    This data is used in both the second and third parts of the series.

    --- Original source retains full ownership of the source dataset ---

  10. Titanic_ML_Python

    • kaggle.com
    Updated Dec 17, 2023
    Cite
    Jonathan Hernandez Mayen (2023). Titanic_ML_Python [Dataset]. https://www.kaggle.com/datasets/jonathanhernandez1/titanic-ml-python
    Dataset updated
    Dec 17, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jonathan Hernandez Mayen
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    Explore our machine learning project for predicting survival on the Titanic. With a perfect score of 1.0 and a flawless confusion matrix, we reveal striking patterns in the historical data.

  11. Titanic Leaderboard March 2023

    • kaggle.com
    Updated Apr 3, 2023
    Cite
    Lucas Antoine (2023). Titanic Leaderboard March 2023 [Dataset]. http://doi.org/10.34740/kaggle/dsv/5281032
    Dataset updated
    Apr 3, 2023
    Dataset provided by
    Kaggle
    Authors
    Lucas Antoine
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset used in my 🛳️ Titanic - Top 1% with KNN [0.81818] notebook. It contains all the leaderboard's entries from the Titanic - Machine Learning from Disaster competition in March 2023.

  12. Competition_Titanic_machine learning from disaster

    • kaggle.com
    Updated Jan 20, 2023
    Cite
    mukti shukla (2023). Competition_Titanic_machine learning from disaster [Dataset]. https://www.kaggle.com/datasets/muktishukla/titanic-servival
    Dataset updated
    Jan 20, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    mukti shukla
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    Dataset

    This dataset was created by mukti shukla

    Released under GNU Lesser General Public License 3.0

    Contents

  13. Preprocessed Titanic Survived Prediction Data

    • kaggle.com
    Updated Feb 6, 2021
    Cite
    Fethiye (2021). Preprocessed Titanic Survived Prediction Data [Dataset]. https://www.kaggle.com/fethiye/titanic-preprocessed-train-data/discussion
    Dataset updated
    Feb 6, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Fethiye
    Description

    Context

    This data set was created by preprocessing (filling in missing data, extracting new features) the Titanic - Machine Learning from Disaster data set.

    With this processed data set, machine learning models can be applied directly.

    You can see the preprocessing steps in the notebook: https://www.kaggle.com/fethiye/titanic-predict-survival-prediction

  14. Oceanographic data collected during the Titanic Expedition 2004...

    • catalog.data.gov
    • data.amerigeoss.org
    • +1 more
    Updated Jul 3, 2025
    + more versions
    Cite
    (Point of Contact) (2025). Oceanographic data collected during the Titanic Expedition 2004 (titanic2004) on NOAA Ship Ronald H. Brown in North Atlantic Ocean from 2004-05-27 to 2004-06-12 (NCEI Accession 0072311) [Dataset]. https://catalog.data.gov/dataset/oceanographic-data-collected-during-the-titanic-expedition-2004-titanic2004-on-noaa-ship-ronald2
    Dataset updated
    Jul 3, 2025
    Dataset provided by
    (Point of Contact)
    Area covered
    Atlantic Ocean
    Description

    Nearly 20 years after first finding the sunken remains of the RMS Titanic, marine explorer Robert Ballard returned in June 2004 to help the National Oceanic and Atmospheric Administration (NOAA) study the ship's rapid deterioration. A professor of oceanography at the University of Rhode Island (URI) and director of its Institute for Archaeological Oceanography, Dr. Ballard and his team of scientists from NOAA and other institutions spent 11 days at the site, mapping the ship and conducting scientific analyses of its deterioration. The team worked aboard NOAA Ship Ronald H. Brown from May 30 through June 9, and used remotely operated vehicles (ROVs) to conduct a sophisticated documentation of the state of Titanic that was not possible in the 1980s. This "Look, don't touch" mission utilized high-definition video and stereoscopic still images to provide an updated assessment of the wreck site. The science team included Dr. Dwight Coleman of URI and the Mystic Aquarium & Institute for Exploration (MAIFE), who was the expedition's research chief. As the marine archaeologist with NOAA's Office of Ocean Exploration, I oversaw the expedition's marine archaeology component. In addition to mapping the Titanic, expedition goals included the microbial research of scientist Roy Cullimore, who studied the natural deterioration of the ship's hull. Tiny microbes that feed on iron and create icicle-shaped formations called rusticles are responsible for this deterioration. While rusticles have been observed for many years, little is known about them. As the nation's ocean agency, NOAA has a vested interest in the scientific and cultural aspects of the Titanic, and in its appropriate treatment and preservation. NOAA's focus is to build a baseline of scientific information from which we can measure the shipwreck's processes and deterioration, and then apply the knowledge we gain to other deep-water shipwrecks and submerged cultural resources. 
    The Guidelines for Research, Exploration and Salvage of RMS Titanic (9 pages, 104k) were issued under the authority of the RMS Titanic Maritime Act of 1986. On Monday, June 7, 2004, at 9 p.m. ET/PT, the National Geographic Channel gave audiences unprecedented access to the ongoing expedition by broadcasting a one-hour special, "Return to Titanic," which originated from NOAA Ship Ronald H. Brown and included a live underwater telecast from the Titanic. Simultaneous with the expedition, MAIFE enabled thousands of children to experience the Titanic mission as it occurred. From June 4 through 9, four shows a day were transmitted live from the expedition via satellite and Internet2 to participating sites. The JASON Foundation for Education has created a new middle-school math curriculum called "JASON Math Adventure: Geometry and Return to Titanic," which follows the work of researchers on the expedition. Students will learn how geometry concepts are used to position NOAA Ship Ronald H. Brown at the Titanic wreck and the ROV Hercules on the Titanic's bow. Technology partners on the expedition included EDS of Texas, which wired the mission, and VBrick Systems of Connecticut, which enabled the mission feed to be broadcast nationwide.

  15. Children on the Titanic

    • encyclopedia-titanica.org
    json
    Updated Jan 28, 2025
    Cite
    Encyclopedia Titanica (2025). Children on the Titanic [Dataset]. https://www.encyclopedia-titanica.org/children-on-titanic/
    Available download formats: json
    Dataset updated
    Jan 28, 2025
    Dataset authored and provided by
    Encyclopedia Titanica (http://www.encyclopedia-titanica.org/)
    License

    https://www.encyclopedia-titanica.org/copyright-and-permissions.html

    Description

    A comprehensive dataset containing detailed profiles of all children (14 and under) who were aboard the Titanic. This includes information on their names, ages, family relationships, cabin assignments, nationalities, and survival status. The dataset provides insights into the demographics and experiences of the youngest passengers on the Titanic.

  16. Data from: Titanic Survival Prediction

    • kaggle.com
    Updated Jan 1, 2018
    Cite
    Shabu KC (2018). Titanic Survival Prediction [Dataset]. https://www.kaggle.com/shabukc/titanic-survival-prediction
    Dataset updated
    Jan 1, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shabu KC
    Description

    Context

    This is an attempt to learn prediction from the given training and test sets

    Content

    This is the sample from the Kaggle learners section.

    Acknowledgements

    Thanks to Kaggle for this sample data set and for allowing us to use it for learning.

    Inspiration

    Learn to use data to solve problems and provide solutions.

  17. Titanic Dataset - cleaned

    • kaggle.com
    Updated Aug 9, 2019
    Cite
    WinstonSDodson (2019). Titanic Dataset - cleaned [Dataset]. https://www.kaggle.com/datasets/winstonsdodson/titanic-dataset-cleaned/code
    Dataset updated
    Aug 9, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    WinstonSDodson
    Description

    This is the classic Titanic Dataset provided in the Kaggle Competition K Kernel and then cleaned in one of the most popular Kernels there. Please see the Kernel titled "A Data Science Framework: To Achieve 99% Accuracy" for a great lesson in data science. This Kernel gives a great explanation of the thinking behind this data cleaning, as well as a very professional demonstration of the technologies and skills used to do so. It then provides an overview of many ML techniques, and it is copiously and meticulously documented with many useful citations.

    Of course, data cleaning is an essential skill in data science but I wanted to use this data for a study of other machine learning techniques. So, I found and used this set of data that is well known and cleaned to a benchmark accepted by many.

  18. Titanic Dataset Competition

    • kaggle.com
    Updated Dec 19, 2022
    Cite
    Cynthia Barasa (2022). Titanic Dataset Competition [Dataset]. https://www.kaggle.com/datasets/cynthycynthy/titanicdataset/data
    Dataset updated
    Dec 19, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Cynthia Barasa
    Description

    The Titanic dataset is a well-known dataset that provides information on the passengers who were onboard the fateful voyage of the RMS Titanic. The data includes details such as the passenger's name, age, gender, ticket class, fare paid, and information on their family members. The dataset also includes a column called "Survived" which indicates whether a passenger survived the disaster or not.

    There are a total of 891 rows in the dataset, with 12 columns. Some of the key columns in the dataset include:

    • PassengerId: a unique identifier for each passenger
    • Survived: a binary variable that indicates whether the passenger survived (1) or did not survive (0) the disaster
    • Pclass: the ticket class of the passenger (1 = first class, 2 = second class, 3 = third class)
    • Name: the name of the passenger
    • Sex: the gender of the passenger (male or female)
    • Age: the age of the passenger (some values are missing)
    • SibSp: the number of siblings or spouses the passenger had on board
    • Parch: the number of parents or children the passenger had on board
    • Ticket: the ticket number of the passenger
    • Fare: the fare paid by the passenger
    • Cabin: the cabin number of the passenger (some values are missing)
    • Embarked: the port at which the passenger embarked (C = Cherbourg, Q = Queenstown, S = Southampton)

    Overall, the key challenges I encountered when working on the Titanic dataset were: how to handle missing values and imbalanced classes, encode categorical variables, reduce the dimensionality of the dataset, and identify and handle noise in the data.

    Here are a few tips and resources that I found helpful when getting started in the Titanic dataset competition:

    1. Get familiar with the dataset
    2. Pre-process the data
    3. Split the data into training and test sets
    4. Try out a few different algorithms
    5. Tune the hyperparameters
    6. Evaluate the model
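Steps 3 and 6 of the tips above (splitting the data and evaluating a model) can be sketched without any third-party library. The majority-class predictor below is only a stand-in for a real algorithm; it is not the author's method. All function names and the toy rows are hypothetical.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def majority_class(train_rows):
    """Return the most common Survived label in the training rows."""
    positives = sum(row["Survived"] for row in train_rows)
    return 1 if positives * 2 > len(train_rows) else 0


def accuracy(predictions, labels):
    """Share of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy rows for illustration only (not real passenger data).
toy_rows = [{"Survived": i % 2} for i in range(8)]
train, test = train_test_split(toy_rows)
guess = majority_class(train)
score = accuracy([guess] * len(test), [row["Survived"] for row in test])
```

Fixing the seed keeps the split reproducible, so different algorithms are compared on the same held-out rows.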

    Here are a few resources that I found helpful as I started working on the competition:

    • Kaggle's Titanic tutorial
    • scikit-learn documentation
    • Pandas documentation

  19. Titanic Solution for Beginner's Guide

    • kaggle.com
    Updated Mar 12, 2018
    Cite
    Harun-Ur-Rashid (2018). Titanic Solution for Beginner's Guide [Dataset]. https://www.kaggle.com/harunshimanto/titanic-solution-for-beginners-guide/activity
    Dataset updated
    Mar 12, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Harun-Ur-Rashid
    Description

    Overview

    The data has been split into two groups:

    training set (train.csv)
    test set (test.csv)
    

    The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the “ground truth”) for each passenger. Your model will be based on “features” like passengers’ gender and class. You can also use feature engineering to create new features.

    The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.

    We also include gender_submission.csv, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.

    Data Dictionary

    Variable   Definition                                    Key
    survival   Survival                                      0 = No, 1 = Yes
    pclass     Ticket class                                  1 = 1st, 2 = 2nd, 3 = 3rd
    sex        Sex
    Age        Age in years
    sibsp      # of siblings / spouses aboard the Titanic
    parch      # of parents / children aboard the Titanic
    ticket     Ticket number
    fare       Passenger fare
    cabin      Cabin number
    embarked   Port of Embarkation                           C = Cherbourg, Q = Queenstown, S = Southampton

    Variable Notes

    pclass: A proxy for socio-economic status (SES): 1st = Upper, 2nd = Middle, 3rd = Lower.

    age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5.

    sibsp: The dataset defines family relations in this way: Sibling = brother, sister, stepbrother, stepsister; Spouse = husband, wife (mistresses and fiancés were ignored).

    parch: The dataset defines family relations in this way: Parent = mother, father; Child = daughter, son, stepdaughter, stepson. Some children travelled only with a nanny, therefore parch=0 for them.
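The overview mentions feature engineering, and the sibsp and parch definitions above lend themselves to a simple derived feature. The sketch below computes a family-size feature. Adding 1 for the passenger is a common convention rather than something this dataset description specifies, and both helper names are hypothetical.

```python
def family_size(sibsp, parch):
    """Relatives aboard (siblings/spouses + parents/children) plus the passenger."""
    return sibsp + parch + 1


def is_alone(sibsp, parch):
    """True when no relatives, as defined by sibsp and parch, were aboard."""
    return family_size(sibsp, parch) == 1
```

Given the nanny caveat above, a child with `parch=0` and `sibsp=0` would be flagged as alone even though an adult accompanied them, so the feature only reflects the family relations the dataset records.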

  20. Titanic Data Simple EDA with Logistic Regression

    • kaggle.com
    Updated Aug 12, 2020
    Cite
    Vicky Nayak (2020). Titanic Data Simple EDA with Logistic Regression [Dataset]. https://www.kaggle.com/vickynayak9/titanic-dataset/discussion
    Dataset updated
    Aug 12, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vicky Nayak
    Description

    Dataset

    This dataset was created by Vicky Nayak

    Contents


Titanic dataset

"Survival Prediction on the Titanic: A Machine Learning Approach"
