"Titanic - Machine Learning from Disaster"
The sinking of the Titanic on April 15, 1912, remains one of the most infamous shipwrecks in history. This dataset provides comprehensive information about the passengers onboard, including demographic, socio-economic, and other relevant details.
The goal of this dataset is to apply machine learning techniques to predict which passengers survived the disaster. By analyzing features such as age, gender, class, and ticket information, data scientists can explore the factors that influenced survival rates.
Key Features:
PassengerId: A unique identifier for each passenger.
Pclass: Ticket class (1st = Upper, 2nd = Middle, 3rd = Lower).
Name: Name of the passenger.
Sex: Gender of the passenger.
Age: Age of the passenger.
SibSp: Number of siblings or spouses aboard.
Parch: Number of parents or children aboard.
Ticket: Ticket number.
Fare: Passenger fare.
Cabin: Cabin number (if available).
Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).
This dataset offers an excellent opportunity to practice classification algorithms and gain insights into predictive modeling, data cleaning, and feature engineering.
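As a rough illustration of how the features listed above might be typed when reading the CSV, here is a minimal sketch. The Passenger class, the parse_passenger helper, and the example row are hypothetical; the column names are assumed to match the CSV header, and empty cells are assumed to mean "missing".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Passenger:
    passenger_id: int
    pclass: int                # 1, 2, or 3
    name: str
    sex: str
    age: Optional[float]       # may be missing
    sibsp: int
    parch: int
    ticket: str
    fare: Optional[float]
    cabin: Optional[str]       # only present "if available"
    embarked: Optional[str]    # C, Q, or S


def _optional(value: str, cast):
    """Return cast(value), or None when the CSV cell is empty."""
    return cast(value) if value != "" else None


def parse_passenger(row: dict) -> Passenger:
    """Convert one CSV row (column name -> string) into a typed record."""
    return Passenger(
        passenger_id=int(row["PassengerId"]),
        pclass=int(row["Pclass"]),
        name=row["Name"],
        sex=row["Sex"],
        age=_optional(row["Age"], float),
        sibsp=int(row["SibSp"]),
        parch=int(row["Parch"]),
        ticket=row["Ticket"],
        fare=_optional(row["Fare"], float),
        cabin=_optional(row["Cabin"], str),
        embarked=_optional(row["Embarked"], str),
    )


# Made-up row for illustration only, not a real passenger record.
example_row = {
    "PassengerId": "9001", "Pclass": "3", "Name": "Doe, Mr. John",
    "Sex": "male", "Age": "", "SibSp": "1", "Parch": "0",
    "Ticket": "X 0000", "Fare": "7.25", "Cabin": "", "Embarked": "S",
}
passenger = parse_passenger(example_row)
```

Rows read with csv.DictReader already have this dictionary shape, so the same function could be applied to each row of a file.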
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Titanic Dataset is built from passenger information for the Titanic, which sank in 1912. It is a representative binary classification dataset that includes demographic and boarding information such as Survived, Passenger Class, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, and Embarked.
2) Data Utilization
(1) Characteristics of the Titanic Dataset:
• It consists of 891 training samples and 12 to 15 columns (a mix of numerical and categorical). Variables such as Age, Cabin, and Embarked contain some missing values, which makes the dataset suitable for practicing preprocessing and feature engineering.
(2) Uses of the Titanic Dataset:
• Developing survival prediction models: key features such as passenger class, gender, age, and fare can be used to predict survival with machine learning classification models such as logistic regression, random forest, and SVM.
• Analyzing factors that influence survival: by analyzing the correlation between survival and variables such as gender, age, and socioeconomic status, you can explore statistically and visually which groups have a higher survival probability.
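The two ideas above, handling missing values and comparing survival across groups, can be sketched with the standard library alone. The rows below are toy values invented for illustration (not real passenger records), and fill_missing_age and survival_rate_by are hypothetical helper names. Median imputation is just one common choice, not something this description prescribes.

```python
from collections import defaultdict
from statistics import median

# Toy rows invented for illustration; None marks a missing Age.
rows = [
    {"Sex": "female", "Age": 30.0, "Survived": 1},
    {"Sex": "female", "Age": None, "Survived": 1},
    {"Sex": "male", "Age": 40.0, "Survived": 0},
    {"Sex": "male", "Age": 20.0, "Survived": 1},
    {"Sex": "male", "Age": None, "Survived": 0},
]


def fill_missing_age(records):
    """Return copies of the records with missing Age set to the median known age."""
    known = [r["Age"] for r in records if r["Age"] is not None]
    fallback = median(known)
    return [{**r, "Age": fallback if r["Age"] is None else r["Age"]} for r in records]


def survival_rate_by(records, key):
    """Return {group value: fraction of that group with Survived == 1}."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for r in records:
        totals[r[key]] += 1
        survived[r[key]] += r["Survived"]
    return {group: survived[group] / totals[group] for group in totals}


filled = fill_missing_age(rows)
rates = survival_rate_by(filled, "Sex")
```

The same survival_rate_by call could be pointed at another categorical column, such as the passenger class, to compare groups along a different axis.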
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
BrianSuToronto/titanic-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset describing the survival status of individual passengers on the Titanic. Missing values in the original dataset are represented using ?. Float and int missing values are replaced with -1, string missing values are replaced with 'Unknown'.
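Since missing values are encoded as -1 (numbers) and 'Unknown' (strings), downstream code usually needs to turn them back into an explicit "missing" marker. The following is a minimal sketch on plain Python values; decode_missing and the feature names in the example are hypothetical, and it assumes -1 never occurs as a genuine value in the features being decoded.

```python
def decode_missing(example: dict) -> dict:
    """Map the sentinels (-1 for numbers, 'Unknown' for strings) to None."""
    decoded = {}
    for name, value in example.items():
        if isinstance(value, str) and value == "Unknown":
            decoded[name] = None
        elif isinstance(value, bytes) and value == b"Unknown":
            # String features may arrive as bytes rather than str.
            decoded[name] = None
        elif (
            isinstance(value, (int, float))
            and not isinstance(value, bool)
            and value == -1
        ):
            decoded[name] = None
        else:
            decoded[name] = value
    return decoded


# Hypothetical example with made-up feature names and values.
raw = {"age": -1.0, "cabin": b"Unknown", "fare": 7.25, "sex": "male"}
clean = decode_missing(raw)
```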
To use this dataset:
import tensorflow_datasets as tfds

# Load the training split of the 'titanic' dataset.
ds = tfds.load('titanic', split='train')
# Print the first four examples.
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
This dataset was created by deijia
This dataset was created by Nine Four Six
https://choosealicense.com/licenses/afl-3.0/
victor/titanic dataset hosted on Hugging Face and contributed by the HF Datasets community
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Titanic Dataset (for Machine Learning)
The Titanic dataset is a classic and widely used dataset for machine learning and data analysis. It contains information about the passengers of the RMS Titanic, which tragically sank on its maiden voyage on April 15, 1912. The dataset provides details about each passenger, including their demographics, ticket information, and survival status. This dataset is often used to demonstrate and practice various machine learning techniques, particularly classification.
This dataset is divided into two parts: a training set and a testing set.
Dataset Variables:
PassengerId: count for each passenger
Survived: 0 = No; 1 = Yes
Name: name of passenger
Sex: passenger's sex
Age: passenger's age
SibSp: number of siblings/spouses aboard the Titanic
Parch: number of parents/children aboard the Titanic
Ticket: ticket number
Fare: passenger fare
Cabin: cabin number
Embarked: port where passenger embarked (C = Cherbourg; Q = Queenstown; S = Southampton)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Titanic-Dataset (train.csv)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hesh97/titanicdataset-traincsv on 12 November 2021.
--- No further description of dataset provided by original source ---
--- Original source retains full ownership of the source dataset ---
Titanic dataset
https://www.encyclopedia-titanica.org/copyright-and-permissions.html
Who travelled on the Titanic? When she reached the open Atlantic on 11 April 1912, the Titanic carried 2,208 people. However, many more travelled on her: on the delivery trip from Belfast to Southampton, and on the short journeys to Cherbourg and Queenstown. This dataset includes everyone who travelled on the maiden voyage, as well as those on the delivery trip and the passengers who were fortunate enough to disembark.
This dataset was created by Bhagesh Kumar
https://choosealicense.com/licenses/unknown/
kkovacs/titanic dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
titanic dataset
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Version of the titanic dataset used in the ggEDA manuscript. Can be loaded from the datarium R package (datarium::titanic.raw). Originally published by the British Board of Trade in 1990. If you use it, please cite:
British Board of Trade. Report on the Loss of the ’Titanic’ (S.S.). Allan Sutton Publishing, Gloucester, UK, 1990. British Board of Trade Inquiry Report (reprint).
Alboukadel Kassambara. datarium: Data Bank for Statistical Analysis and Visualization, 2019. URL https://CRAN.R-project.org/package=datarium. R package version 0.1.0.
https://www.encyclopedia-titanica.org/copyright-and-permissions.html
The complete list of RMS Titanic passengers and crew, including detailed records of survivors and victims.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Titanic.csv’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/garrettrlynch/titaniccsv on 30 September 2021.
--- No further description of dataset provided by original source ---
--- Original source retains full ownership of the source dataset ---
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Titanic dataset for classification training.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Titanic Solution for Beginner's Guide’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harunshimanto/titanic-solution-for-beginners-guide on 30 September 2021.
--- Dataset description provided by original source is as follows ---
The data has been split into two groups:
training set (train.csv)
test set (test.csv)
The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the “ground truth”) for each passenger. Your model will be based on “features” like passengers’ gender and class. You can also use feature engineering to create new features.
The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.
We also include gender_submission.csv, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.
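The gender-based baseline described above (predict survival for all and only female passengers) can be sketched as follows. The inline test rows are made up, gender_baseline is a hypothetical helper name, and the two-column PassengerId/Survived layout is an assumption about the submission format.

```python
import csv
import io

# Made-up stand-in for test.csv, reduced to the two columns this baseline needs.
test_csv = """PassengerId,Sex
1001,male
1002,female
1003,female
"""


def gender_baseline(rows):
    """Predict Survived = 1 for female passengers and 0 for everyone else."""
    return [
        {"PassengerId": row["PassengerId"],
         "Survived": 1 if row["Sex"] == "female" else 0}
        for row in rows
    ]


predictions = gender_baseline(csv.DictReader(io.StringIO(test_csv)))

# Write the predictions in CSV form (to an in-memory buffer in this sketch).
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["PassengerId", "Survived"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(predictions)
submission_text = buffer.getvalue()
```

Replacing the in-memory buffers with real file handles would read test.csv and write a submission file in the same way.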
Variable    Definition                                    Key
survival    Survival                                      0 = No, 1 = Yes
pclass      Ticket class                                  1 = 1st, 2 = 2nd, 3 = 3rd
sex         Sex
Age         Age in years
sibsp       # of siblings / spouses aboard the Titanic
parch       # of parents / children aboard the Titanic
ticket      Ticket number
fare        Passenger fare
cabin       Cabin number
embarked    Port of Embarkation                           C = Cherbourg, Q = Queenstown, S = Southampton
pclass: A proxy for socio-economic status (SES) 1st = Upper 2nd = Middle 3rd = Lower
age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5
sibsp: The dataset defines family relations in this way... Sibling = brother, sister, stepbrother, stepsister Spouse = husband, wife (mistresses and fiancés were ignored)
parch: The dataset defines family relations in this way... Parent = mother, father Child = daughter, son, stepdaughter, stepson Some children travelled only with a nanny, therefore parch=0 for them.
--- Original source retains full ownership of the source dataset ---
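The variable notes above lend themselves to simple feature engineering. The sketch below derives a family size from sibsp and parch and flags ages that look estimated (the xx.5 form). Both helper names are hypothetical, and treating only ages of 1 or more as possibly estimated is an assumption made here because ages below 1 are fractional anyway.

```python
from typing import Optional


def family_size(sibsp: int, parch: int) -> int:
    """Passenger plus siblings/spouses plus parents/children aboard."""
    return sibsp + parch + 1


def is_age_estimated(age: Optional[float]) -> bool:
    """True when the age is at least 1 and has a fractional part of exactly .5."""
    if age is None:
        return False
    return age >= 1 and age % 1 == 0.5


# Made-up values for illustration.
size = family_size(sibsp=1, parch=2)
estimated = is_age_estimated(28.5)
```

Note that a child travelling only with a nanny has parch = 0, so family_size can be 1 even though the child was not travelling alone.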
A partial passenger manifest for the fateful last trip of the Titanic