http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
The Titanic dataset is a popular dataset used for data analysis and machine learning tasks. It contains various information about passengers aboard the Titanic, including whether they survived or not. Here's a brief description of each of the columns:
PassengerId: A unique identifier for each passenger.
Survived: Indicates whether the passenger survived or not (0 = No, 1 = Yes).
Pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd).
Name: Name of the passenger.
Sex: Gender of the passenger.
Age: Age of the passenger in years (fractional if less than 1).
SibSp: Number of siblings or spouses aboard the Titanic.
Parch: Number of parents or children aboard the Titanic.
Ticket: Ticket number.
Fare: Fare paid for the ticket.
Cabin: Cabin number.
Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).
This dataset is often used for tasks such as predicting survival based on various factors or analyzing demographics of passengers aboard the Titanic.
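The column descriptions above can be turned into a small loading routine. The following is a minimal sketch using only the Python standard library; the function names and the two sample rows are invented for illustration (they are not real passenger records), and it assumes the file uses the twelve column names exactly as listed.

```python
import csv
import io

# The twelve columns described above, in the listed order.
EXPECTED_COLUMNS = [
    "PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
    "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked",
]

# Invented sample rows for illustration only (not real passenger records).
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,30,0,0,A1,8.05,,S
2,1,1,"Roe, Mrs. Jane",female,0.75,1,2,B2,80.0,C10,C
"""


def parse_optional(value, cast):
    """Return cast(value), or None when the field is empty."""
    value = value.strip()
    return cast(value) if value else None


def load_passengers(text):
    """Parse CSV text into a list of dicts with typed values."""
    reader = csv.DictReader(io.StringIO(text))
    absent = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if absent:
        raise ValueError(f"missing columns: {absent}")
    rows = []
    for raw in reader:
        rows.append({
            "PassengerId": int(raw["PassengerId"]),
            "Survived": int(raw["Survived"]),      # 0 = No, 1 = Yes
            "Pclass": int(raw["Pclass"]),          # 1, 2 or 3
            "Name": raw["Name"],
            "Sex": raw["Sex"],
            "Age": parse_optional(raw["Age"], float),  # fractional if < 1
            "SibSp": int(raw["SibSp"]),
            "Parch": int(raw["Parch"]),
            "Ticket": raw["Ticket"],
            "Fare": parse_optional(raw["Fare"], float),
            "Cabin": parse_optional(raw["Cabin"], str),
            "Embarked": parse_optional(raw["Embarked"], str),  # C, Q or S
        })
    return rows


passengers = load_passengers(SAMPLE_CSV)
print(passengers[1]["Age"], passengers[0]["Cabin"])
```

Empty fields are mapped to None rather than to a placeholder value, so later steps can tell "missing" apart from a real entry.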
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Titanic Dataset (for Machine Learning)
The Titanic dataset is a classic and widely used dataset for machine learning and data analysis. It contains information about the passengers of the RMS Titanic, which tragically sank on its maiden voyage on April 15, 1912. The dataset provides details about each passenger, including their demographics, ticket information, and survival status. This dataset is often used to demonstrate and practice various machine learning techniques, particularly classification.
This dataset is divided into two parts: a training set and a testing set.
Dataset Variables:
PassengerId: count for each passenger
Survived: 0 = No; 1 = Yes
Name: name of passenger
Sex: passenger's sex
Age: passenger's age
SibSp: number of siblings/spouses aboard the Titanic
Parch: number of parents/children aboard the Titanic
Ticket: ticket number
Fare: passenger fare
Cabin: cabin number
Embarked: port where passenger embarked (C = Cherbourg; Q = Queenstown; S = Southampton)
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by ManishaPrajapati
Released under MIT
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Titanic Dataset is a representative binary classification dataset built from passenger information on the Titanic, which sank in 1912. It includes demographic and boarding information such as Survived, Passenger Class, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, and Embarked.
2) Data Utilization
(1) Characteristics of the Titanic Dataset:
• It consists of 891 training samples and 12 to 15 columns (a mix of numerical and categorical). Variables such as Age, Cabin, and Embarked have some missing values, which makes the dataset suitable for practicing preprocessing and feature engineering.
(2) The Titanic Dataset can be used for:
• Developing survival prediction models: key features such as passenger class, gender, age, and fare can be used to predict survival with machine learning classification models such as logistic regression, random forest, and SVM.
• Analyzing factors that influence survival: by analyzing the correlation between survival rates and variables such as gender, age, and socioeconomic status, you can statistically and visually explore which groups had a higher probability of survival.
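A first preprocessing step for the missing values mentioned above is usually to count them per column. This is a minimal standard-library sketch; the sample rows are invented for illustration and only a subset of the columns is shown, so the counts say nothing about the real data.

```python
import csv
import io

# Invented rows for illustration only; empty fields stand for missing values.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age,Fare,Cabin,Embarked
1,0,3,male,22,7.25,,S
2,1,1,female,,71.28,C85,C
3,1,3,female,26,7.93,,
4,0,2,male,,13.0,,S
"""


def count_missing(text):
    """Return {column name: number of empty fields} for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value is None or value.strip() == "":
                counts[name] += 1
    return counts


missing = count_missing(SAMPLE_CSV)
for column, n in missing.items():
    if n:
        print(f"{column}: {n} missing")
```

Columns with many gaps (Cabin in this toy sample) are often handled differently from columns with only a few, for example by deriving a "cabin known" flag instead of imputing a value.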
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Titanic: Machine Learning from Disaster’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shuofxz/titanic-machine-learning-from-disaster on 28 January 2022.
--- No further description of dataset provided by original source ---
--- Original source retains full ownership of the source dataset ---
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Titanic dataset for classification training.
Dataset describing the survival status of individual passengers on the Titanic. Missing values in the original dataset are represented using ?. Float and int missing values are replaced with -1, string missing values are replaced with 'Unknown'.
To use this dataset:
import tensorflow_datasets as tfds
# Load the training split of the Titanic dataset.
ds = tfds.load('titanic', split='train')
# Print the first four examples.
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
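Because this version stores missing values as sentinels (-1 for numbers, 'Unknown' for strings), it can help to convert them back to an explicit "missing" marker before analysis. The sketch below works on a plain Python dict; the field names in the example are assumptions made for illustration, and the helper is not part of tensorflow_datasets.

```python
def restore_missing(example):
    """Replace the -1 / 'Unknown' sentinels described above with None.

    `example` is a plain dict of already-converted Python values
    (bytes are decoded first, since string features may arrive as bytes).
    """
    restored = {}
    for key, value in example.items():
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        if isinstance(value, str) and value == "Unknown":
            restored[key] = None
        elif (isinstance(value, (int, float))
              and not isinstance(value, bool)
              and value == -1):
            restored[key] = None
        else:
            restored[key] = value
    return restored


# Hypothetical example record; the field names are assumptions.
record = {"age": -1.0, "cabin": b"Unknown", "fare": 7.25, "name": b"Doe, Mr. John"}
print(restore_missing(record))
```

This treats every -1 as missing, which is only safe for columns where -1 cannot be a legitimate value; apply it selectively if that does not hold for a given field.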
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Titanic Solution for Beginner's Guide’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/harunshimanto/titanic-solution-for-beginners-guide on 14 February 2022.
--- Dataset description provided by original source is as follows ---
The data has been split into two groups:
training set (train.csv)
test set (test.csv)
The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the “ground truth”) for each passenger. Your model will be based on “features” like passengers’ gender and class. You can also use feature engineering to create new features.
The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.
We also include gender_submission.csv, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.
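The rule behind gender_submission.csv (all and only female passengers survive) can be sketched in a few lines. This assumes the submission file has the two columns PassengerId and Survived and that the test set has a Sex column with the values "male" and "female"; the sample rows are invented for illustration.

```python
import csv
import io

# Invented test-set rows for illustration only (no Survived column).
TEST_CSV = """PassengerId,Pclass,Name,Sex,Age
892,3,"Doe, Mr. John",male,34.5
893,3,"Roe, Mrs. Jane",female,47
894,2,"Poe, Mr. Sam",male,62
"""


def gender_baseline(test_csv_text):
    """Return submission CSV text predicting survival for females only."""
    reader = csv.DictReader(io.StringIO(test_csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for row in reader:
        survived = 1 if row["Sex"].strip().lower() == "female" else 0
        writer.writerow([row["PassengerId"], survived])
    return out.getvalue()


submission = gender_baseline(TEST_CSV)
print(submission)
```

A rule like this is a useful floor: any trained model should be compared against it before its score is taken as meaningful.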
Variable   Definition                                    Key
survival   Survival                                      0 = No, 1 = Yes
pclass     Ticket class                                  1 = 1st, 2 = 2nd, 3 = 3rd
sex        Sex
Age        Age in years
sibsp      # of siblings / spouses aboard the Titanic
parch      # of parents / children aboard the Titanic
ticket     Ticket number
fare       Passenger fare
cabin      Cabin number
embarked   Port of Embarkation                           C = Cherbourg, Q = Queenstown, S = Southampton
pclass: A proxy for socio-economic status (SES). 1st = Upper, 2nd = Middle, 3rd = Lower.
age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5.
sibsp: The dataset defines family relations in this way: Sibling = brother, sister, stepbrother, stepsister; Spouse = husband, wife (mistresses and fiancés were ignored).
parch: The dataset defines family relations in this way: Parent = mother, father; Child = daughter, son, stepdaughter, stepson. Some children travelled only with a nanny, therefore parch=0 for them.
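As one example of the feature engineering mentioned earlier, sibsp and parch are often combined into a single family-size feature. This is a common convention rather than something the source description prescribes, and the function names are invented for illustration.

```python
def family_size(sibsp, parch):
    """Number of family members aboard, counting the passenger themselves."""
    return sibsp + parch + 1


def is_alone(sibsp, parch):
    """True when no siblings, spouses, parents or children were aboard."""
    return family_size(sibsp, parch) == 1


# A passenger with one spouse and two children aboard.
print(family_size(1, 2))
# A child travelling only with a nanny has parch = 0 (see the note above),
# so with sibsp = 0 as well they are counted as travelling alone.
print(is_alone(0, 0))
```

The nanny case shows the limit of this feature: it reflects only the family relations the dataset records, not who a passenger actually travelled with.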
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Titanic: cleaned data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/jamesleslie/titanic-cleaned-data on 30 September 2021.
--- Dataset description provided by original source is as follows ---
This dataset was created in this notebook as part of a three-part series. The data is in machine-learning-ready format, with all missing values for the Age, Fare and Embarked columns having been imputed.
Age: this column was imputed by using the median age for the passenger's title (Mr, Mrs, Dr etc).
Fare: the single missing value in this column was imputed using the median value for that passenger's class.
Embarked: the two missing values here were imputed using the Pandas backfill method.
This data is used in both the second and third parts of the series.
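The three imputation rules described above can be sketched as follows. This is not the notebook's code: it is a standard-library approximation in which the title regex, the function names, and the sample rows are all assumptions made for illustration, and the backfill helper only mimics what a pandas backfill does (each gap takes the next known value below it).

```python
import re
import statistics

# Assumed name layout: "Surname, Title. Given names".
TITLE_PATTERN = re.compile(r",\s*([^.,]+)\.")


def extract_title(name):
    """Return the title (Mr, Mrs, Dr, ...) found in a passenger name."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip() if match else "Unknown"


def impute_group_median(rows, column, group_of):
    """Fill None in `column` with the median of the row's group."""
    groups = {}
    for row in rows:
        if row[column] is not None:
            groups.setdefault(group_of(row), []).append(row[column])
    medians = {g: statistics.median(v) for g, v in groups.items()}
    known = [x for values in groups.values() for x in values]
    fallback = statistics.median(known) if known else None
    for row in rows:
        if row[column] is None:
            row[column] = medians.get(group_of(row), fallback)
    return rows


def backfill(values):
    """Fill each None with the next non-None value after it."""
    filled = list(values)
    next_seen = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = next_seen
        else:
            next_seen = filled[i]
    return filled


# Invented rows for illustration only.
rows = [
    {"Name": "Doe, Mr. John", "Pclass": 3, "Age": 20.0, "Fare": 8.0, "Embarked": "S"},
    {"Name": "Doe, Mr. Alan", "Pclass": 3, "Age": 40.0, "Fare": 10.0, "Embarked": None},
    {"Name": "Poe, Mr. Sam", "Pclass": 3, "Age": None, "Fare": None, "Embarked": "Q"},
    {"Name": "Roe, Mrs. Jane", "Pclass": 1, "Age": 35.0, "Fare": 80.0, "Embarked": "C"},
]

impute_group_median(rows, "Age", lambda r: extract_title(r["Name"]))  # by title
impute_group_median(rows, "Fare", lambda r: r["Pclass"])              # by class
for row, port in zip(rows, backfill([r["Embarked"] for r in rows])):
    row["Embarked"] = port
```

Note that backfill depends on row order, so the imputed Embarked values are tied to how the file happens to be sorted; a trailing gap with no later value stays missing.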
--- Original source retains full ownership of the source dataset ---
https://www.gnu.org/licenses/gpl-3.0.html
Explore our machine learning project for predicting survival on the Titanic. With a perfect score of 1.0 and a flawless confusion matrix, we reveal striking patterns in the historical data.
https://creativecommons.org/publicdomain/zero/1.0/
Dataset used in my 🛳️ Titanic - Top 1% with KNN [0.81818] notebook. It contains all the leaderboard's entries from the Titanic - Machine Learning from Disaster competition in March 2023.
http://www.gnu.org/licenses/lgpl-3.0.html
This dataset was created by mukti shukla
Released under GNU Lesser General Public License 3.0
This data set was created by preprocessing the Titanic - Machine Learning from Disaster data set (filling in missing data and extracting new features).
With this processed data set, machine learning models can be applied directly.
You can see the preprocessing steps in the notebook: https://www.kaggle.com/fethiye/titanic-predict-survival-prediction
Nearly 20 years after first finding the sunken remains of the RMS Titanic, marine explorer Robert Ballard returned in June 2004 to help the National Oceanic and Atmospheric Administration (NOAA) study the ship's rapid deterioration. A professor of oceanography at the University of Rhode Island (URI) and director of its Institute for Archaeological Oceanography, Dr. Ballard and his team of scientists from NOAA and other institutions spent 11 days at the site, mapping the ship and conducting scientific analyses of its deterioration. The team worked aboard NOAA Ship Ronald H. Brown from May 30 through June 9, and used remotely operated vehicles (ROVs) to conduct a sophisticated documentation of the state of Titanic that was not possible in the 1980s. This "Look, don't touch" mission utilized high-definition video and stereoscopic still images to provide an updated assessment of the wreck site. The science team included Dr. Dwight Coleman of URI and the Mystic Aquarium & Institute for Exploration (MAIFE), who was the expedition's research chief. As the marine archaeologist with NOAA's Office of Ocean Exploration, I oversaw the expedition's marine archaeology component. In addition to mapping the Titanic, expedition goals included the microbial research of scientist Roy Cullimore, who studied the natural deterioration of the ship's hull. Tiny microbes that feed on iron and create icicle-shaped formations called rusticles are responsible for this deterioration. While rusticles have been observed for many years, little is known about them. As the nation's ocean agency, NOAA has a vested interest in the scientific and cultural aspects of the Titanic, and in its appropriate treatment and preservation. NOAA's focus is to build a baseline of scientific information from which we can measure the shipwreck's processes and deterioration, and then apply the knowledge we gain to other deep-water shipwrecks and submerged cultural resources. 
The Guidelines for Research, Exploration and Salvage of RMS Titanic were issued under the authority of the RMS Titanic Maritime Memorial Act of 1986. On Monday, June 7, 2004, at 9 p.m. ET/PT, the National Geographic Channel gave audiences unprecedented access to the ongoing expedition by broadcasting a one-hour special, "Return to Titanic," which originated from NOAA Ship Ronald H. Brown and included a live underwater telecast from the Titanic. Simultaneous with the expedition, MAIFE enabled thousands of children to experience the Titanic mission as it occurred. From June 4 through 9, four shows a day were transmitted live from the expedition via satellite and Internet2 to participating sites. The JASON Foundation for Education has created a new middle-school math curriculum called "JASON Math Adventure: Geometry and Return to Titanic," which follows the work of researchers on the expedition. Students will learn how geometry concepts are used to position NOAA Ship Ronald H. Brown at the Titanic wreck and the ROV Hercules on the Titanic's bow. Technology partners on the expedition included EDS of Texas, which wired the mission, and VBrick Systems of Connecticut, which enabled the mission feed to be broadcast nationwide.
https://www.encyclopedia-titanica.org/copyright-and-permissions.html
A comprehensive dataset containing detailed profiles of all children (14 and under) who were aboard the Titanic. This includes information on their names, ages, family relationships, cabin assignments, nationalities, and survival status. The dataset provides insights into the demographics and experiences of the youngest passengers on the Titanic.
This is an attempt to learn prediction from the given training and test sets.
This is the sample from the Kaggle learners section.
Thanks to Kaggle for this sample data set and for allowing us to use it for learning.
Learn to use data to solve problems and provide solutions.
This is the classic Titanic Dataset provided in the Kaggle competition and then cleaned in one of the most popular Kernels there. Please see the Kernel titled, "A Data Science Framework: To Achieve 99% Accuracy" for a great lesson in data science. This Kernel gives a great explanation of the thinking behind this data cleaning, as well as a very professional demonstration of the technologies and skills needed to do it. It then continues with an overview of many ML techniques, and it is copiously and meticulously documented with many useful citations.
Of course, data cleaning is an essential skill in data science but I wanted to use this data for a study of other machine learning techniques. So, I found and used this set of data that is well known and cleaned to a benchmark accepted by many.
The Titanic dataset is a well-known dataset that provides information on the passengers who were onboard the fateful voyage of the RMS Titanic. The data includes details such as the passenger's name, age, gender, ticket class, fare paid, and information on their family members. The dataset also includes a column called "Survived" which indicates whether a passenger survived the disaster or not.
There are a total of 891 rows in the dataset, with 12 columns. Some of the key columns in the dataset include:
• PassengerId: a unique identifier for each passenger
• Survived: a binary variable that indicates whether the passenger survived (1) or did not survive (0) the disaster
• Pclass: the ticket class of the passenger (1 = first class, 2 = second class, 3 = third class)
• Name: the name of the passenger
• Sex: the gender of the passenger (male or female)
• Age: the age of the passenger (some values are missing)
• SibSp: the number of siblings or spouses the passenger had on board
• Parch: the number of parents or children the passenger had on board
• Ticket: the ticket number of the passenger
• Fare: the fare paid by the passenger
• Cabin: the cabin number of the passenger (some values are missing)
• Embarked: the port at which the passenger embarked (C = Cherbourg, Q = Queenstown, S = Southampton)
Overall, the key challenges I encountered when working on the Titanic dataset were: how to handle missing values and imbalanced classes, encode categorical variables, reduce the dimensionality of the dataset, and identify and handle noise in the data.
Here are a few tips that I found helpful when getting started in the Titanic dataset competition:
1. Get familiar with the dataset
2. Pre-process the data
3. Split the data into training and test sets
4. Try out a few different algorithms
5. Tune the hyperparameters
6. Evaluate the model
Here are a few resources that I found helpful as I started working on the competition:
• Kaggle's Titanic tutorial
• scikit-learn documentation
• Pandas documentation
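The split, fit, and evaluate steps from the tips above can be illustrated without any external library. This is a deliberately simple sketch: a per-group majority vote on Sex stands in for a real algorithm, hyperparameter tuning is left out, the function names are invented, and the eight rows are made up so that the example is deterministic. In practice the scikit-learn utilities would normally replace these helpers.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_majority_by_sex(train_rows):
    """For each Sex value, learn the most common Survived outcome."""
    counts = {}
    for row in train_rows:
        counts.setdefault(row["Sex"], [0, 0])[row["Survived"]] += 1
    return {sex: (1 if c[1] > c[0] else 0) for sex, c in counts.items()}


def accuracy(model, rows, default=0):
    """Fraction of rows whose predicted outcome matches Survived."""
    correct = sum(model.get(r["Sex"], default) == r["Survived"] for r in rows)
    return correct / len(rows)


# Invented, perfectly separable rows for illustration only.
rows = (
    [{"Sex": "female", "Survived": 1} for _ in range(4)]
    + [{"Sex": "male", "Survived": 0} for _ in range(4)]
)

train, test = train_test_split(rows)
model = fit_majority_by_sex(train)
print(len(train), len(test), accuracy(model, test))
```

Evaluating on the held-out rows rather than on the training rows is the point of the split: accuracy measured on data the model has already seen overstates how well it generalizes.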
The data has been split into two groups:
training set (train.csv)
test set (test.csv)
The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the “ground truth”) for each passenger. Your model will be based on “features” like passengers’ gender and class. You can also use feature engineering to create new features.
The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.
We also include gender_submission.csv, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.
Variable   Definition                                    Key
survival   Survival                                      0 = No, 1 = Yes
pclass     Ticket class                                  1 = 1st, 2 = 2nd, 3 = 3rd
sex        Sex
Age        Age in years
sibsp      # of siblings / spouses aboard the Titanic
parch      # of parents / children aboard the Titanic
ticket     Ticket number
fare       Passenger fare
cabin      Cabin number
embarked   Port of Embarkation                           C = Cherbourg, Q = Queenstown, S = Southampton
pclass: A proxy for socio-economic status (SES). 1st = Upper, 2nd = Middle, 3rd = Lower.
age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5.
sibsp: The dataset defines family relations in this way: Sibling = brother, sister, stepbrother, stepsister; Spouse = husband, wife (mistresses and fiancés were ignored).
parch: The dataset defines family relations in this way: Parent = mother, father; Child = daughter, son, stepdaughter, stepson. Some children travelled only with a nanny, therefore parch=0 for them.
This dataset was created by Vicky Nayak