The test set from "Titanic - Machine Learning from Disaster" doesn't include labels.
This is an augmented version of the test set with the correct labels, retrieved from the original Titanic dataset at: https://www.openml.org/search?type=data&sort=runs&id=40945&status=active
The accuracy of the labels was validated by getting a 1.0 score in the competition with them.
This dataset is provided for educational purposes, and is not intended to help people cheat in the competition. If the only reason you want to download this is so you can get a shiny 1.0 on the leaderboards, don't do it.
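For the educational use described above, the labelled test set lets you score your own predictions offline. The following is a minimal sketch, assuming both files use the submission columns `PassengerId` and `Survived`; the inline CSV text and passenger IDs are made-up stand-ins, not real records.

```python
import csv
import io

# Made-up stand-ins for the labelled test set and a prediction file.
LABELS_CSV = """PassengerId,Survived
1001,0
1002,1
1003,0
1004,1
"""
PREDICTIONS_CSV = """PassengerId,Survived
1001,0
1002,1
1003,1
1004,1
"""


def read_survival(text):
    """Map PassengerId -> Survived (both as int) from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(row["PassengerId"]): int(row["Survived"]) for row in reader}


def accuracy(labels, predictions):
    """Fraction of labelled passengers whose prediction matches the label."""
    missing = labels.keys() - predictions.keys()
    if missing:
        raise ValueError(f"no prediction for passengers: {sorted(missing)}")
    hits = sum(predictions[pid] == label for pid, label in labels.items())
    return hits / len(labels)


labels = read_survival(LABELS_CSV)
predictions = read_survival(PREDICTIONS_CSV)
score = accuracy(labels, predictions)
print(score)
```

Matching on `PassengerId` rather than on row order keeps the score correct even if the two files are sorted differently.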
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Titanic dataset on Kaggle is a well-known dataset used for machine learning and data science projects, especially for binary classification tasks. It includes data on the passengers of the Titanic, which sank on its maiden voyage in 1912. This dataset is often used to predict the likelihood of a passenger's survival based on various features. Here is a detailed description of the dataset:
Overview
The Titanic dataset includes information about the passengers on the Titanic, such as their demographic information, class, fare, and whether they survived the disaster. The goal is to predict the survival of the passengers.
Files
The dataset typically includes three files:
train.csv: The training set, which includes the features and the target variable (Survived).
test.csv: The test set, which includes the features but not the target variable. You use this file to make predictions that can be submitted to Kaggle.
gender_submission.csv: An example of a submission file in the correct format.
Features
The dataset contains the following columns:
PassengerId: Unique ID for each passenger.
Survived: Target variable (0 = No, 1 = Yes) indicating whether the passenger survived.
Pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd).
Name: Name of the passenger.
Sex: Sex of the passenger (male or female).
Age: Age of the passenger in years; fractional for passengers younger than 1.
SibSp: Number of siblings or spouses aboard the Titanic.
Parch: Number of parents or children aboard the Titanic.
Ticket: Ticket number.
Fare: Passenger fare.
Cabin: Cabin number.
Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).
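The column list above can be turned into typed records with the standard library alone. This is a minimal sketch, not an official loader: the two sample rows are invented, and treating an empty field as a missing value (`None`) is an assumption about how gaps appear in the CSV.

```python
import csv
import io

# Two invented rows in the train.csv column layout (not real passengers).
SAMPLE = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,22,1,0,A/5 0000,7.25,,S
2,1,1,"Roe, Mrs. Jane",female,,1,0,PC 0000,71.2833,C85,C
"""

INT_COLUMNS = ("PassengerId", "Survived", "Pclass", "SibSp", "Parch")
FLOAT_COLUMNS = ("Age", "Fare")


def parse_row(row):
    """Convert one CSV row (all strings) to typed values; empty -> None."""
    parsed = {}
    for key, value in row.items():
        if value == "":
            parsed[key] = None  # assumption: empty field means missing
        elif key in INT_COLUMNS:
            parsed[key] = int(value)
        elif key in FLOAT_COLUMNS:
            parsed[key] = float(value)
        else:
            parsed[key] = value
    return parsed


passengers = [parse_row(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
missing_age = sum(p["Age"] is None for p in passengers)
print(missing_age)
```

To read the real file, replace `io.StringIO(SAMPLE)` with an open file handle for train.csv.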
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is a modified version of the original train.csv file provided for the Kaggle Titanic Competition. Missing passenger ages have been imputed using randomly generated values within a reasonable range. While these values may not reflect the actual ages, they facilitate a more structured classification of passengers into categories such as children and adults.
The imputation process was based on the following assumptions:
It is important to note that while honorifics such as Mr., Miss., Mrs., and Master. were historically used with some flexibility, this dataset (Version 1) assumes a strict age classification based on a legal age of adulthood set at 21.
Version 2 Modifications:
The title "Master." was assumed to refer to males aged 0 to 16.
The title "Mr." was assumed to refer to males aged 17 and above.
These modifications aim to provide a structured approach to handling missing age data while maintaining reasonable historical assumptions.
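A title-based imputation along these lines might look like the sketch below. It is not the dataset author's actual code: only the "Master." (0 to 16) and "Mr." (17 and above) boundaries come from the description, while the upper bound of 80, the `Surname, Title. Given names` name layout, and the helper names are assumptions made for illustration.

```python
import random
import re

# Age ranges per title. The 0-16 / 17+ split follows the Version 2 notes;
# the upper bound of 80 is an assumption, not taken from the dataset.
AGE_RANGES = {"Master": (0, 16), "Mr": (17, 80)}

# Assumes names look like "Surname, Title. Given names".
TITLE_PATTERN = re.compile(r",\s*([A-Za-z]+)\.")


def extract_title(name):
    """Return the honorific without its trailing period, or None."""
    match = TITLE_PATTERN.search(name)
    return match.group(1) if match else None


def impute_age(name, age, rng):
    """Keep a known age; otherwise draw one from the title's range."""
    if age is not None:
        return age
    bounds = AGE_RANGES.get(extract_title(name))
    if bounds is None:
        return None  # no rule for this title, so leave the gap
    return rng.randint(*bounds)  # inclusive on both ends


rng = random.Random(0)  # fixed seed so the random draw is repeatable
rows = [
    ("Doe, Master. Tom", None),
    ("Doe, Mr. John", None),
    ("Roe, Miss. Ann", None),
    ("Poe, Mr. Sam", 40.0),
]
imputed = [impute_age(name, age, rng) for name, age in rows]
print(imputed)
```

Titles without a rule, such as "Miss." here, are left missing rather than guessed, because the description above gives ranges only for "Master." and "Mr.".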
This dataset was created by Nabarungos
This dataset was created by Mohammed taha Khamed
This dataset was created by MatteoD83
This dataset was created by Sanchi Batra
This dataset was created by KSanjana2001
This dataset was created by dj thuva
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Ashish Jung Basnet
Released under MIT
https://creativecommons.org/publicdomain/zero/1.0/
The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the “ground truth”) for each passenger. Your model will be based on “features” like passengers’ gender and class. You can also use feature engineering to create new features.
The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.
We also include gender_submission.csv, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.
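The rule behind gender_submission.csv (all and only female passengers survive) is simple enough to reproduce as a baseline. A minimal sketch, assuming the test file has `PassengerId` and `Sex` columns and that the submission has the columns `PassengerId` and `Survived`; the rows shown are invented.

```python
import csv
import io

# Invented rows standing in for test.csv (not real passengers).
TEST_CSV = """PassengerId,Pclass,Name,Sex
2001,3,"Doe, Mr. John",male
2002,1,"Roe, Mrs. Jane",female
"""


def gender_baseline(rows):
    """Predict Survived = 1 for female passengers and 0 for everyone else."""
    return [
        {"PassengerId": row["PassengerId"],
         "Survived": 1 if row["Sex"] == "female" else 0}
        for row in rows
    ]


submission = gender_baseline(csv.DictReader(io.StringIO(TEST_CSV)))

# Write the predictions in submission format to an in-memory buffer.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["PassengerId", "Survived"],
                        lineterminator="\n")
writer.writeheader()
writer.writerows(submission)
submission_csv = out.getvalue()
print(submission_csv)
```

Any model you train should beat this baseline; if it does not, the added features are probably not helping.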
| Variable | Definition | Key |
| --- | --- | --- |
| survival | Survival | 0 = No, 1 = Yes |
| pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
| sex | Sex | |
| Age | Age in years | |
| sibsp | # of siblings / spouses aboard the Titanic | |
| parch | # of parents / children aboard the Titanic | |
| ticket | Ticket number | |
| fare | Passenger fare | |
| cabin | Cabin number | |
| embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |
pclass: A proxy for socio-economic status (SES). 1st = Upper, 2nd = Middle, 3rd = Lower.
age: Age is fractional if less than 1. If the age is estimated, it is in the form xx.5.
sibsp: The dataset defines family relations as follows. Sibling = brother, sister, stepbrother, stepsister. Spouse = husband, wife (mistresses and fiancés were ignored).
parch: The dataset defines family relations as follows. Parent = mother, father. Child = daughter, son, stepdaughter, stepson. Some children travelled only with a nanny, therefore parch=0 for them.
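The variable notes above lend themselves to simple derived features. The sketch below is illustrative only: `family_size`, `is_alone`, and `is_estimated_age` are hypothetical helper names, and using 1 as the cutoff to separate fractional infant ages from estimated xx.5 ages is an assumption.

```python
def family_size(sibsp, parch):
    """Relatives aboard (siblings/spouses + parents/children) plus the passenger."""
    return sibsp + parch + 1


def is_alone(sibsp, parch):
    """True when the passenger has no recorded relatives aboard."""
    return family_size(sibsp, parch) == 1


def is_estimated_age(age):
    """True for ages of the form xx.5, which the notes mark as estimated.

    Ages below 1 are fractional by definition, so they are excluded
    (assumption: 1 is the right cutoff for that exclusion).
    """
    if age is None or age < 1:
        return False
    return age % 1 == 0.5


print(family_size(1, 2), is_alone(0, 0), is_estimated_age(28.5))
```

Note that a child travelling only with a nanny has parch=0 and would be flagged by `is_alone`, so the flag reflects recorded relatives, not actual companions.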
This dataset was created by Priyanka
This dataset was created by nguyenthanhktdt
Released under Other (specified in description)
This dataset was created by Trisa Biswas
This dataset was created by MArco
This dataset was created by kordasg
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Ehsaan
Released under CC0: Public Domain
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by RobinReni
Released under CC0: Public Domain
This dataset was created by Parth Gajmal
This dataset was created by Nisha Kushwaha