This dataset was created by Nabarungos
This dataset was created by Harun-Ur-Rashid
Released under Data files © Original Authors
This dataset contains the original Titanic CSV files, train and test. What is special about it is that I added the correct (100%) survival solution to the test data. This is only meant for better and faster evaluation of your own solution. Please don't upload this solution as a submission to the official competition!
Please be fair to the other Kagglers!
The data has been split into two groups:
training set (train.csv)
test set (test.csv)
The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the “ground truth”) for each passenger. Your model will be based on “features” like passengers’ gender and class. You can also use feature engineering to create new features.
The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.
We also include gender_submission.csv, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.
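The gender-based baseline described above can be sketched in a few lines. This is a minimal illustration, not the competition's own code: the sample rows are made up, and the column names `PassengerId`, `Sex`, and `Survived` are assumed to match the competition files.

```python
import csv
import io

# Illustrative rows only (not real passengers); the real test.csv has more columns.
SAMPLE_TEST_CSV = """PassengerId,Pclass,Sex,Age
1001,3,male,34.5
1002,1,female,47
1003,2,male,
"""


def gender_baseline(rows):
    """Predict Survived=1 for every female passenger and 0 for everyone else."""
    return [
        {
            "PassengerId": row["PassengerId"],
            "Survived": 1 if row["Sex"] == "female" else 0,
        }
        for row in rows
    ]


def to_submission_csv(predictions):
    """Serialize predictions as CSV text with PassengerId and Survived columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["PassengerId", "Survived"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(predictions)
    return buffer.getvalue()


rows = list(csv.DictReader(io.StringIO(SAMPLE_TEST_CSV)))
predictions = gender_baseline(rows)
submission_text = to_submission_csv(predictions)
print(submission_text)
```

To use it on the real data, the in-memory sample would be replaced by reading `test.csv` from disk and writing the returned text to a file.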
| Variable | Definition | Key |
|---|---|---|
| survival | Survival | 0 = No, 1 = Yes |
| pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
| sex | Sex | |
| Age | Age in years | |
| sibsp | # of siblings / spouses aboard the Titanic | |
| parch | # of parents / children aboard the Titanic | |
| ticket | Ticket number | |
| fare | Passenger fare | |
| cabin | Cabin number | |
| embarked | Port of Embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |

pclass: A proxy for socio-economic status (SES). 1st = Upper, 2nd = Middle, 3rd = Lower.
age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5.
sibsp: The dataset defines family relations in this way... Sibling = brother, sister, stepbrother, stepsister; Spouse = husband, wife (mistresses and fiancés were ignored).
parch: The dataset defines family relations in this way... Parent = mother, father; Child = daughter, son, stepdaughter, stepson. Some children travelled only with a nanny, therefore parch=0 for them.
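As a small illustration of the variable notes above, the sketch below flags ages that follow the xx.5 convention for estimates and derives one possible engineered feature. Both helper names are hypothetical, and the family-size feature (sibsp + parch + 1) is an assumed example of feature engineering, not something the dataset description prescribes.

```python
def is_estimated_age(age):
    """Return True when an age follows the xx.5 pattern used for estimated ages.

    Ages below 1 are fractional by definition, so they are never flagged here.
    """
    if age is None or age < 1:
        return False
    return age % 1 == 0.5


def family_size(sibsp, parch):
    """Hypothetical engineered feature: relatives aboard plus the passenger."""
    return sibsp + parch + 1


print(is_estimated_age(28.5))   # an estimated age
print(is_estimated_age(0.9167)) # an infant's fractional age, not flagged
print(family_size(1, 2))        # one sibling/spouse, two parents/children
```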
I have provided a solution to the Titanic dataset problem in just a few easily understandable steps. This is a classification problem in which I have to predict whether a passenger survived or not. Many columns in this dataset have lots of missing values and duplicate values, so I first imputed missing values with mean values and dropped some columns that have no correlation with Survived. Then I converted the categorical variables into continuous variables and used many ML models to build my first predictive model.
I recommend that you download this file and open it in Jupyter Notebook or any other notebook that can read this Python code. I have given the details and reasons for each step I used in this program, so you can easily understand it. If you still face any problem, you can ask me in this discussion portal. I'll try my best to resolve your doubts. Hoping for positive responses and upvotes. Thanks!!!
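The preprocessing steps mentioned above (mean imputation, dropping columns, and turning categories into numbers) might look roughly like the sketch below. It is not the author's notebook: the sample rows are made up, the choice to drop `Ticket` is an assumption, and integer codes are only one possible way to make a categorical column numeric.

```python
from statistics import mean

# Illustrative rows only; values are made up for the example.
rows = [
    {"Age": 22.0, "Sex": "male", "Ticket": "A1"},
    {"Age": None, "Sex": "female", "Ticket": "B2"},
    {"Age": 38.0, "Sex": "female", "Ticket": "C3"},
]


def drop_columns(rows, columns):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


def impute_mean(rows, column):
    """Replace None in `column` with the mean of the observed values."""
    fill = mean(row[column] for row in rows if row[column] is not None)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


def encode_category(rows, column):
    """Map each distinct category to an integer code, assigned in sorted order."""
    categories = sorted({row[column] for row in rows})
    codes = {value: code for code, value in enumerate(categories)}
    return [{**row, column: codes[row[column]]} for row in rows], codes


cleaned = drop_columns(rows, {"Ticket"})  # assumed to be an uninformative column
cleaned = impute_mean(cleaned, "Age")
cleaned, sex_codes = encode_category(cleaned, "Sex")
print(cleaned)
print(sex_codes)
```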
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Arti Mishra
Released under CC0: Public Domain
This dataset was created by Bharath Reddy G
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by FSBDS41_MatheusKao
Released under Apache 2.0
This dataset was created by Anjali Chaudhary
This dataset was created by Antonio Rivero
This dataset is the final solution for dealing with missing values in the Spaceship Titanic competition. Kaggle Notebook: https://www.kaggle.com/sardorabdirayimov/best-way-of-dealing-with-missing-values-titanic-2/
http://www.gnu.org/licenses/lgpl-3.0.html
This dataset was created by HVoltBb
Released under GNU Lesser General Public License 3.0
I have used a new approach for cleaning the dataset. Rather than applying one-hot encoding to columns directly, I have divided columns into specified ranges, for example Age: Age_Children, Age_Teenage, Age_Adult, Age_Elder.
If you are facing low test accuracy, try using this cleaned dataset for the competition.
To learn how I converted the raw data into this cleaned version, you can check my solution for the competition here
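The range-based encoding described above could be sketched as follows. The four column names come from the description, but the cut points (13, 20, 60) are assumptions made for this example; the author's actual ranges are not stated here.

```python
from bisect import bisect_right

# Hypothetical cut points: <13 children, 13-19 teenage, 20-59 adult, 60+ elder.
AGE_EDGES = [13, 20, 60]
AGE_LABELS = ["Age_Children", "Age_Teenage", "Age_Adult", "Age_Elder"]


def age_indicators(age):
    """Return one 0/1 indicator per age range, with exactly one set to 1."""
    bucket = AGE_LABELS[bisect_right(AGE_EDGES, age)]
    return {label: int(label == bucket) for label in AGE_LABELS}


for age in (2, 13, 30, 71):
    print(age, age_indicators(age))
```

Each passenger ends up with four indicator columns instead of a single numeric Age, which is the shape of data the description suggests.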
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by hato bai
Released under MIT
https://creativecommons.org/publicdomain/zero/1.0/
This dataset stores the correct Fare of the Titanic Dataset. The sequence of data records is identical to the source at: https://www.kaggle.com/datasets/vinicius150987/titanic3.
Table 1: Fare details of the Allison family, likely a couple travelling with two very young children.
Issue: Fare values are inflated due to group fare misassignment.
Solution: Divide the fare by the group size, e.g., in Table 1, Fare should be corrected by dividing it by 4.

| | Allison, Master. Hudson Trevor | Allison, Miss. Helen Loraine | Allison, Mr. Hudson Joshua Creighton | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) |
|---|---|---|---|---|
| sex | male | female | male | female |
| age | 0.9167 | 2 | 30 | 25 |
| sibsp | 1 | 1 | 1 | 1 |
| parch | 2 | 2 | 2 | 2 |
| ticket | 113781 | 113781 | 113781 | 113781 |
| Fare (£) | 151.55 | 151.55 | 151.55 | 151.55 |
| Corrected Fare (£) | 37.89 | 37.89 | 37.89 | 37.89 |
Details of corrections are reported and freely available via the Open Access Paper published at JSIR: https://or.niscpr.res.in/index.php/JSIR/article/view/16992
Citation request: Tan, S. C. (2025). Beyond the Iceberg: Addressing Hidden Fare Inflation in Titanic Data. Journal of Scientific & Industrial Research, 84(7). https://doi.org/10.56042/jsir.v84i7.16992
Bibtex format: @article{tan2025beyond, title={Beyond the Iceberg: Addressing Hidden Fare Inflation in Titanic Data}, author={Tan, Swee Chuan}, journal={Journal of Scientific & Industrial Research}, volume={84}, number={7}, year={2025}, doi={10.56042/jsir.v84i7.16992}, url={https://doi.org/10.56042/jsir.v84i7.16992} }
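The fare correction illustrated in Table 1 (dividing a shared fare by the group size) can be sketched as below. This is only an illustration of the arithmetic, not the paper's procedure: it assumes the group size is the number of passengers sharing a ticket number, which may differ from how the paper identifies groups.

```python
from collections import Counter

# The four Allison family records from Table 1.
passengers = [
    {"name": "Allison, Master. Hudson Trevor", "ticket": "113781", "fare": 151.55},
    {"name": "Allison, Miss. Helen Loraine", "ticket": "113781", "fare": 151.55},
    {"name": "Allison, Mr. Hudson Joshua Creighton", "ticket": "113781", "fare": 151.55},
    {"name": "Allison, Mrs. Hudson J C (Bessie Waldo Daniels)", "ticket": "113781", "fare": 151.55},
]


def correct_fares(rows):
    """Add a per-person fare: the recorded fare divided by the group size.

    Assumption: a group is every passenger recorded with the same ticket number.
    """
    group_sizes = Counter(row["ticket"] for row in rows)
    return [
        {**row, "corrected_fare": row["fare"] / group_sizes[row["ticket"]]}
        for row in rows
    ]


corrected = correct_fares(passengers)
for row in corrected:
    print(f'{row["name"]}: {row["corrected_fare"]:.4f}')
```

For the Allison family the group size is 4, so each corrected fare is 151.55 / 4, in line with the corrected value shown in Table 1.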
This dataset was created by SONER KURT
This dataset was created by Xinyuan Zuo
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
For a long time I have had in mind to build a solution based almost entirely on self-supervision. This dataset is a consequence of that idea.
These are all pretrained plug-and-play PyTorch modules for embedding fields and features into fixed-size vectors. You just have to call torch.load(file_name) and use that object as described below. You can find all the code needed to use these modules in this notebook.
str_encoder.pt --> a seq-to-seq autoencoder trained on all string data contained in the Titanic dataset. Just call the encode method, passing the string to encode, and it will give you a fixed-size vector that encodes that string.
row_encoder.pt --> a feed-forward neural network trained with self-supervision on 90% of the training data of the Titanic dataset. This network was trained in a BERT-like fashion, by hiding certain known inputs and asking it to predict them. This approach is so immune to overfitting that I actually observed none. I simply stopped training it when the validation loss stagnated. Call the encode method, passing the corresponding row, and it will give you a fixed-size vector encoding all data. This row should include the string embeddings (see notebook for usage examples).
Context: This dataset presents a captivating tale of a transformative lecture on artificial intelligence (AI) where trainees delved into the world of machine learning by tackling the infamous Titanic disaster. Aspiring data scientists and enthusiasts alike sent their data contributions during this immersive lecture experience.
License: Use it however you want :)
Content:
user_name - anonymized data of the trainee
score - The result obtained in the evaluation of the solution for the Titanic challenge
file - submission filename (irrelevant)
timestamp - time of sending the solution
This dataset was created by Pri Santana