Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset containing information about passengers aboard the Titanic is one of the most famous datasets used in data science and machine learning. It was created to analyze and understand the factors that influenced survival rates among passengers during the tragic sinking of the RMS Titanic on April 15, 1912.
The dataset is often used for predictive modeling and statistical analysis to determine which factors (such as socio-economic status, age, gender, etc.) were associated with a higher likelihood of survival. It contains 1309 rows and 14 columns.
Pclass: Ticket class indicating the socio-economic status of the passenger. It is categorized into three classes: 1 = Upper, 2 = Middle, 3 = Lower.
Survived: A binary indicator that shows whether the passenger survived (1) or not (0) during the Titanic disaster. This is the target variable for analysis.
Name: The full name of the passenger, including title (e.g., Mr., Mrs., etc.).
Sex: The gender of the passenger, denoted as either male or female.
Age: The age of the passenger in years.
SibSp: The number of siblings or spouses aboard the Titanic for the respective passenger.
Parch: The number of parents or children aboard the Titanic for the respective passenger.
Ticket: The ticket number assigned to the passenger.
Fare: The fare paid by the passenger for the ticket.
Cabin: The cabin number assigned to the passenger, if available.
Embarked: The port of embarkation for the passenger. It can take one of three values: C = Cherbourg, Q = Queenstown, S = Southampton.
Boat: If the passenger survived, this column contains the identifier of the lifeboat they were rescued in.
Body: If the passenger did not survive, this column contains the identification number of their recovered body, if applicable.
Home.dest: The destination or place of residence of the passenger.
These descriptions provide a detailed understanding of each column in the Titanic dataset subset, offering insights into the demographic, travel, and survival-related information recorded for each passenger.
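The Name, SibSp, and Parch columns described above are often combined into derived features. The following is a minimal sketch, assuming names follow the "Surname, Title. Given names" layout; the helper names `extract_title` and `family_size` are hypothetical and not part of any dataset tooling.

```python
import re

# Captures the text between the first comma and the following period,
# e.g. "Mr" in "Braund, Mr. Owen Harris". The name layout is an assumption.
TITLE_PATTERN = re.compile(r",\s*([^.,]+)\.")


def extract_title(name):
    """Return the title embedded in a passenger name, or None if absent."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip() if match else None


def family_size(sibsp, parch):
    """Count the passenger plus relatives aboard: SibSp + Parch + 1."""
    return sibsp + parch + 1


title = extract_title("Braund, Mr. Owen Harris")
size = family_size(1, 0)
print(title, size)
```

Names that do not match the assumed layout yield `None`, so they can be handled separately rather than silently mislabeled.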
License: https://cdla.io/sharing-1-0/
Detailed Description: The Titanic dataset describes the passengers aboard the RMS Titanic, which sank on its maiden voyage in April 1912 after colliding with an iceberg. It records demographics, ticket class, cabin information, family relationships, fare details, and, most notably, survival outcomes for individual passengers.
Key attributes within the dataset include:
Passenger Class (Pclass): This categorical variable indicates the ticket class of each passenger, ranging from 1st class (wealthiest) to 3rd class (lower socioeconomic status).
Name: The names of passengers, providing insight into their identities.
Sex: Gender of passengers, categorized as male or female.
Age: Age of passengers, providing information about the demographic composition of the Titanic's passengers.
SibSp: Number of siblings/spouses aboard the Titanic, offering insight into family relationships.
Parch: Number of parents/children aboard the Titanic, indicating family size and composition.
Ticket: Ticket number, providing additional information about passenger accommodations and fare details.
Fare: Fare paid by each passenger, which can be indicative of their ticket class and economic status.
Cabin: Cabin number or location, offering insights into passenger accommodations.
Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton), providing information about passengers' embarkation points.
Survived: This binary variable indicates whether a passenger survived the disaster (1) or not (0), serving as the primary outcome variable for analyses.
Researchers and data analysts frequently utilize the Titanic dataset for various purposes, including:
Exploratory data analysis to understand the demographic composition of passengers and their survival outcomes.
Predictive modeling to develop algorithms that predict the likelihood of survival based on passenger characteristics.
Feature engineering to derive new variables that may enhance predictive accuracy.
Hypothesis testing to investigate factors associated with survival rates, such as passenger class, gender, age, and family size.
Overall, the Titanic dataset serves as a valuable resource for understanding historical events, exploring data analysis techniques, and teaching machine learning concepts. Its accessibility and rich contextual information make it a popular choice for both educational and research purposes within the data science community.
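As one illustration of the exploratory analysis use case, the sketch below computes a survival rate per group using only the standard library. The rows are toy values invented for illustration, not records from the dataset, and `survival_rate_by` is a hypothetical helper.

```python
from collections import defaultdict

# Toy rows invented for illustration; they are not taken from the dataset.
rows = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 1, "Survived": 1},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "female", "Pclass": 2, "Survived": 1},
]


def survival_rate_by(rows, key):
    """Return {group value: share of rows with Survived == 1} for one column."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        survivors[row[key]] += row["Survived"]
    return {group: survivors[group] / totals[group] for group in totals}


rates_by_sex = survival_rate_by(rows, "Sex")
rates_by_class = survival_rate_by(rows, "Pclass")
print(rates_by_sex)
print(rates_by_class)
```

The same function works for any categorical column, so it can be reused for Embarked or a derived family-size bucket.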
The Titanic dataset is a widely used dataset that contains information on the passengers who were aboard the Titanic when it sank on its maiden voyage in 1912. The dataset includes features such as age, sex, passenger class, and fare paid, as well as whether or not the passenger survived the sinking. The dataset is often used for machine learning and data analysis tasks, such as predicting survival based on passenger characteristics or exploring patterns in the data. The Titanic dataset is a classic example of data analysis and is a great starting point for those new to data science.
The Titanic dataset is available in CSV format and contains two files, one for training and one for testing. The training file is used to build the machine learning model, while the testing file is used to test the performance of the model.
PassengerId: unique identifier for each passenger
Survived: whether the passenger survived (1) or not (0)
Pclass: passenger class (1 = 1st class, 2 = 2nd class, 3 = 3rd class)
Name: name of the passenger
Sex: gender of the passenger
Age: age of the passenger (in years)
SibSp: number of siblings or spouses aboard the Titanic
Parch: number of parents or children aboard the Titanic
Ticket: ticket number
Fare: passenger fare
Cabin: cabin number
Embarked: port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
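A CSV file with the columns listed above can be read into typed records with the standard library. This is a minimal sketch: the two sample rows are invented for illustration, and `optional` and `parse_row` are hypothetical helpers. It treats an empty field as a missing value, which is an assumption about how the files encode gaps.

```python
import csv
import io

# Two rows in the column layout listed above; the values are invented.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,22,1,0,A 1234,7.25,,S
2,1,1,"Roe, Mrs. Jane",female,,1,0,B 5678,71.28,C85,C
"""


def optional(value, cast):
    """Cast a CSV field, mapping the empty string (a missing value) to None."""
    return cast(value) if value != "" else None


def parse_row(raw):
    """Convert one csv.DictReader row into typed Python values."""
    return {
        "PassengerId": int(raw["PassengerId"]),
        # A testing file has no Survived column, so fall back to None.
        "Survived": optional(raw.get("Survived") or "", int),
        "Pclass": int(raw["Pclass"]),
        "Name": raw["Name"],
        "Sex": raw["Sex"],
        "Age": optional(raw["Age"], float),
        "SibSp": int(raw["SibSp"]),
        "Parch": int(raw["Parch"]),
        "Ticket": raw["Ticket"],
        "Fare": optional(raw["Fare"], float),
        "Cabin": optional(raw["Cabin"], str),
        "Embarked": optional(raw["Embarked"], str),
    }


passengers = [parse_row(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(passengers[1]["Name"], passengers[1]["Age"])
```

To read a real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle.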
Copyright (c) [2023] [Md Kazi Sajiduddin]
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
License: https://choosealicense.com/licenses/other/
Dataset Card for Titanic Survival Prediction
Dataset Details
Dataset Description
This dataset is a copy of the original Kaggle Titanic dataset made to explore the Hugging Face Datasets feature. The Titanic Survival Prediction dataset is widely used in machine learning and statistics. It originates from the Titanic: Machine Learning from Disaster competition on Kaggle. The dataset consists of passenger details from the RMS Titanic disaster, including demographic… See the full description on the dataset page: https://huggingface.co/datasets/paulopontesm/titanic.
Tomate/Kaggle-Titanic dataset hosted on Hugging Face and contributed by the HF Datasets community
License: https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Titanic Dataset is built from passenger information for the Titanic, which sank in 1912. It is a representative binary classification dataset whose columns include Survived, Passenger Class, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, and Embarked.
2) Data Utilization
(1) Characteristics of the Titanic Dataset:
• It consists of 891 training samples and 12 to 15 columns (a mix of numerical and categorical variables). Age, Cabin, and Embarked contain missing values, which makes the dataset suitable for practicing preprocessing and feature engineering.
(2) Uses of the Titanic Dataset:
• Developing survival prediction models: Key characteristics such as passenger class, gender, age, and fare can be used to predict survival with machine learning classifiers such as logistic regression, random forest, and SVM.
• Analyzing factors that influence survival: By analyzing the correlation between survival rates and variables such as gender, age, and socioeconomic status, you can explore statistically and visually which groups have a higher survival probability.
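The missing values in Age, Cabin, and Embarked noted above can be handled in several ways. The sketch below shows one common choice, not a prescribed method: fill Age with the median, fill Embarked with the most frequent port, and reduce Cabin to a has-cabin flag. The rows are invented for illustration and `impute` is a hypothetical helper.

```python
from collections import Counter
from statistics import median

# Toy rows invented for illustration; None marks a missing value.
rows = [
    {"Age": 22.0, "Cabin": None, "Embarked": "S"},
    {"Age": None, "Cabin": "C85", "Embarked": "C"},
    {"Age": 38.0, "Cabin": None, "Embarked": None},
    {"Age": 30.0, "Cabin": "E46", "Embarked": "S"},
]


def impute(rows):
    """Fill Age with the median, Embarked with the mode, and flag Cabin presence."""
    ages = [r["Age"] for r in rows if r["Age"] is not None]
    ports = [r["Embarked"] for r in rows if r["Embarked"] is not None]
    age_fill = median(ages)
    port_fill = Counter(ports).most_common(1)[0][0]
    cleaned = []
    for r in rows:
        cleaned.append({
            "Age": r["Age"] if r["Age"] is not None else age_fill,
            "HasCabin": int(r["Cabin"] is not None),
            "Embarked": r["Embarked"] if r["Embarked"] is not None else port_fill,
        })
    return cleaned


cleaned = impute(rows)
print(cleaned)
```

When a separate test set exists, the fill values are usually computed on the training rows only and then reused for the test rows.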
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
BrianSuToronto/titanic-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Titanic dataset on Kaggle is a well-known dataset used for machine learning and data science projects, especially for binary classification tasks. It includes data on the passengers of the Titanic, which sank on its maiden voyage in 1912. This dataset is often used to predict the likelihood of a passenger's survival based on various features. Here is a detailed description of the dataset:
Overview
The Titanic dataset includes information about the passengers on the Titanic, such as their demographic information, class, fare, and whether they survived the disaster. The goal is to predict the survival of the passengers.
Files
The dataset typically includes three files:
train.csv: The training set, which includes the features and the target variable (Survived).
test.csv: The test set, which includes the features but not the target variable. You use this file to make predictions that can be submitted to Kaggle.
gender_submission.csv: An example of a submission file in the correct format.
Features
The dataset contains the following columns:
PassengerId: Unique ID for each passenger.
Survived: Target variable (0 = No, 1 = Yes) indicating if the passenger survived.
Pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd).
Name: Name of the passenger.
Sex: Gender of the passenger (male or female).
Age: Age of the passenger in years. Ages below 1 are fractional, and estimated ages take the form xx.5.
SibSp: Number of siblings or spouses aboard the Titanic.
Parch: Number of parents or children aboard the Titanic.
Ticket: Ticket number.
Fare: Passenger fare.
Cabin: Cabin number.
Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).
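A submission file pairs each test-set PassengerId with a predicted Survived value. The sketch below writes that two-column layout using a gender-only rule (predict survival for female passengers only), which is the rule the example file's name suggests; treat that reading as an assumption. The passenger rows and the helper names are illustrative.

```python
import csv
import io

# Illustrative test rows; only the columns needed by the rule are shown.
test_rows = [
    {"PassengerId": 892, "Sex": "male"},
    {"PassengerId": 893, "Sex": "female"},
]


def gender_baseline(rows):
    """Predict 1 (survived) for female passengers and 0 otherwise."""
    return [(r["PassengerId"], 1 if r["Sex"] == "female" else 0) for r in rows]


def to_submission_csv(predictions):
    """Serialize (PassengerId, Survived) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    writer.writerows(predictions)
    return buffer.getvalue()


submission_text = to_submission_csv(gender_baseline(test_rows))
print(submission_text)
```

Any model's predictions can be passed to `to_submission_csv` as long as they are (PassengerId, Survived) pairs.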
License: https://creativecommons.org/publicdomain/zero/1.0/
The Titanic dataset is a popular dataset in the field of data science and machine learning. It contains information about the passengers aboard the RMS Titanic, which sank on its maiden voyage in 1912 after hitting an iceberg. The dataset is often used for predictive modeling and classification tasks.
Here are the key features or columns in the Titanic dataset:
- PassengerId: A unique identifier assigned to each passenger.
- Survived: A binary variable indicating whether the passenger survived (1) or did not survive (0).
- Pclass (Passenger Class): The ticket class of the passenger, which can be 1st (1), 2nd (2), or 3rd (3) class.
- Name: The name of the passenger.
- Sex: The gender of the passenger (male or female).
- Age: The age of the passenger in years. It may contain missing values.
- SibSp: The number of siblings or spouses the passenger had aboard the Titanic.
- Parch: The number of parents or children the passenger had aboard the Titanic.
- Ticket: The ticket number.
- Fare: The amount of money the passenger paid for the ticket.
- Cabin: The cabin number where the passenger stayed. It may contain missing values.
- Embarked: The port at which the passenger boarded the Titanic (C for Cherbourg, Q for Queenstown, S for Southampton).
License: https://choosealicense.com/licenses/cc/
Titanic Survival
from https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/problem12.html
License: https://choosealicense.com/licenses/cc/
Javitron4257/Titanic-Dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset provides information on passengers aboard the RMS Titanic, including features that can be used for predicting survival. It contains various attributes related to passengers such as age, sex, ticket fare, and passenger class, which are crucial for understanding patterns and building predictive models.
Content:
PassengerID: Unique identifier for each passenger.
Pclass: Passenger class (1st, 2nd, 3rd).
Name: Full name of the passenger.
Sex: Gender of the passenger.
Age: Age of the passenger.
SibSp: Number of siblings/spouses aboard.
Parch: Number of parents/children aboard.
Ticket: Ticket number.
Fare: Ticket fare.
Cabin: Cabin number.
Embarked: Port of embarkation (C = Cherbourg; Q = Queenstown; S = Southampton).
Survived: Survival status (0 = No; 1 = Yes).
Usage:
This dataset is ideal for practice in classification tasks, particularly for predicting binary outcomes such as survival status. It is commonly used for various machine learning challenges, including exploratory data analysis and feature engineering.
License: https://creativecommons.org/publicdomain/zero/1.0/
🛳️ Titanic Dataset (JSON Format)
📌 Overview
This is the classic Titanic: Machine Learning from Disaster dataset, converted into JSON format for easier use in APIs, data pipelines, and Python projects. It contains the same passenger details as the original CSV version, but stored as JSON for convenience.
📂 Dataset Contents
File: titanic.json
Columns: PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked
Use Cases: Exploratory Data Analysis (EDA), feature engineering, machine learning model training, web app backends, JSON parsing practice.
🛠️ How to Use
🔹 1. Load with kagglehub
import kagglehub

path = kagglehub.dataset_download("engrbasit62/titanic-json-format")
print("Path to dataset files:", path)
🔹 2. Load into Pandas
import pandas as pd

df = pd.read_json(f"{path}/titanic.json")
print(df.head())
💡 Notes
Preview truncation: Kaggle may show only part of the JSON in the preview panel because of its size. ✅ Don’t worry — the full dataset is available when loaded via code.
Benefits of JSON format: Ideal for web apps, APIs, or projects that work with structured data. Easily convertible back to CSV if needed.
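The conversion back to CSV mentioned above can be sketched with the standard library alone. This assumes the JSON file holds a list of flat record objects, one per passenger; the actual layout of titanic.json is not confirmed here. The sample records are invented and `json_records_to_csv` is a hypothetical helper.

```python
import csv
import io
import json

# Assumption: the file is a JSON array of flat objects. Values are invented.
SAMPLE_JSON = (
    '[{"PassengerId": 1, "Survived": 0, "Sex": "male"},'
    ' {"PassengerId": 2, "Survived": 1, "Sex": "female"}]'
)


def json_records_to_csv(json_text):
    """Convert a JSON array of flat records into CSV text with a header row."""
    records = json.loads(json_text)
    if not records:
        return ""
    # Use the first record's keys as the column order.
    fieldnames = list(records[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = json_records_to_csv(SAMPLE_JSON)
print(csv_text)
```

If the file turns out to be column-oriented rather than record-oriented, the records would need to be rebuilt before writing.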
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset used in the Titanic - Machine Learning from Disaster competition.
titanic.csv:
- passengerId - Passenger unique ID.
- survival - If the passenger survived, 0 = No, 1 = Yes.
- pclass - Ticket class, 1 = 1st, 2 = 2nd, 3 = 3rd.
- name - Name of the passenger.
- sex - Sex.
- age - Age in years.
- sibsp - # of siblings / spouses aboard the Titanic.
- parch - # of parents / children aboard the Titanic.
- ticket - Ticket number.
- fare - Passenger fare.
- cabin - Cabin number.
- embarked - Port of Embarkation, C = Cherbourg, Q = Queenstown, S = Southampton.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone on board, resulting in the death of 1502 out of 2224 passengers and crew. While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others. In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).
Objective:
Survival Prediction: To build a logistic regression model that accurately predicts the survival of passengers based on features such as age, gender, passenger class, and number of siblings/spouses aboard.
Data Cleaning and Preprocessing: To perform data cleaning by handling missing values, removing unnecessary columns, and encoding categorical variables to prepare the dataset for analysis.
Exploratory Data Analysis (EDA): To conduct a thorough exploratory data analysis to visualize survival rates and identify patterns based on various factors like gender, passenger class, and embarked location.
Feature Importance Analysis: To analyze the correlation between different features and their impact on survival rates, identifying which factors are the most significant predictors of survival.
Model Evaluation: To evaluate the performance of the logistic regression model using accuracy scores and classification reports, ensuring that the model generalizes well to unseen data.
ROC Curve Analysis: To create a ROC curve to assess the trade-off between the true positive rate and false positive rate, providing insights into the model's ability to distinguish between survivors and non-survivors.
Insights and Recommendations: To derive insights from the analysis that could inform future safety measures or policies related to passenger safety in maritime travel.
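The ROC curve objective above can be illustrated without any plotting library. The sketch below sweeps the decision threshold from high to low, records the false positive rate and true positive rate at each step, and integrates with the trapezoidal rule. The labels and scores are invented, standing in for a model's predicted survival probabilities, and it assumes both classes are present.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) points, lowering the threshold one distinct score at a time."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume tied scores together so each threshold yields one point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC points by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Invented example: 1 = survived, scores = hypothetical predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
area = auc(points)
print(points)
print(area)
```

An area of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of survivors above non-survivors.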
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a simplified version of the famous Titanic dataset, which contains information about passengers aboard the Titanic ship. It is designed specifically for beginners who are learning about data analysis and classification problems.
Note: This simplified dataset does not contain all the columns available in the original Titanic dataset, but it retains the essential features for introductory purposes.
License: https://creativecommons.org/publicdomain/zero/1.0/
The Titanic Survival Prediction Project explores one of the most iconic datasets in data science. The goal is to predict whether a passenger survived the Titanic disaster based on key attributes such as age, gender, ticket class, family size, and fare.
Using a dataset of 100,000 synthetic records inspired by the original Titanic data, this project demonstrates a complete data science workflow — including data cleaning, exploratory data analysis (EDA), feature engineering, and predictive modeling.
By analyzing patterns (e.g., higher survival rates among women, children, and first-class passengers), the project showcases how machine learning can uncover meaningful insights from historical events.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset typically includes the following columns:
PassengerId: A unique identifier for each passenger.
Survived: This column indicates whether a passenger survived (1) or did not survive (0).
Pclass (Ticket class): A proxy for socio-economic status, with 1 being the highest class and 3 the lowest.
Name: The name of the passenger.
Sex: The gender of the passenger.
Age: The age of the passenger. (Note: There might be missing values in this column.)
SibSp: The number of siblings or spouses the passenger had aboard the Titanic.
Parch: The number of parents or children the passenger had aboard the Titanic.
Ticket: The ticket number.
Fare: The amount of money the passenger paid for the ticket.
The main goal of using this dataset is to predict whether a passenger survived or not based on various features. It serves as a popular introductory dataset for those learning data analysis, machine learning, and predictive modeling. Keep in mind that the dataset may be subject to variations and updates, so it's always a good idea to check the Kaggle website or dataset documentation for the most recent information.
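To make the prediction goal concrete, the sketch below trains a small logistic regression by batch gradient descent, using only the standard library. Logistic regression is chosen here purely for illustration; the text above does not prescribe a model. The encoded rows are invented, and the learning rate and epoch count are arbitrary assumptions.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if the predicted survival probability is at least 0.5."""
    return int(sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) >= 0.5)


# Toy encoded rows [is_female, is_first_class]; invented for illustration.
features = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
labels = [1, 1, 0, 0, 1, 0]

weights, bias = train_logistic(features, labels)
accuracy = sum(predict(weights, bias, x) == y for x, y in zip(features, labels)) / len(labels)
print(weights, bias, accuracy)
```

Accuracy here is measured on the training rows themselves, so it says nothing about generalization; a held-out split would be needed for that.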
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Waqas Ali Khan
Released under Apache 2.0
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Khalid Hussain
Released under Apache 2.0
Titanic Dataset CSV File