This is the Titanic dataset.
Attribute | Definition | Key |
---|---|---|
sex | Sex/Gender | male/female |
age | Age in years | |
sibsp | Number of siblings/spouses aboard the Titanic | 0/1/2 ... |
parch | Number of parents/children aboard the Titanic | 0/1/2 ... |
fare | Passenger fare | |
embarked | Port of embarkation | C: Cherbourg, Q: Queenstown, S: Southampton |
class | Ticket class | First / Second / Third |
who | Passenger category | male, female, child |
alone | Whether the passenger was travelling alone | 0/1 |
survived | Whether the passenger survived | 0/1 |
This is a classic dataset used in many data mining tutorials and demos -- perfect for getting started with exploratory analysis and building binary classification models to predict survival.
Data covers passengers only, not crew.
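As a first exploratory step toward the survival models mentioned above, the survival rate can be compared across groups. The following is a minimal sketch with pandas; the rows are made up for illustration and are not taken from the dataset, and the column names `sex`, `class`, and `survived` are assumed to match the attribute table.

```python
import pandas as pd

# Made-up rows for illustration only; they are NOT real passenger records.
passengers = pd.DataFrame(
    {
        "sex": ["male", "female", "female", "male", "female", "male"],
        "class": ["Third", "First", "Third", "First", "Second", "Third"],
        "survived": [0, 1, 1, 0, 1, 0],
    }
)

# The mean of a 0/1 column is the share of survivors within each group.
rate_by_sex = passengers.groupby("sex")["survived"].mean()
rate_by_class = passengers.groupby("class")["survived"].mean()

print(rate_by_sex)
print(rate_by_class)
```

On the real data, the same two `groupby` calls give a quick view of which attributes are likely to matter for a binary classifier.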
Dataset describing the survival status of individual passengers on the Titanic. Missing values in the original dataset are represented using ?. Float and int missing values are replaced with -1, string missing values are replaced with 'Unknown'.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('titanic', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
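Because this version of the dataset encodes missing values as -1 (numeric features) and 'Unknown' (string features), it can help to turn those sentinels back into proper missing values before analysis. The sketch below assumes the examples have already been collected into a pandas DataFrame; the rows and column names are made up for illustration.

```python
import pandas as pd

# Made-up rows that mimic the sentinel convention described above.
df = pd.DataFrame(
    {
        "age": [29.0, -1.0, 2.0],
        "fare": [211.3, 151.55, -1.0],
        "cabin": ["B5", "Unknown", "C22"],
    }
)

numeric_cols = df.select_dtypes(include="number").columns
text_cols = df.columns.difference(numeric_cols)

cleaned = df.copy()
# mask() replaces every cell where the condition is True with a missing value.
cleaned[numeric_cols] = df[numeric_cols].mask(df[numeric_cols] == -1)
cleaned[text_cols] = df[text_cols].mask(df[text_cols] == "Unknown")

print(cleaned.isna().sum())
```

This treats every -1 in a numeric column as missing, which is only safe for features such as age or fare where -1 cannot be a real value.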
The Titanic dataset is a widely used dataset that contains information on the passengers who were aboard the Titanic when it sank on its maiden voyage in 1912. The dataset includes features such as age, sex, passenger class, and fare paid, as well as whether or not the passenger survived the sinking. The dataset is often used for machine learning and data analysis tasks, such as predicting survival based on passenger characteristics or exploring patterns in the data. The Titanic dataset is a classic example of data analysis and is a great starting point for those new to data science.
The Titanic dataset is available in CSV format and contains two files, one for training and one for testing. The training file is used to build the machine learning model, while the testing file is used to test the performance of the model.
• PassengerId: unique identifier for each passenger
• Survived: whether the passenger survived (1) or not (0)
• Pclass: passenger class (1 = 1st class, 2 = 2nd class, 3 = 3rd class)
• Name: name of the passenger
• Sex: gender of the passenger
• Age: age of the passenger (in years)
• SibSp: number of siblings or spouses aboard the Titanic
• Parch: number of parents or children aboard the Titanic
• Ticket: ticket number
• Fare: passenger fare
• Cabin: cabin number
• Embarked: port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
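A file with the columns listed above can be read and checked with pandas roughly as follows. This is only a sketch: the two rows are made up (they are not real passengers), an in-memory string stands in for the training file, and `EmbarkedPort` is a hypothetical helper column added for readability.

```python
import io

import pandas as pd

# Two made-up rows in the column layout listed above (not real passengers).
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    '1,0,3,"Doe, Mr. John",male,22,1,0,A/5 0000,7.25,,S\n'
    '2,1,1,"Roe, Mrs. Jane",female,,1,0,PC 0000,71.28,C00,C\n'
)

# Ticket and Cabin are identifiers, so they are read as text rather than numbers.
train = pd.read_csv(sample_csv, dtype={"Ticket": str, "Cabin": str})

# Expand the one-letter port codes using the key given in the column list.
embarked_names = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}
train["EmbarkedPort"] = train["Embarked"].map(embarked_names)

# Count missing values per column before any modelling.
missing_per_column = train.isna().sum()
print(train.dtypes)
print(missing_per_column[missing_per_column > 0])
```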
Copyright (c) [2023] [Md Kazi Sajiduddin]
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A more comprehensive passenger list than the one in the Titanic competition
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Additional features from Wikipedia Titanic passenger list
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of "Titanic dataset" provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/brendan45774/test-file on 21 November 2021.
--- Dataset description provided by original source is as follows ---
I took the Titanic test file and the gender_submission file and combined them in Excel to make a CSV. This is great for making charts to help you visualize the data, and it also helps you see who died or survived. The labels are at least 70% right, but it's up to you to make them 100%. Thanks to the Titanic beginners' competition for providing the data. Please upvote my dataset; it will mean a lot to me. Thank you!
--- Original source retains full ownership of the source dataset ---
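The combination described in that source (joining the test file with gender_submission) can also be reproduced in pandas instead of Excel. This is a sketch under assumptions: the rows are made up, and the column names `PassengerId` and `Survived` are assumed to be the shared key and the label column of the two files.

```python
import pandas as pd

# Made-up rows; the column names are an assumption about the two input files.
test_rows = pd.DataFrame(
    {
        "PassengerId": [892, 893, 894],
        "Sex": ["male", "female", "male"],
        "Age": [34.5, 47.0, 62.0],
    }
)
gender_submission = pd.DataFrame(
    {"PassengerId": [892, 893, 894], "Survived": [0, 1, 0]}
)

# validate="one_to_one" raises an error if a PassengerId is duplicated on either side.
combined = test_rows.merge(
    gender_submission, on="PassengerId", how="left", validate="one_to_one"
)
combined = combined[["PassengerId", "Survived", "Sex", "Age"]]

# CSV text that could be written to disk or opened in a spreadsheet.
csv_text = combined.to_csv(index=False)
print(csv_text)
```

As the source itself notes, the resulting `Survived` labels are only "at least 70% right", so they are better suited to charts than to model evaluation.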
The Titanic dataset is a well-known dataset that provides information on the passengers who were onboard the fateful voyage of the RMS Titanic. The data includes details such as the passenger's name, age, gender, ticket class, fare paid, and information on their family members. The dataset also includes a column called "Survived" which indicates whether a passenger survived the disaster or not.
There are a total of 891 rows in the dataset, with 12 columns. Some of the key columns in the dataset include:
⢠PassengerId: a unique identifier for each passenger ⢠Survived: a binary variable that indicates whether the passenger survived (1) or did not survive (0) the disaster ⢠Pclass: the ticket class of the passenger (1 = first class, 2 = second class, 3 = third class) ⢠Name: the name of the passenger ⢠Sex: the gender of the passenger (male or female) ⢠Age: the age of the passenger (some values are missing) ⢠SibSp: the number of siblings or spouses the passenger had on board ⢠Parch: the number of parents or children the passenger had on board ⢠Ticket: the ticket number of the passenger ⢠Fare: the fare paid by the passenger ⢠Cabin: the cabin number of the passenger (some values are missing) ⢠Embarked: the port at which the passenger embarked (C = Cherbourg, Q = Queenstown, S = Southampton)
Overall, the key challenges I encountered when working on the Titanic dataset were: how to handle missing values and imbalanced classes, encode categorical variables, reduce the dimensionality of the dataset, and identify and handle noise in the data.
Here are a few tips that I found helpful when getting started in the Titanic dataset competition:
1. Get familiar with the dataset
2. Pre-process the data
3. Split the data into training and test sets
4. Try out a few different algorithms
5. Tune the hyperparameters
6. Evaluate the model
Here are a few resources that I found helpful as I started working on the competition:
• Kaggle's Titanic tutorial
• scikit-learn documentation
• Pandas documentation
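The six steps above can be sketched end to end with pandas and scikit-learn. This is one possible arrangement under stated assumptions, not the author's actual code: the rows are made up, only logistic regression is tried (step 4 would normally compare several models), and the feature lists and the grid of `C` values are arbitrary choices for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up rows (NOT real passengers), repeated so the splits have enough samples.
base = pd.DataFrame(
    {
        "Pclass": [3, 1, 3, 1, 2, 3, 2, 1, 3, 2],
        "Sex": ["male", "female", "female", "female", "male",
                "male", "female", "male", "female", "male"],
        "Age": [22, 38, 26, 35, None, 54, 27, None, 4, 30],
        "Fare": [7.25, 71.28, 7.92, 53.1, 13.0, 8.05, 11.13, 30.0, 16.7, 13.0],
        "Embarked": ["S", "C", "S", "S", "Q", "S", "S", "C", "Q", "S"],
        "Survived": [0, 1, 1, 1, 0, 0, 1, 0, 1, 0],
    }
)
data = pd.concat([base] * 4, ignore_index=True)

# 1. Get familiar with the dataset: look at the missing values per column.
print(data.isna().sum())

# 2. Pre-process: impute missing numeric values, one-hot encode categoricals.
numeric_features = ["Pclass", "Age", "Fare"]
categorical_features = ["Sex", "Embarked"]
preprocess = ColumnTransformer(
    transformers=[
        ("num", SimpleImputer(strategy="median"), numeric_features),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
    ]
)

# 3. Split into training and test sets, keeping the class balance.
X = data.drop(columns="Survived")
y = data["Survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 4. Pick an algorithm (only one is shown here).
pipeline = Pipeline(
    steps=[("preprocess", preprocess), ("model", LogisticRegression(max_iter=1000))]
)

# 5. Tune a hyperparameter with cross-validated grid search.
search = GridSearchCV(pipeline, {"model__C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X_train, y_train)

# 6. Evaluate on the held-out test set (accuracy for a classifier).
accuracy = search.score(X_test, y_test)
print(search.best_params_, accuracy)
```

Keeping the preprocessing inside the pipeline means the imputer and encoder are fitted only on the training folds, which avoids leaking test information into the model.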
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of "Titanic DataSet from Kaggle" provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sureshbhusare/titanic-dataset-from-kaggle on 29 August 2021.
--- No further description of dataset provided by original source ---
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of "Titanic-Dataset (train.csv)" provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/hesh97/titanicdataset-traincsv on 28 January 2022.
--- No further description of dataset provided by original source ---
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of "Cleaned Titanic Data Set for EDA" provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/jagjeet555/cleaned-titanic-data-set-for-eda on 30 September 2021.
--- Dataset description provided by original source is as follows ---
This is the very famous Titanic dataset, cleaned using various statistical methods.
The dataset contains details of Titanic passengers, including their Passenger ID, Survived (0 = did not survive, 1 = survived), Passenger Class (there are 3 passenger classes in the dataset), Sex, Age, SibSp (siblings and spouses), and Parch (parents and children).
I learnt to do this from open platforms such as Python with Mosh, Data Analysis with Jovian, and Kaggle.
You are welcome to do EDA on this data.
--- Original source retains full ownership of the source dataset ---
Hugo0133/Spaceship-Titanic dataset hosted on Hugging Face and contributed by the HF Datasets community
titanic5 Dataset Created by David Beltran del Rio March 2016.
Notes: This is the final (for now) version of my update to the Titanic data. I think it's finally ready for publishing if you'd like. What I did was to strip all the passenger and crew data from the Encyclopedia Titanica (ET) web pages (excluding channel crossing passengers), create a unique ID for each passenger and crew member (Name_ID), then (painstakingly and hopefully 100% correctly) match to your earlier titanic3 dataset, in order to compare the two and to get your sibsp and parch variables. Since the ET is updated occasionally the work put into the ID and matching can be reused and refined later. I did eventually hear back from the ET people, they are willing to make the underlying database available in the future, I have not yet taken them up on it.
The two datasets line up nicely, most of the differences in the newer titanic5 dataset are in the age variable, as I had mentioned before - the new set has less missing ages - 51 missing (vs 263) out of 1309.
I am in the process of refining my analysis of the data as well, based on your comments below and your Regression Modeling Strategies example.
titanic3_wID data can be matched to titanic5 using the Name_ID variable. Tab titanic5 Metadata has the variable descriptions and allowable values for Class and Class/Dept.
A note about the ages - instead of using the add 0.5 trick to indicate estimated birth day / date I have a flag that indicates how the "final" age (Age_F) was arrived at. It's the Age_F_Code variable - the allowable values are in the Titanic5_metadata tab in the attached excel. The reason for this is that I already had some fractional ages for infants where I had age in months instead of years and I wanted to avoid confusion for 6 month old infants, although I don't think there are any in the data! Also, I was thinking to make fractional ages or age in days for all passengers for whom I have DoB, but I have not yet done so.
Here's what the tabs are:
• Titanic5_all - all (mostly cleaned) Titanic passenger and crew records
• Titanic5_work - working dataset, crew removed, unnecessary variables removed - this is the one I import into SAS / R to work on
• Titanic5_metadata - Variable descriptions and allowable values
• titanic3_wID - Original Titanic3 dataset with Name_ID added for merging to Titanic5
I have a csv, R dataset, and SAS dataset, but the variable names are an older version, so I won't send those along for now to avoid confusion.
If it helps send my contact info along to your student in case any questions arise. Gmail address probably best, on weekends for sure: davebdr@gmail.com
The tabs in titanic5.xls are:
• Titanic5_all
• Titanic5_passenger (the one to be used for analysis)
• Titanic5_metadata (used during analysis file creation)
• Titanic3_wID
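The matching between titanic3_wID and titanic5 via Name_ID, and the comparison of missing ages, might be sketched as follows. The rows are made up, and the titanic3 column name `age` is an assumption; only `Name_ID` and `Age_F` are named in the notes above.

```python
import pandas as pd

# Made-up rows; "age" is an assumed column name for the titanic3 sheet.
titanic3_wid = pd.DataFrame(
    {"Name_ID": [1, 2, 3, 4], "age": [29.0, None, None, 48.0]}
)
titanic5 = pd.DataFrame(
    {"Name_ID": [1, 2, 3, 4], "Age_F": [29.0, 31.0, None, 48.0]}
)

# One row per person on each side, so a one-to-one merge is expected.
merged = titanic3_wid.merge(
    titanic5, on="Name_ID", how="inner", validate="one_to_one"
)

missing_old = int(merged["age"].isna().sum())
missing_new = int(merged["Age_F"].isna().sum())

# Records whose age is missing in titanic3 but present in titanic5.
newly_filled = merged[merged["age"].isna() & merged["Age_F"].notna()]

print(missing_old, missing_new, len(newly_filled))
```

For scale, the counts quoted in the notes (263 versus 51 missing ages out of 1309) correspond to roughly 20.1% and 3.9% of the records.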
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of "Titanic data" provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/kandij/titanic-data on 30 September 2021.
--- Dataset description provided by original source is as follows ---
The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1,502 out of 2,224 passengers and crew members. This sensational tragedy shocked the international community and eventually, it led to better safety regulations for ships.
The dataset is useful for those who have started learning data visualization and machine learning. We will be using Python as our working language.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of "Titanic_Dataset" provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ozlemilgun/titanic-dataset on 30 September 2021.
--- No further description of dataset provided by original source ---
--- Original source retains full ownership of the source dataset ---
Nearly 20 years after first finding the sunken remains of the RMS Titanic, marine explorer Robert Ballard returned in June 2004 to help the National Oceanic and Atmospheric Administration (NOAA) study the ship's rapid deterioration. A professor of oceanography at the University of Rhode Island (URI) and director of its Institute for Archaeological Oceanography, Dr. Ballard and his team of scientists from NOAA and other institutions spent 11 days at the site, mapping the ship and conducting scientific analyses of its deterioration. The team worked aboard NOAA Ship Ronald H. Brown from May 30 through June 9, and used remotely operated vehicles (ROVs) to conduct a sophisticated documentation of the state of Titanic that was not possible in the 1980s. This "Look, don't touch" mission utilized high-definition video and stereoscopic still images to provide an updated assessment of the wreck site. The science team included Dr. Dwight Coleman of URI and the Mystic Aquarium & Institute for Exploration (MAIFE), who was the expedition's research chief. As the marine archaeologist with NOAA's Office of Ocean Exploration, I oversaw the expedition's marine archaeology component. In addition to mapping the Titanic, expedition goals included the microbial research of scientist Roy Cullimore, who studied the natural deterioration of the ship's hull. Tiny microbes that feed on iron and create icicle-shaped formations called rusticles are responsible for this deterioration. While rusticles have been observed for many years, little is known about them. As the nation's ocean agency, NOAA has a vested interest in the scientific and cultural aspects of the Titanic, and in its appropriate treatment and preservation. NOAA's focus is to build a baseline of scientific information from which we can measure the shipwreck's processes and deterioration, and then apply the knowledge we gain to other deep-water shipwrecks and submerged cultural resources. 
The Guidelines for Research, Exploration and Salvage of RMS Titanic (9 pages, 104k) were issued under the authority of the RMS Titanic Maritime Act of 1986. On Monday, June 7, 2004, at 9 p.m. ET/PT, the National Geographic Channel gave audiences unprecedented access to the ongoing expedition by broadcasting a one-hour special, "Return to Titanic," which originated from NOAA Ship Ronald H. Brown and included a live underwater telecast from the Titanic. Simultaneous with the expedition, MAIFE enabled thousands of children to experience the Titanic mission as it occurred. From June 4 through 9, four shows a day were transmitted live from the expedition via satellite and Internet2 to participating sites. The JASON Foundation for Education has created a new middle-school math curriculum called "JASON Math Adventure: Geometry and Return to Titanic," which follows the work of researchers on the expedition. Students will learn how geometry concepts are used to position NOAA Ship Ronald H. Brown at the Titanic wreck and the ROV Hercules on the Titanic's bow. Technology partners on the expedition included EDS of Texas, which wired the mission, and VBrick Systems of Connecticut, which enabled the mission feed to be broadcast nationwide.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of "Titanic: Machine Learning from Disaster" provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/shuofxz/titanic-machine-learning-from-disaster on 30 September 2021.
--- No further description of dataset provided by original source ---
--- Original source retains full ownership of the source dataset ---
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Titanic dataset for classification training.
This is a classic dataset used in many data mining tutorials and demos -- perfect for getting started with exploratory analysis and building binary classification models to predict survival.
Data covers passengers only, not crew.