Titanic Machine Learning Study from Disaster
Date
2020-05
Publisher
Department of Applied Economics and Statistics, University of Delaware, Newark, DE.
Abstract
Machine learning plays an important role in the data science field
nowadays, and machine learning methods can be used for classification
problems. In this project, we are interested in understanding what kinds
of people were more likely to survive the sinking of the Titanic, using
several machine learning methods.
Passenger information was provided as a set of predictors, and each
passenger's chance of survival was predicted from these covariates using
five machine learning methods: conventional logistic regression, random
forest, k-nearest neighbors, support vector machine (SVM), and gradient
boosting. Grid search cross-validation was used to tune each method and
to compare the prediction accuracies of the methods. With nine predictors,
the SVM model performed best, with a prediction accuracy of about 83%;
with six predictors, the random forest model performed best, also with a
prediction accuracy of about 83%. Python was used for the entire analysis,
including data cleaning, visualization, modeling, and validation.
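The grid search cross-validation step described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' code: it hand-rolls k-fold cross-validation for only one of the five methods (k-nearest neighbors), searches a hypothetical grid of k values, and uses synthetic two-feature data in place of the Titanic passenger records. In practice a library routine such as scikit-learn's `GridSearchCV` would typically do this work.

```python
import random
from collections import Counter


def make_synthetic_data(n=200, seed=0):
    """Hypothetical two-feature data standing in for passenger covariates (NOT Titanic data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Class 1 is centered at (1, 1), class 0 at (-1, -1); the classes partly overlap.
        center = 1.0 if label == 1 else -1.0
        X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        y.append(label)
    return X, y


def knn_predict(train_X, train_y, point, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, point)), label)
        for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=5):
    """Mean k-NN accuracy over n_folds contiguous folds (rows are assumed to be in random order)."""
    n = len(X)
    fold_size = n // n_folds
    scores = []
    for f in range(n_folds):
        start, stop = f * fold_size, (f + 1) * fold_size
        test_idx = set(range(start, stop))
        # Train on every row outside the held-out fold.
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in range(start, stop)
        )
        scores.append(correct / (stop - start))
    return sum(scores) / n_folds


def grid_search(X, y, k_grid, n_folds=5):
    """Score every candidate k by cross-validation and return the best k plus all scores."""
    results = {k: cross_val_accuracy(X, y, k, n_folds) for k in k_grid}
    best_k = max(results, key=results.get)
    return best_k, results


if __name__ == "__main__":
    X, y = make_synthetic_data()
    best_k, results = grid_search(X, y, k_grid=[1, 3, 5, 7, 9])
    for k, acc in results.items():
        print(f"k={k}: mean CV accuracy = {acc:.3f}")
    print("best k:", best_k)
```

Each candidate k is scored by its mean accuracy across the held-out folds, and the k with the highest mean is kept. The same loop would apply to the other four methods by swapping in a different model and its own hyperparameter grid.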
Keywords
Machine learning, Titanic, Survival rate, Prediction accuracy