Titanic Machine Learning Study from Disaster

Date
2020-05
Publisher
Department of Applied Economics and Statistics, University of Delaware, Newark, DE.
Abstract
Machine learning plays an important role in data science today, and its methods are well suited to classification problems. In this project, we use several machine learning methods to understand what kinds of people were more likely to survive the sinking of the Titanic. Several predictors describing each passenger were provided, and each passenger's chance of survival was predicted from these covariates using five machine learning methods: conventional Logistic Regression, Random Forest, K-Nearest Neighbor, Support Vector Machine (SVM), and Gradient Boosting. Grid search with cross-validation was used to calibrate the prediction accuracy of each method. With nine predictors, the SVM model performed best, with a prediction accuracy of about 83%. With six predictors, the Random Forest model performed best, also with a prediction accuracy of about 83%. We used Python for the whole analysis, including data cleaning, visualization, validation, and modeling.
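The model comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, and the synthetic data, the parameter grids, and the helper name `tune_models` are all hypothetical stand-ins for the cleaned passenger data and the settings actually used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the cleaned passenger table (ASSUMPTION: this is
# not the Titanic data; nine numeric predictors, binary outcome).
X, y = make_classification(
    n_samples=300, n_features=9, n_informative=5, random_state=0
)

# One (estimator, parameter grid) pair per method named in the abstract.
# The grids are illustrative assumptions, kept small so the sketch runs fast.
# Scale-sensitive methods are wrapped in a pipeline with a scaler, so the
# grid keys carry the lowercase step name that make_pipeline generates.
CANDIDATES = {
    "Logistic Regression": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.1, 1.0, 10.0]},
    ),
    "Random Forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]},
    ),
    "K-Nearest Neighbor": (
        make_pipeline(StandardScaler(), KNeighborsClassifier()),
        {"kneighborsclassifier__n_neighbors": [3, 5, 11]},
    ),
    "Support Vector Machine": (
        make_pipeline(StandardScaler(), SVC()),
        {"svc__C": [0.1, 1.0, 10.0], "svc__kernel": ["linear", "rbf"]},
    ),
    "Gradient Boosting": (
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
    ),
}


def tune_models(X, y, cv=5):
    """Grid-search each candidate with k-fold cross-validation.

    Returns a dict mapping method name to
    (best mean cross-validated accuracy, best parameter setting).
    """
    results = {}
    for name, (estimator, grid) in CANDIDATES.items():
        search = GridSearchCV(estimator, grid, cv=cv, scoring="accuracy")
        search.fit(X, y)
        results[name] = (search.best_score_, search.best_params_)
    return results


results = tune_models(X, y)
for name, (score, params) in results.items():
    print(f"{name}: accuracy={score:.3f}, params={params}")
```

The printed accuracies describe only the synthetic data and are not comparable to the 83% reported above. Swapping in the real predictors for `X` and the survival indicator for `y` would reproduce the general workflow, under whatever grids the analysis actually used.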
Keywords
Machine learning, Titanic, Survival rate, Prediction accuracy