Predicting Titanic Survivors with Machine Learning


Notes on a machine learning talk given by Ju at RailsConf 2017.

I really enjoyed Ju's approach to machine learning. Instead of jumping straight to the algorithm calls, he first showed how to build a basic statistical analysis program by hand, and then showed how machine learning algorithms can automate that process.

Tools to Use

  • Python
  • matplotlib.pyplot (for plotting and other visualizations)
  • pandas
  • sklearn, with its linear_model, preprocessing (normalization and similar), tree, and model_selection modules


  1. Read in the CSV file of Titanic passengers
  2. The data lists both the passengers who survived and those who died in the sinking
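
As a rough sketch of that first step, reading the file with pandas might look like the following. The inline sample data, the column names (borrowed from the commonly used Kaggle layout), the helper name load_passengers, and the file name in the comment are all assumptions for illustration, not details from the talk.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Titanic CSV so the sketch runs on its own.
# The column names are an assumption, not taken from the talk.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
4,1,1,female,35
5,0,3,male,35
6,0,3,male,
"""


def load_passengers(source):
    """Read passenger records from a file path or a file-like object."""
    return pd.read_csv(source)


# With a real file this would be something like load_passengers("titanic.csv").
passengers = load_passengers(io.StringIO(SAMPLE_CSV))

# Quick sanity checks: size of the table, survivors vs. victims, missing ages.
print(passengers.shape)
print(passengers["Survived"].value_counts())
print(passengers["Age"].isna().sum())
```

Looking at the shape, the class balance, and the missing values first gives a feel for the data before any model is involved.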

Key to Working with Machine Learning

Follow a logical progression when working with the data set. Trace the flow of the data, analyze each parameter in turn, and see what patterns the parameters reveal in the data.
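
One way to picture this kind of by-hand analysis is to compute the survival rate for each value of a parameter. This is a minimal sketch on made-up toy records; the column names and the helper survival_rate_by are assumptions, not code from the talk.

```python
import pandas as pd

# Hypothetical toy records; the values and column names are assumptions
# chosen only to make the sketch runnable.
passengers = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1],
    "Pclass": [3, 1, 3, 1, 3, 3, 2, 2],
    "Sex": ["male", "female", "female", "female",
            "male", "male", "female", "male"],
})


def survival_rate_by(df, column):
    """Return the fraction of survivors for each value of the given column."""
    return df.groupby(column)["Survived"].mean()


rate_by_sex = survival_rate_by(passengers, "Sex")
rate_by_class = survival_rate_by(passengers, "Pclass")

print(rate_by_sex)
print(rate_by_class)
```

Comparing the rates across parameters shows which ones appear to separate survivors from victims, which is the kind of pattern a learning algorithm later searches for automatically.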

Causation vs Association

To build a useful heuristic, the learner needs to see the full set of relevant parameters in the data set. If relevant parameters are missing, the patterns it learns may reflect mere association rather than causation.

Decision Trees

  • Random state
  • Tree height
  • Setting minimum samples
  • Export visualization
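
The settings listed above map fairly directly onto scikit-learn's DecisionTreeClassifier. The sketch below is one possible reading: it assumes "tree height" means max_depth and "minimum samples" means min_samples_split (min_samples_leaf would be another candidate), and it uses made-up toy data with assumed column names.

```python
import pandas as pd
from sklearn import tree

# Hypothetical toy data; column names and values are assumptions.
passengers = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 3, 2, 2],
    "IsFemale": [0, 1, 1, 1, 0, 0, 1, 0],
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0],
})
features = ["Pclass", "IsFemale"]

classifier = tree.DecisionTreeClassifier(
    random_state=0,        # fixed seed so repeated runs build the same tree
    max_depth=3,           # cap on the tree height
    min_samples_split=2,   # minimum samples a node needs before it may split
)
classifier.fit(passengers[features], passengers["Survived"])

# Export the fitted tree as Graphviz DOT text. With out_file=None the
# function returns the DOT source as a string instead of writing a file.
dot_source = tree.export_graphviz(
    classifier,
    out_file=None,
    feature_names=features,
    class_names=["died", "survived"],
)
print(dot_source)
```

The DOT text can be rendered to an image with the Graphviz dot tool. Limiting the depth and raising the minimum sample count are the usual levers for keeping the tree from memorizing the training data.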
