Titanic survivors — logistic regression

Using logistic regression, we will build a model that predicts whether a passenger on the Titanic would have survived (with source code).

Aviral Bhardwaj
4 min read · Jun 19, 2021

In the previous article, I gave you an introduction to logistic regression. In this article, I will show you how to build a basic logistic regression model with a few lines of code. So let's start.

The first step is to download the dataset, which we will then use to train the model.

You can download or copy the data from this URL: https://raw.githubusercontent.com/aviralb13/codes/main/datas/titanic_survivors.csv

Importing the libraries

Now we will import pandas and NumPy as shown below. If these libraries are not installed on your system, you can install them with pip (for example, pip install pandas numpy).

import pandas as pd

import numpy as np

Data preparation

Now we will read the data from the URL with pandas. The head() method shows the first 5 rows of the dataset, and the shape attribute gives the number of rows and columns.

url = 'https://raw.githubusercontent.com/aviralb13/codes/main/datas/titanic_survivors.csv'

titanic = pd.read_csv(url)

titanic.head()

titanic.shape
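Before picking columns, it can be worth checking whether any of them contain missing values, because scikit-learn's LogisticRegression raises an error when it is given NaN. The sketch below uses a tiny made-up DataFrame (the name sample and all of its values are assumptions for illustration, not rows from the real CSV) to show one way to count and drop missing values.

```python
import pandas as pd

# Tiny made-up sample with the same column names as the Titanic CSV.
# The values are illustrative only, not taken from the real dataset.
sample = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 2],
    "SibSp": [1, 1, 0, 1, 0],
    "Fare": [7.25, 71.28, 7.92, 53.10, None],
    "Survived": [0, 1, 1, 1, 0],
})

# Count the missing values in each column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# One simple option: drop the rows that have a missing value
# in any of the columns we plan to feed to the model.
clean = sample.dropna(subset=["Pclass", "SibSp", "Fare"])
print(clean.shape)
```

On the real data you would run the same two calls on titanic instead of sample. Dropping rows is only one choice; filling the gaps with a typical value is another.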

Now we will create two lists. The first, basic_info, holds basic information that we will not use in this simple model (for example, a passenger's name does not tell us whether they survived). The second, important_info, holds the columns we will use to predict whether a person survived.

basic_info = ['Name', 'Age', 'Sex']

important_info = ['Pclass', 'SibSp', 'Fare']

x = titanic[important_info]

y = titanic['Survived']

Pclass is the passenger class (1 is first class, 3 is third class).

Fare is the amount the person paid for the ticket.

SibSp is the number of siblings and spouses the person had aboard.

(We can use more columns if we want, such as Age; which ones to include is up to you.)

These are the columns we are using for the predictions, and this is how we expect survival to depend on them:

If the passenger class is higher (that is, the Pclass number is lower), the chance of surviving is higher.

If the fare is higher, the chance of surviving is higher.

If the number of siblings and spouses is higher, the chance of surviving is lower.
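Expectations like these can be checked against the data before training. A minimal sketch, assuming a small made-up DataFrame named sample (its values are invented for illustration): grouping by Pclass and averaging Survived gives the share of survivors in each class, since Survived is 0 or 1.

```python
import pandas as pd

# Made-up rows for illustration; replace `sample` with `titanic`
# to run the same check on the real dataset.
sample = pd.DataFrame({
    "Pclass":   [1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 1, 0, 0, 0, 1, 0],
})

# The mean of a 0/1 column is the fraction of ones,
# so this is the survival rate per passenger class.
survival_by_class = sample.groupby("Pclass")["Survived"].mean()
print(survival_by_class)
```

The same pattern works for SibSp. Fare is continuous, so it would first need to be cut into ranges.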

Making the model

Now we will import the LogisticRegression class and the train_test_split function from sklearn.

from sklearn.linear_model import LogisticRegression

from sklearn.model_selection import train_test_split

Now we will split our data into two parts, train and test (named val_x and val_y in the code). The training set is used to fit the model, and the test set is used to calculate the model's score. We can fit the model on the training set with the following commands.

train_x, val_x, train_y, val_y = train_test_split(x, y)

logistic_model = LogisticRegression()

logistic_model.fit(train_x, train_y)
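By default, train_test_split shuffles the rows differently on every run, so the score changes each time. The sketch below is one possible variation, not the article's exact code: it passes random_state to make the split repeatable and stratify to keep the share of survivors similar in both parts. The data is made up, and the choices test_size=0.25, random_state=0 and max_iter=1000 are assumptions for illustration.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Made-up rows with the article's column names (not the real Titanic data).
sample = pd.DataFrame({
    "Pclass": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
    "SibSp": [0, 1, 0, 1, 0, 0, 1, 0, 2, 0, 0, 1, 3, 0, 4, 0, 1, 0, 2, 0],
    "Fare": [80.0, 71.3, 53.1, 60.0, 90.0, 26.0, 21.0, 13.0, 30.0, 15.0,
             7.3, 7.9, 21.1, 8.1, 31.3, 7.8, 16.7, 8.0, 18.0, 7.2],
    "Survived": [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
})
x = sample[["Pclass", "SibSp", "Fare"]]
y = sample["Survived"]

# random_state fixes the shuffle so the split is the same on every run;
# stratify=y keeps the survivor ratio similar in the train and test parts.
train_x, val_x, train_y, val_y = train_test_split(
    x, y, test_size=0.25, random_state=0, stratify=y
)

# max_iter is raised above the default to give the solver room to converge.
logistic_model = LogisticRegression(max_iter=1000)
logistic_model.fit(train_x, train_y)
print(len(train_x), len(val_x))
```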

Predicting from the model

Now we will predict whether each of the first 5 passengers in the dataset survived.

1 means the person survived.

0 means the person did not survive.

We can then compare the predictions with the actual values to check the model.

logistic_model.predict(x[:5])

y.head()
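Behind predict, a logistic regression model first computes a probability and then turns it into 0 or 1. The sketch below, on made-up numbers (the arrays X and y are assumptions for illustration), shows that predict_proba matches the sigmoid of a weighted sum of the columns, and that predict corresponds to a 0.5 cut-off on that probability.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up rows in the order [Pclass, SibSp, Fare]; not the real data.
X = np.array([
    [1, 0, 80.0], [1, 1, 71.3], [2, 0, 26.0], [2, 1, 21.0],
    [3, 0, 7.3], [3, 1, 7.9], [3, 3, 21.1], [1, 0, 53.1],
])
y = np.array([1, 1, 1, 0, 0, 0, 0, 1])

model = LogisticRegression(max_iter=1000).fit(X, y)

# Probability of class 1 (survived) as reported by scikit-learn.
proba = model.predict_proba(X)[:, 1]

# The same probability computed by hand:
# weighted sum of the features plus intercept, passed through the sigmoid.
z = X @ model.coef_[0] + model.intercept_[0]
manual = 1.0 / (1.0 + np.exp(-z))

# Turning probabilities into labels with a 0.5 threshold.
labels = (proba > 0.5).astype(int)
print(np.round(proba, 2))
print(labels)
```

On the article's model, logistic_model.predict_proba(x[:5]) would give the corresponding probabilities for the first 5 passengers.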

Comparing the first 5 predictions with the actual values, our model got only 1 wrong and the other 4 right. Five rows are too few to judge a model, so to measure its accuracy properly we can use the .score() method on the test set.

predictions = logistic_model.predict(val_x)

logistic_model.score(val_x,val_y)

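As you can see, our model is about 65% accurate, which is a reasonable start for such a simple model. Your exact score may differ slightly because train_test_split shuffles the data randomly on each run. In the next few articles, I will show you how to improve the score.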
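For a classifier, .score() returns the mean accuracy, that is, the fraction of predictions that match the true labels. A minimal sketch of that calculation in plain Python follows, together with a simple baseline: the accuracy you would get by always predicting the most common class. The function names and the two lists are made up for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def majority_baseline(y_true):
    """Accuracy of always predicting the most common label."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Made-up labels and predictions, not results from the real model.
val_y = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
predictions = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]

print(accuracy(val_y, predictions))
print(majority_baseline(val_y))
```

Comparing the model's score with this baseline gives a better sense of how much the model has actually learned than the accuracy figure alone.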

You can check the link for the full code.

Conclusion

In this article, I have shown you how to build a simple logistic regression model, with source code. I will be making more exciting projects for you, so stay connected.
