Titanic survivors — logistic regression
Using logistic regression, we will build a model that predicts whether a passenger on the Titanic would have survived (with source code).
The previous article gave an introduction to logistic regression. This article shows how to build a basic logistic regression model in a few lines of code. Let's start.
The first step is to download the dataset, which we will then use to train the model.
You can download or copy the data from this URL: https://raw.githubusercontent.com/aviralb13/codes/main/datas/titanic_survivors.csv
Importing the libraries
Now we will import pandas and NumPy as shown below. If these libraries are not installed on your system, you can install them with pip (pip install pandas numpy scikit-learn).
import pandas as pd
import numpy as np
Data preparation
Now we read the data with pandas. The head() method shows the first 5 rows of the dataset, and the shape attribute gives the number of rows and columns.
url = 'https://raw.githubusercontent.com/aviralb13/codes/main/datas/titanic_survivors.csv'
titanic = pd.read_csv(url)
titanic.head()
titanic.shape
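If you run the code as a script instead of in a notebook, head() and shape print nothing on their own, so you need print(). Here is a minimal sketch of that, with a check for missing values added. The rows in sample are made up so the snippet runs without downloading anything. Only the column names come from this article.

```python
import pandas as pd

# Made-up rows that stand in for the real CSV so the snippet runs offline.
# Only the column names come from the article; the values are invented.
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "SibSp": [1, 1, 0, 0],
    "Fare": [7.25, 71.28, 7.92, 13.0],
})

n_rows, n_cols = sample.shape             # shape is an attribute, so no parentheses
missing_per_column = sample.isna().sum()  # number of missing values in each column

print(sample.head())        # first rows (at most 5)
print(n_rows, n_cols)
print(missing_per_column)
```

On the real data you would use titanic in place of sample.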
Next we create two lists. The first, basic_info, holds columns that this basic model will not use. A name, for example, does not tell us whether a person survived. The second, important_info, holds the columns we will use to predict whether a person survived.
basic_info = ['Name', 'Age', 'Sex']
important_info = ['Pclass', 'SibSp', 'Fare']
x = titanic[important_info]
y = titanic['Survived']
Pclass is the passenger class (1 is first class, 3 is third class).
Fare is the amount the person paid for the ticket.
SibSp is the number of siblings and spouses the person had aboard.
(We could include more columns, such as Age, so it is up to you which ones to use.)
These are the columns we are using for the predictions, and this is how we expect survival to depend on them:
A higher passenger class (that is, a lower Pclass number) should mean a higher chance of surviving, a higher fare should mean a higher chance of surviving, and a larger number of siblings and spouses should mean a lower chance of surviving.
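These expectations can be checked before training anything. Here is a rough sketch using groupby. The toy table is invented, and I chose the numbers so that they follow the pattern described above, so it proves nothing about the real passengers. Run the same two lines on titanic to see what the real data says.

```python
import pandas as pd

# Invented toy table (not real passenger data); column names follow the article.
toy = pd.DataFrame({
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1],
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3],
    "SibSp":    [0, 1, 0, 0, 1, 3, 0, 0],
    "Fare":     [80.0, 71.0, 52.0, 26.0, 13.0, 8.0, 7.9, 7.2],
})

# Survived is 0 or 1, so its mean within a group is that group's survival rate.
survival_by_class = toy.groupby("Pclass")["Survived"].mean()

# Average fare paid by those who did not survive (0) and those who did (1).
fare_by_outcome = toy.groupby("Survived")["Fare"].mean()

print(survival_by_class)
print(fare_by_outcome)
```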
Making the model
Now we import the logistic regression model and the train_test_split function from sklearn.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
Now we split the data into two parts, train and test (called val in the code). The train set is used to fit the model, and the test set is used to calculate the model's score. We can then fit the model on the train set with the following commands.
train_x, val_x, train_y, val_y = train_test_split(x, y)
logistic_model = LogisticRegression()
logistic_model.fit(train_x, train_y)
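By default train_test_split shuffles the rows differently on every run, so the score changes a little each time. If you want the same split every time, one option is to pass random_state. The sketch below does that on synthetic data, which is invented here so the snippet runs on its own. The values of test_size, random_state, stratify and max_iter are my own choices, not settings from this article.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (invented, not the Titanic CSV); column names follow the article.
rng = np.random.default_rng(0)
n = 200
x = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "SibSp": rng.integers(0, 4, size=n),
    "Fare": rng.uniform(5, 100, size=n),
})
y = (x["Fare"] + rng.normal(0, 30, size=n) > 50).astype(int).rename("Survived")

# random_state fixes the shuffle; stratify=y keeps the 0/1 ratio similar in both parts.
train_x, val_x, train_y, val_y = train_test_split(
    x, y, test_size=0.25, random_state=42, stratify=y
)

# max_iter is raised from the default of 100 to give the solver room to converge.
logistic_model = LogisticRegression(max_iter=1000)
logistic_model.fit(train_x, train_y)

print(train_x.shape, val_x.shape)
```

With 200 rows and test_size=0.25, 150 rows go to the train set and 50 to the test set.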
Predicting from the model
Now we predict whether each of the first 5 passengers survived.
1 means the person survived.
0 means the person did not survive.
We can then compare the predictions with the actual results to check the model.
logistic_model.predict(x[:5])
y.head()
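predict() only returns the labels 0 and 1. If you also want the estimated probability of surviving, scikit-learn offers predict_proba(). Here is a small sketch that puts the predicted label, the probability and the actual value side by side. The data is synthetic and invented for this snippet, and the names model and comparison are mine.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (invented, not the Titanic CSV); column names follow the article.
rng = np.random.default_rng(1)
n = 120
x = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "SibSp": rng.integers(0, 4, size=n),
    "Fare": rng.uniform(5, 100, size=n),
})
y = (x["Fare"] + rng.normal(0, 30, size=n) > 50).astype(int).rename("Survived")

model = LogisticRegression(max_iter=1000).fit(x, y)

first_five = x[:5]
labels = model.predict(first_five)                    # 0 or 1 for each row
prob_survive = model.predict_proba(first_five)[:, 1]  # column 1 belongs to class 1

comparison = pd.DataFrame({
    "predicted": labels,
    "probability": prob_survive.round(3),
    "actual": y[:5].to_numpy(),
})
print(comparison)
```

A row is labelled 1 when its probability of surviving is above 0.5.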
As you can see, the model got only 1 of these 5 predictions wrong and the other 4 right. Five rows are too few to judge accuracy, though, so to get the model's score on the whole test set we can use the .score() method.
predictions = logistic_model.predict(val_x)
logistic_model.score(val_x,val_y)
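For a classifier, .score() returns the accuracy, which is the share of test rows predicted correctly. The sketch below computes the same number by hand and adds two things that help you judge it: a confusion matrix, and a baseline, meaning the accuracy you would get by always guessing the more common outcome. The data is synthetic and invented for this snippet, and the variable names are mine.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (invented, not the Titanic CSV); column names follow the article.
rng = np.random.default_rng(2)
n = 200
x = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "SibSp": rng.integers(0, 4, size=n),
    "Fare": rng.uniform(5, 100, size=n),
})
y = (x["Fare"] + rng.normal(0, 30, size=n) > 50).astype(int).rename("Survived")

train_x, val_x, train_y, val_y = train_test_split(x, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(train_x, train_y)
predictions = model.predict(val_x)

accuracy = model.score(val_x, val_y)                          # mean accuracy
manual_accuracy = (predictions == val_y.to_numpy()).mean()    # same value, by hand
baseline = val_y.value_counts(normalize=True).max()           # always guess the majority
matrix = confusion_matrix(val_y, predictions, labels=[0, 1])  # rows: actual, columns: predicted

print(accuracy, manual_accuracy, baseline)
print(matrix)
```

If the accuracy is only slightly above the baseline, the model has not learned much from the columns it was given.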
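As you can see, the model is about 65% accurate, which is a reasonable start for such a basic model. Because train_test_split shuffles the rows randomly, your score may differ slightly from run to run. In the next few articles, I will show you how to improve the score.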
You can check the link for the full code.
Conclusion
In this article I have shown you how to build a simple logistic regression model, with source code. I will be making more exciting projects for you, so stay connected.