Lung cancer prediction model — Support Vector Machine

Using a support vector machine (SVM), we will build a model that predicts whether or not a person has lung cancer based on their health data (source code included)

4 min read · Nov 3, 2022

In one of my previous articles, I gave you an introduction to the support vector machine (SVM). Now, in this article, I’ll demonstrate how to build an SVM model using a few lines of code.

To learn more about the Support Vector Machine model, see my earlier article, "What is support vector machine (SVM)?", on iaviral.medium.com.

So let’s start

The first step is to download the dataset so that we can feed it to the model. You can download or copy the data from this URL:

https://raw.githubusercontent.com/aviralb13/git-codes/main/datas/lung%20cancer.csv

Importing the libraries

We start by importing pandas and NumPy. If your system does not have these libraries installed, you can install them using the pip command.
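Here is a minimal sketch of the import step. The aliases pd and np are just the common convention (an assumption, the article does not prescribe them), and the pip line in the comment is the usual way to install both packages.

```python
# If the libraries are missing, install them first from a terminal:
#   pip install pandas numpy
import numpy as np
import pandas as pd

# Printing the versions is a quick way to confirm both imports worked.
print("pandas", pd.__version__)
print("numpy", np.__version__)
```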

Data preparation

Now we’ll read the data using pandas and save it in a variable called data so we don’t have to load it again and again. Using the head method, we can view the first 5 rows of the data; if you want to see a different number of rows, pass that number inside the parentheses.
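The reading step can be sketched as follows. To keep the example runnable offline, it reads a tiny made-up CSV from memory; the rows and the column names in it are assumptions for illustration only, and load_data is a hypothetical helper name. For the real dataset you would pass DATA_URL instead.

```python
import io

import pandas as pd

DATA_URL = "https://raw.githubusercontent.com/aviralb13/git-codes/main/datas/lung%20cancer.csv"


def load_data(source):
    """Read a CSV from a URL, a file path, or a file-like object."""
    return pd.read_csv(source)


# Made-up rows so the sketch runs without network access (not real data).
sample_csv = io.StringIO(
    "GENDER,AGE,SMOKING,LUNG_CANCER\n"
    "M,69,1,YES\n"
    "F,59,2,NO\n"
    "F,63,1,NO\n"
    "M,74,2,YES\n"
    "M,55,1,NO\n"
    "F,61,2,YES\n"
)

data = load_data(sample_csv)  # for the real data: data = load_data(DATA_URL)

print(data.head())   # first 5 rows (the default)
print(data.head(3))  # first 3 rows
```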

Since some columns contain words or strings, we must first convert them into numbers. To do this, we will use one-hot encoding and label encoding.
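Here is a small sketch of both encodings on a toy table. The rows are made up, and the choice of which column gets which encoding (label encoding for LUNG_CANCER, one-hot encoding for GENDER) is my assumption; the article does not say how the two are assigned.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows for illustration only.
data = pd.DataFrame({
    "GENDER": ["M", "F", "F", "M"],
    "AGE": [69, 59, 63, 74],
    "LUNG_CANCER": ["YES", "NO", "NO", "YES"],
})

# Label encoding: each class becomes an integer.
# Classes are sorted alphabetically, so NO -> 0 and YES -> 1.
encoder = LabelEncoder()
data["LUNG_CANCER"] = encoder.fit_transform(data["LUNG_CANCER"])

# One-hot encoding: GENDER is replaced by one 0/1 column per category.
data = pd.get_dummies(data, columns=["GENDER"], dtype=int)

print(data)
```

Label encoding suits a two-class target, while one-hot encoding avoids implying an order between categories in a feature column.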

Defining X and Y

Next, I created a list of the columns that I think will be the deciding factors for the diagnosis of lung cancer (AGE, SMOKING, YELLOW_FINGERS, ANXIETY, PEER_PRESSURE, CHRONIC DISEASE, FATIGUE, etc.) and assigned it to the variable feature. I then select these features from the dataset and store them as x, and store the result of the diagnosis as y.

I believe these x parameters are the most appropriate, but if you think other columns are relevant, you can modify the list.
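Selecting x and y could look roughly like this. The feature names are the ones listed above; the target column name LUNG_CANCER and the toy rows are assumptions, and the exact column spellings in the real file may differ.

```python
import pandas as pd

# Made-up, already-encoded rows for illustration only.
data = pd.DataFrame({
    "AGE": [69, 59, 63],
    "SMOKING": [1, 2, 1],
    "YELLOW_FINGERS": [2, 1, 2],
    "ANXIETY": [2, 1, 1],
    "PEER_PRESSURE": [1, 2, 1],
    "CHRONIC DISEASE": [1, 2, 1],
    "FATIGUE": [2, 1, 2],
    "LUNG_CANCER": [1, 0, 1],  # assumed name of the diagnosis column
})

# Columns used as model inputs; edit this list to try other features.
feature = [
    "AGE", "SMOKING", "YELLOW_FINGERS", "ANXIETY",
    "PEER_PRESSURE", "CHRONIC DISEASE", "FATIGUE",
]

x = data[feature]        # inputs: one column per feature
y = data["LUNG_CANCER"]  # target: the diagnosis result

print(x.shape, y.shape)
```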

Splitting the Dataset

Before splitting our dataset into a train and a test set, we must first import train_test_split from sklearn.model_selection.
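A minimal sketch of the split, using placeholder arrays in place of the real x and y. The 80/20 ratio (test_size=0.2) and the fixed random_state are my assumptions; the article does not state which values were used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 20 samples with 2 features each, and alternating labels.
x = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

print(x_train.shape, x_test.shape)
```

The model only sees the training part, so the test part gives an honest estimate of how it behaves on new data.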

Making the model

Here, we will build a Support Vector Machine classifier because we need to classify our values. To do that, we import SVC from sklearn.svm and then fit the model on our training data.

And finally, our model is ready, and we are all set to make predictions with it.
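The training and prediction steps above can be sketched like this. Synthetic data from make_classification stands in for the lung cancer dataset so the example runs on its own, and SVC is used with its default settings; both are assumptions, since the article does not list the hyperparameters.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data: 200 samples, 7 features, 2 classes.
x, y = make_classification(n_samples=200, n_features=7, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# Create the classifier and train it on the training split.
model = SVC()
model.fit(x_train, y_train)

# Predict a class label for every row of the test split.
predictions = model.predict(x_test)
print(predictions[:10])
```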

Accuracy

Now let's see how to measure our model's accuracy. Here our model is 93% accurate, which means it classified about 93 out of every 100 test samples correctly, which is a very good accuracy.
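To show what the accuracy number means, here is a tiny sketch with made-up labels (not the article's results). It uses accuracy_score from sklearn.metrics, which is one common way to compute it; I am assuming that, as the article does not say which function was used.

```python
from sklearn.metrics import accuracy_score

# Made-up true labels and predictions for five test samples.
y_test = [1, 0, 1, 1, 0]
predictions = [1, 0, 0, 1, 0]

# Accuracy = correct predictions / total predictions = 4 / 5.
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.0%}")  # Accuracy: 80%
```

With a fitted classifier, model.score(x_test, y_test) returns the same quantity in a single call.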

Source code

You can check the link for the full code.

git-codes/lungcancer.ipynb at main · aviralb13/git-codes
