Most used functions in Pandas

Useful functions in Pandas for Data Science and Machine Learning

Aviral Bhardwaj
4 min read · Jul 29, 2022

Pandas, one of the most widely used Python libraries, is used mostly for cleaning, preparing, analysing, and manipulating data. It can be applied to many different tasks, and it helps speed up and streamline our work.

In this article, we will discuss the most popular and widely used pandas functions, which can save us time and give us more insight into a dataset.

First step

The first step is to download the dataset that we will work with. You can download or copy the data from this URL:

https://raw.githubusercontent.com/aviralb13/git-codes/main/datas/Health_insurance.csv

You can also practice with your own data if you like.

Importing the libraries

Now we will import pandas, conventionally under the alias pd. If your system does not have the library installed, you can install it with the pip command (pip install pandas).
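A minimal sketch of the import step, assuming pandas is already installed. The alias pd is a widespread convention rather than a requirement.

```python
# Import pandas under its conventional alias.
import pandas as pd

# Printing the version is a quick way to confirm the installation works.
print(pd.__version__)
```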

Read data

Reading a dataset with pandas is easy, and the library offers several reader functions for different file formats. We can use the read_csv function to read a CSV dataset. In addition, the read_excel function lets us read an Excel dataset.
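The sketch below shows one way to call read_csv. So that it runs without network access, it reads a small in-memory CSV whose column names and values are made up for illustration (only the age column is mentioned in this article). The commented-out lines show how the same call could be pointed at the URL above, or how read_excel could be used with a hypothetical file name.

```python
import io
import pandas as pd

# URL from the article; reading it requires an internet connection.
DATA_URL = "https://raw.githubusercontent.com/aviralb13/git-codes/main/datas/Health_insurance.csv"

# Hypothetical stand-in data so the example runs offline.
# The real file may have different columns and values.
sample_csv = io.StringIO(
    "age,bmi,charges\n"
    "19,27.9,16884.92\n"
    "18,33.8,1725.55\n"
    "28,33.0,4449.46\n"
)

# read_csv accepts a file path, a URL, or a file-like object.
data = pd.read_csv(sample_csv)
print(data)

# With network access, the same function can read the URL directly:
# data = pd.read_csv(DATA_URL)

# For Excel files (hypothetical file name; needs an Excel engine such as openpyxl):
# data = pd.read_excel("my_data.xlsx")
```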

Head and Tail

The head function gives a preview of the first 5 rows of the dataset. We can also pass the number of rows we want to see; by default it is set to 5. Similarly, the tail function shows the last 5 rows of the dataset. We call these functions as data.head() and data.tail().
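As a rough illustration, the following sketch applies head and tail to a small DataFrame. The data is invented for this example and only stands in for the real dataset.

```python
import pandas as pd

# Hypothetical sample data (not the article's dataset).
data = pd.DataFrame({
    "age": [19, 18, 28, 33, 32, 31, 46, 37],
    "bmi": [27.9, 33.8, 33.0, 22.7, 28.9, 25.7, 33.4, 27.7],
})

first_five = data.head()     # first 5 rows by default
first_three = data.head(3)   # first 3 rows
last_five = data.tail()      # last 5 rows by default

print(first_five)
print(first_three)
print(last_five)
```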

Information

The info function prints basic information about the dataset, such as the column names, the data type of each column, the number of non-null values, and the memory usage. We call it as data.info().
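A small sketch of info on made-up data. Note that info prints its summary directly and returns None, so there is nothing to assign to a variable.

```python
import pandas as pd

# Hypothetical sample data (not the article's dataset).
data = pd.DataFrame({
    "age": [19, 18, 28],
    "bmi": [27.9, 33.8, 33.0],
    "charges": [16884.92, 1725.55, 4449.46],
})

# Prints the index range, column names, non-null counts,
# dtypes, and memory usage to standard output.
data.info()
```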

Columns

We can use the columns attribute to get the names of all the columns in the dataset. Because it is an attribute rather than a method, we access it as data.columns, without parentheses.
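For instance, on an invented DataFrame, the attribute can be read and converted to a plain Python list like this:

```python
import pandas as pd

# Hypothetical sample data (not the article's dataset).
data = pd.DataFrame({
    "age": [19, 18, 28],
    "bmi": [27.9, 33.8, 33.0],
    "charges": [16884.92, 1725.55, 4449.46],
})

# data.columns is an Index object holding the column labels.
print(data.columns)

# Convert to a regular list when a plain Python object is more convenient.
column_names = list(data.columns)
print(column_names)
```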

Datatypes

Pandas has a dtypes attribute that tells us the data type of each column in the dataset. We access it as data.dtypes to see the types of all the variables in the dataset.
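The sketch below, again on made-up data, shows dtypes for the whole DataFrame and for a single column.

```python
import pandas as pd

# Hypothetical sample data (not the article's dataset).
data = pd.DataFrame({
    "age": [19, 18, 28],             # whole numbers
    "bmi": [27.9, 33.8, 33.0],       # decimal numbers
})

# A Series mapping each column name to its dtype.
print(data.dtypes)

# The dtype of one column can be looked up by name.
age_dtype = data.dtypes["age"]
bmi_dtype = data.dtypes["bmi"]
print(age_dtype, bmi_dtype)
```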

Shape and size

The shape attribute returns the number of rows and columns as a tuple, whereas the size attribute gives the total number of elements, that is, the number of rows multiplied by the number of columns in the data frame. We access them as data.shape and data.size.
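A short sketch on invented data that shows how the two values relate:

```python
import pandas as pd

# Hypothetical sample data: 4 rows, 3 columns.
data = pd.DataFrame({
    "age": [19, 18, 28, 33],
    "bmi": [27.9, 33.8, 33.0, 22.7],
    "charges": [16884.92, 1725.55, 4449.46, 21984.47],
})

n_rows, n_cols = data.shape   # tuple of (rows, columns)
total_cells = data.size       # rows * columns

print(data.shape)
print(total_cells)
```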

Describe

The describe function gives summary statistics for the numeric columns of the dataset: count, mean, standard deviation, minimum, quartiles, and maximum. We call it as data.describe().
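To make the output concrete, here is a sketch on a tiny made-up DataFrame. The result of describe is itself a DataFrame, so individual statistics can be looked up with loc.

```python
import pandas as pd

# Hypothetical sample data (not the article's dataset).
data = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "bmi": [22.0, 24.0, 26.0, 28.0],
})

summary = data.describe()
print(summary)

# Rows of the summary are the statistics; columns are the numeric columns.
mean_age = summary.loc["mean", "age"]
max_bmi = summary.loc["max", "bmi"]
print(mean_age, max_bmi)
```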

Isna

The isna function is used to find null or missing values in a dataset. Calling data.isna() returns a table of the same shape in which each cell is True if the value is missing and False otherwise. By chaining the sum function, as in data.isna().sum(), we get the total number of null values in each column of the dataset.
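The following sketch uses invented data with a few deliberately missing values to show both the boolean mask and the per-column counts.

```python
import pandas as pd

# Hypothetical sample data with some missing entries (None becomes NaN).
data = pd.DataFrame({
    "age": [19, None, 28, 33],
    "bmi": [27.9, 33.8, None, None],
})

mask = data.isna()                 # True where a value is missing
missing_per_column = mask.sum()    # True counts as 1, so this counts nulls
total_missing = int(missing_per_column.sum())

print(mask)
print(missing_per_column)
print(total_missing)
```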

Drop

The drop function is used to remove columns that we don't want in our model. Sometimes a dataset has many columns, but some of them are not useful. We can remove a column with data.drop(columns='name of the column'). By default, drop returns a new data frame and leaves the original unchanged.

As an example, we drop the age column.
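A minimal sketch of dropping the age column. The DataFrame here is made up, and only the age column name comes from the article.

```python
import pandas as pd

# Hypothetical sample data (not the article's dataset).
data = pd.DataFrame({
    "age": [19, 18, 28],
    "bmi": [27.9, 33.8, 33.0],
    "charges": [16884.92, 1725.55, 4449.46],
})

# drop returns a new DataFrame; the original is left untouched.
reduced = data.drop(columns="age")
print(reduced)

# Several columns can be dropped at once by passing a list.
only_charges = data.drop(columns=["age", "bmi"])
print(only_charges)
```

To keep the change, assign the result back to a variable, for example data = data.drop(columns="age").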


Conclusion

If you found this article useful, please show your appreciation with claps and follow me for more articles. I will be publishing more articles that explain machine learning concepts and models with code, so leave a comment and tell me how excited you are about this.
