Most used functions in Pandas
Useful functions in Pandas for Data Science and Machine Learning
Pandas is one of the most widely used Python libraries, mostly for cleaning, preparing, analysing, and manipulating data. It supports a broad range of tasks and helps us speed up and streamline our work.
In this article, we will discuss some of the most popular pandas functions, which can save us time and give us more insight into a dataset.
The first step is to download the dataset we will work with. You can download or copy the data from the following link:
git-codes/Health_insurance.csv at main · aviralb13/git-codes
You can also practice with your own data if you'd like.
Importing the libraries
Now we will import pandas. If pandas is not installed on your system, you can install it with the pip command.
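Here is a minimal sketch of the import step. The alias pd is a widespread convention rather than a requirement, and the install command in the comment assumes pip is available on your system.

```python
# Install first if needed (run this in a terminal, not inside Python):
#   pip install pandas
import pandas as pd

# Print the installed version to confirm the import worked.
print(pd.__version__)
```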
Reading a dataset with pandas is easy, and there are several options for different file types. We can use the read_csv function to read a CSV file, and the read_excel function to read an Excel file.
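The sketch below shows read_csv in action. So that it runs without any file on disk, it reads from a small in-memory sample; the column names and values are made up for illustration and are only an assumption about what a health insurance dataset might contain.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the downloaded file. With the real
# file saved locally, you would call pd.read_csv("Health_insurance.csv").
csv_text = """age,sex,bmi,smoker,charges
19,female,27.9,yes,16884.92
18,male,33.8,no,1725.55
28,male,33.0,no,4449.46
"""

# read_csv accepts a file path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))
print(data)
```

pd.read_excel is used in a similar way for Excel files, though it usually needs an extra engine package such as openpyxl to be installed.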
Head and Tail
The head function gives a preview of the first 5 rows of the dataset. We can also pass the number of rows we want to see; the default is 5. Similarly, the tail function shows the last 5 rows. We can call these as data.head() or data.tail().
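As a quick illustration, the sketch below calls head and tail on a small frame. The data is hypothetical and just large enough to show the default of 5 rows.

```python
import pandas as pd

# Hypothetical 8-row frame used only for illustration.
data = pd.DataFrame({
    "age": [19, 18, 28, 33, 32, 31, 46, 37],
    "bmi": [27.9, 33.8, 33.0, 22.7, 28.9, 25.7, 33.4, 27.7],
})

first_five = data.head()    # no argument: first 5 rows
last_three = data.tail(3)   # explicit row count: last 3 rows

print(first_five)
print(last_three)
```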
The info function gives basic information about the dataset, such as the column names, the number of non-null values and the data type of each column, and the memory usage. We can call it with data.info().
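The following sketch runs info on a tiny hypothetical frame. Note that info prints its summary directly and returns None, so the example also shows one way to capture the text by passing a buffer.

```python
import io
import pandas as pd

# Hypothetical frame with one missing value in "age".
data = pd.DataFrame({
    "age": [19, 18, None],
    "smoker": ["yes", "no", "no"],
})

# Prints column names, non-null counts, dtypes, and memory usage.
data.info()

# info() returns None; to keep the summary as a string, write it to a buffer.
buffer = io.StringIO()
data.info(buf=buffer)
summary = buffer.getvalue()
```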
We can use the columns attribute to get the names of all the columns in the dataset. It is accessed as data.columns, without parentheses.
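A short sketch of reading the column names, again on made-up data. The conversion to a plain list is optional and shown only because a list is often easier to work with.

```python
import pandas as pd

# Hypothetical frame used only for illustration.
data = pd.DataFrame({
    "age": [19, 18],
    "bmi": [27.9, 33.8],
    "smoker": ["yes", "no"],
})

# columns is an attribute holding an Index object, so no parentheses.
print(data.columns)

# Convert to a plain Python list if needed.
column_names = data.columns.tolist()
print(column_names)
```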
The dtypes attribute gives the data type of each column in the dataset. We can access it as data.dtypes to see the type of every variable.
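The sketch below prints the dtypes of a small hypothetical frame. As an optional extra that goes slightly beyond the text above, it also uses select_dtypes to keep only the numeric columns, which is a common follow-up once the types are known.

```python
import pandas as pd

# Hypothetical frame mixing integer, float, and text columns.
data = pd.DataFrame({
    "age": [19, 18],
    "bmi": [27.9, 33.8],
    "smoker": ["yes", "no"],
})

# dtypes is an attribute: a Series mapping each column name to its type.
print(data.dtypes)

# Optional: keep only the numeric columns.
numeric_only = data.select_dtypes(include="number")
print(numeric_only.columns.tolist())
```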
Shape and size
The shape attribute returns the number of rows and columns as a tuple, whereas the size attribute gives the total number of elements, that is, the number of rows multiplied by the number of columns. We can access them as data.shape and data.size.
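To make the relationship between the two concrete, here is a small sketch on a hypothetical frame with 3 rows and 2 columns.

```python
import pandas as pd

# Hypothetical frame: 3 rows, 2 columns.
data = pd.DataFrame({
    "age": [19, 18, 28],
    "bmi": [27.9, 33.8, 33.0],
})

rows, cols = data.shape   # shape is a (rows, columns) tuple
total = data.size         # size is rows * columns

print(rows, cols, total)
```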
The describe function gives summary statistics for the numeric columns of the dataset, such as count, mean, standard deviation, minimum, quartiles, and maximum. We can call it with data.describe().
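The sketch below runs describe on a hypothetical frame with one numeric and one text column, to show that only the numeric column is summarised by default.

```python
import pandas as pd

# Hypothetical frame: one numeric column, one text column.
data = pd.DataFrame({
    "age": [20, 30, 40],
    "smoker": ["yes", "no", "no"],
})

# By default, describe() summarises numeric columns only.
stats = data.describe()
print(stats)

# Individual statistics can be looked up by row label and column name.
mean_age = stats.loc["mean", "age"]
```

Passing include="all" to describe also summarises the non-numeric columns.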
The isna function is used to find null or missing values in a dataset. Calling data.isna() returns a table of True/False values marking each missing entry. Chaining the sum function, as in data.isna().sum(), gives the total number of null values in each column.
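Here is a small sketch of counting missing values. The frame is hypothetical, with two missing entries placed deliberately so the counts are easy to follow.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in "age" and one in "bmi".
data = pd.DataFrame({
    "age": [19, np.nan, 28],
    "bmi": [27.9, 33.8, np.nan],
    "smoker": ["yes", "no", "no"],
})

mask = data.isna()                        # True where a value is missing
missing_per_column = data.isna().sum()    # count of missing values per column
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print(total_missing)
```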
The drop function is used to remove columns that we don't want in our model. Sometimes a dataset has many columns, some of which are not useful. We can use data.drop(columns='name of the column'), which returns a new DataFrame without that column.
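The sketch below drops one column from a hypothetical frame. The choice of "region" as the unwanted column is only an example.

```python
import pandas as pd

# Hypothetical frame; "region" plays the role of the unwanted column.
data = pd.DataFrame({
    "age": [19, 18],
    "region": ["southwest", "southeast"],
    "charges": [16884.92, 1725.55],
})

# drop() returns a new DataFrame and leaves the original untouched,
# so assign the result to keep the change.
trimmed = data.drop(columns="region")

print(trimmed.columns.tolist())
```

To remove several columns at once, pass a list of names, for example data.drop(columns=["region", "charges"]).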
If you liked this article, you can check out my other articles on artificial intelligence and machine learning.
If you would like to support and encourage my efforts towards the community, you can buy me a coffee.
If you found this article useful, please give it some claps and follow me for more. I will be publishing more articles explaining machine learning concepts and models with code, so leave a comment and tell me how excited you are.