Python pandas fillna and dropna function with examples [Complete Guide]
Python pandas has two built-in functions for dealing with missing values in data: fillna and dropna. We can either replace the nulls with a value such as the column's mean or median, or remove the rows or columns that contain them.
Pandas Fillna function:
We call fillna on a pandas object (a Series or DataFrame) to fill the null values in the data.
Syntax:
DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)
Fill NA/NaN values using the specified method.
Parameters:
value : scalar, dict, Series, or DataFrame
Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series/DataFrame will not be filled). This value cannot be a list.
method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series.
pad / ffill: propagate last valid observation forward to next valid.
backfill / bfill: use NEXT valid observation to fill gap.
axis : {0 or ‘index’, 1 or ‘columns’}
inplace : boolean, default False
If True, fill in place. Note: this will modify any other views on this object, (e.g. a no-copy slice for a column in a DataFrame).
limit : int, default None
If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
downcast : dict, default is None
a dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible)
Returns:
filled : DataFrame
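The method and limit parameters described above can be sketched on a small Series. The data here is hypothetical and chosen only to show a gap of three consecutive NaNs. As far as we are aware, recent pandas releases (2.1 and later) deprecate passing method to fillna, so this sketch uses the equivalent ffill() and bfill() methods instead.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a gap of three consecutive missing values.
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# Forward fill: carry the last valid value (1.0) forward over the gap.
forward = s.ffill()

# Backward fill: pull the next valid value (5.0) back over the gap.
backward = s.bfill()

# With limit=1 only the first NaN after a valid value is filled,
# so the gap is only partially filled.
partial = s.ffill(limit=1)

print(forward.tolist())   # [1.0, 1.0, 1.0, 1.0, 5.0]
print(backward.tolist())  # [1.0, 5.0, 5.0, 5.0, 5.0]
print(partial.tolist())   # [1.0, 1.0, nan, nan, 5.0]
```

Forward fill suits ordered data such as time series, where the previous observation is a reasonable stand-in for a missing one.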
#DataFrame fillna program example
Example 1: In this example we replace the null value with zero.
import pandas as pd
df = pd.DataFrame(data={'a':[1,2,3,None]})
print(df)
df.fillna(value=0, inplace=True)
print(df)
Output:
     a
0  1.0
1  2.0
2  3.0
3  NaN
     a
0  1.0
1  2.0
2  3.0
3  0.0
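Example 1 fills with a single scalar. The value parameter also accepts a dict that maps column names to fill values, which is handy when each column needs a different replacement. The following is a minimal sketch; the frame and its column names "a" and "b" are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame; the column names are for illustration only.
df = pd.DataFrame({"a": [1, 2, np.nan], "b": ["x", None, "z"]})

# Each dict key is a column name; columns not listed are left untouched.
filled = df.fillna(value={"a": 0, "b": "unknown"})

print(filled)
```

Because inplace is left at its default of False, df itself is unchanged and the filled copy is returned.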
Steps to handle missing data:
- First, import the data from a CSV file
- Calculate the mean value of the column
- Fill the null values in that column with the mean
Example 2: With Multiple Values
In this example the data has more than one column with missing values, but we only need to update one specific column, so we call fillna on that column by name. The ‘limit’ argument caps how many values are filled.
import numpy as np
import pandas as pd
data = pd.read_csv('data.csv')
print(data)
# Mean of the known ages (NaN values are skipped)
mean_value = data["age"].mean()
# Fill at most 100 missing ages with the mean
data["age"] = data["age"].fillna(mean_value, limit=100)
print(data)
Output:
   rollno   age
0   101.0  12.0
1   102.0  15.0
2   103.0   NaN
3   104.0  17.0
4   105.0  22.0
5   106.0  26.0
6   107.0   NaN
7   108.0  17.0
8     NaN  18.0
9   110.0  20.0
   rollno     age
0   101.0  12.000
1   102.0  15.000
2   103.0  18.375
3   104.0  17.000
4   105.0  22.000
5   106.0  26.000
6   107.0  18.375
7   108.0  17.000
8     NaN  18.000
9   110.0  20.000
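Because data.csv is not included here, the following is a minimal self-contained sketch of Example 2. It assumes the file holds the values printed in the output above and reads them from an in-memory string instead of from disk.

```python
import io
import pandas as pd

# In-memory stand-in for data.csv, built from the values shown in the output.
csv_text = """rollno,age
101,12
102,15
103,
104,17
105,22
106,26
107,
108,17
,18
110,20
"""
data = pd.read_csv(io.StringIO(csv_text))

# mean() skips NaN by default, so only the 8 known ages are averaged.
mean_value = data["age"].mean()

# Fill only the "age" column; "rollno" keeps its missing value.
data["age"] = data["age"].fillna(mean_value)

print(mean_value)  # 18.375
print(data)
```

The eight known ages sum to 147, so the mean is 147 / 8 = 18.375, which matches the filled values in rows 2 and 6.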
Example 3 : With Median Value
We can also use the median value to fill the missing values.
import numpy as np
import pandas as pd
data = pd.read_csv('data.csv')
print(data)
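# Median of the known ages (NaN values are skipped)
median_value = data["age"].median()
# Fill at most 100 missing ages with the median
data["age"] = data["age"].fillna(median_value, limit=100)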
print(data)
Output:
   rollno   age
0   101.0  12.0
1   102.0  15.0
2   103.0   NaN
3   104.0  17.0
4   105.0  22.0
5   106.0  26.0
6   107.0   NaN
7   108.0  17.0
8     NaN  18.0
9   110.0  20.0
   rollno   age
0   101.0  12.0
1   102.0  15.0
2   103.0  17.5
3   104.0  17.0
4   105.0  22.0
5   106.0  26.0
6   107.0  17.5
7   108.0  17.0
8     NaN  18.0
9   110.0  20.0
Pandas Dropna function:
If the dataset contains null values, we can use dropna to remove the affected rows or columns, checking either the entire data set or only a subset of it.
Syntax: DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Remove missing values.
See the User Guide for more on which values are considered missing, and how to work with missing data.
Parameters:
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Determine if rows or columns which contain missing values are removed.
0, or ‘index’ : Drop rows which contain missing values.
1, or ‘columns’ : Drop columns which contain missing values.
Deprecated since version 0.23.0: Pass tuple or list to drop on multiple axes. Only a single axis is allowed.
how : {‘any’, ‘all’}, default ‘any’
Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.
‘any’ : If any NA values are present, drop that row or column.
‘all’ : If all values are NA, drop that row or column.
thresh : int, optional
Require that many non-NA values.
subset : array-like, optional
Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.
inplace : bool, default False
If True, do operation inplace and return None.
Returns:
DataFrame
DataFrame with NA entries dropped from it.
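The how, thresh and subset parameters are easiest to compare side by side. The sketch below uses a made-up frame (the "city" column and its values are hypothetical) and applies each option separately.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is complete, row 3 is entirely missing.
df = pd.DataFrame({
    "name": ["ali", "raj", np.nan, np.nan],
    "age": [np.nan, 27, 28, np.nan],
    "city": [np.nan, "pune", "delhi", np.nan],
})

# how="all": drop only rows where every value is missing (row 3).
rows_not_empty = df.dropna(how="all")

# thresh=2: keep rows that have at least 2 non-missing values (rows 1 and 2).
at_least_two = df.dropna(thresh=2)

# subset=["name"]: only look at the "name" column when deciding (rows 0 and 1).
has_name = df.dropna(subset=["name"])

print(rows_not_empty.index.tolist())  # [0, 1, 2]
print(at_least_two.index.tolist())    # [1, 2]
print(has_name.index.tolist())        # [0, 1]
```

Each call returns a new DataFrame and leaves df unchanged, since inplace defaults to False.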
Pandas Dropna Example:
import pandas as pd
import numpy as np
df = pd.DataFrame({"name": ['ali', 'raj',np.nan], "age": [np.nan, 27, 28], "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})
print("Before removing Null Values\n", df)
print("After removing Null Values\n", df.dropna())
Output:
Before removing Null Values
     age       born name
0   NaN        NaT  ali
1  27.0 1940-04-25  raj
2  28.0        NaT  NaN
After removing Null Values
     age       born name
1  27.0 1940-04-25  raj