How to remove duplicate values using pandas
In this tutorial, you will learn how to remove duplicate values with pandas using the built-in DataFrame method 'drop_duplicates'.
Syntax : DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)
Parameters :
subset : column label or sequence of labels, optional
Only consider certain columns when identifying duplicates; by default, all columns are used.
keep : {'first', 'last', False}, default 'first'
first : Drop duplicates except for the first occurrence.
last : Drop duplicates except for the last occurrence.
False : Drop all duplicated rows, keeping none of the occurrences.
inplace : boolean, default False
If True, modify the DataFrame in place and return None; if False, return a new DataFrame.
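To make the 'subset' and 'keep' parameters concrete, here is a minimal sketch. The 'people' DataFrame and its 'name' and 'city' columns are hypothetical sample data invented for illustration; they are not part of this tutorial's example.

```python
import pandas as pd

# Hypothetical sample data (assumption, not from the tutorial)
people = pd.DataFrame({
    "name": ["ravi", "david", "raju", "david"],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
})

# Compare only the "name" column; keep the first row for each name
first_kept = people.drop_duplicates(subset="name", keep="first")

# Same comparison, but keep the last row for each name
last_kept = people.drop_duplicates(subset="name", keep="last")

# keep=False drops every row whose name occurs more than once
none_kept = people.drop_duplicates(subset="name", keep=False)

print(first_kept)
print(last_kept)
print(none_kept)
```

With keep='first' the 'david' row from Delhi survives, with keep='last' the one from Mumbai survives, and with keep=False both 'david' rows are removed.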
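import pandas as pd

# Two single-column DataFrames, each containing one repeated value
df1 = pd.DataFrame(['ravi','david','raju','david','test','check'])
df2 = pd.DataFrame(['1','2','3','4','5','2'])

# duplicated(keep=False) marks every occurrence of a repeated value
print("Printing duplicate values")
print(df1[df1.duplicated(keep=False)])
print(df2[df2.duplicated(keep=False)])

# Keep only the first occurrence of each value
df1 = df1.drop_duplicates()
df2 = df2.drop_duplicates()
print(df1)
print(df2)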
Output:
Printing duplicate values
       0
1  david
3  david
   0
1  2
5  2
       0
0   ravi
1  david
2   raju
4   test
5  check
   0
0  1
1  2
2  3
3  4
4  5
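The example above reassigns the result of drop_duplicates() back to the same variable. As a small sketch of the alternative, the snippet below passes inplace=True instead; the variable names 'df' and 'result' are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(['1', '2', '3', '4', '5', '2'])

# With inplace=True the DataFrame itself is modified and the call returns None,
# so the return value should not be assigned back to df
result = df.drop_duplicates(inplace=True)

print(result)
print(df)
```

Note that the surviving rows keep their original index labels in both approaches, which is why the output above shows gaps in the index after duplicates are removed.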