How to find duplicate values using pandas
In this tutorial, you will learn how to find duplicate values using pandas. The built-in duplicated() method makes this easy.
Syntax: DataFrame.duplicated(subset=None, keep='first')
Parameters:
subset : column label or sequence of labels, optional. Only consider certain columns for identifying duplicates; by default, all columns are used.
keep : {'first', 'last', False}, default 'first'.
    'first' : Mark duplicates as True except for the first occurrence.
    'last' : Mark duplicates as True except for the last occurrence.
    False : Mark all duplicates as True.
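To make the two parameters concrete, here is a minimal sketch. The two-column DataFrame and its "name" and "city" columns are hypothetical sample data chosen for illustration; they are not part of the tutorial's program.

```python
import pandas as pd

# Hypothetical sample data (assumption for illustration only)
df = pd.DataFrame({
    "name": ["ravi", "david", "raju", "david"],
    "city": ["pune", "delhi", "pune", "goa"],
})

# Default: all columns are compared, and no complete row repeats here
all_cols = df.duplicated()

# subset="name": only the "name" column is compared
by_name_first = df.duplicated(subset="name")               # keep='first' (default)
by_name_last = df.duplicated(subset="name", keep="last")   # spare the last occurrence
by_name_all = df.duplicated(subset="name", keep=False)     # mark every occurrence

print(all_cols.tolist())
print(by_name_first.tolist())
print(by_name_last.tolist())
print(by_name_all.tolist())
```

With keep='first' only the second "david" row is marked, with keep='last' only the first one is, and with keep=False both are.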
duplicated() returns a boolean Series with one entry per row, so you can check whether duplicates are present, count how many there are, and print the duplicated rows.
# Python pandas program to find duplicate values
import pandas as pd
# Two single-column DataFrames, each containing one repeated value
df1 = pd.DataFrame(['ravi','david','raju','david','test','check'])
df2 = pd.DataFrame(['1','2','3','4','5','2'])
# duplicated() marks each row that repeats an earlier row
print("Checking duplicate values")
print(df1.duplicated())
print(df2.duplicated())
# keep=False marks every occurrence, so all rows sharing a value are selected
print("Printing duplicate values")
print(df1[df1.duplicated(keep=False)])
print(df2[df2.duplicated(keep=False)])
Output:
Checking duplicate values
0    False
1    False
2    False
3     True
4    False
5    False
dtype: bool
0    False
1    False
2    False
3    False
4    False
5     True
dtype: bool
Printing duplicate values
       0
1  david
3  david
   0
1  2
5  2
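The program above checks for duplicates and prints them. Counting them can be sketched as follows, reusing df1 from the program. The variable names are illustrative assumptions, and summing the boolean Series is one common approach rather than the only one.

```python
import pandas as pd

df1 = pd.DataFrame(['ravi', 'david', 'raju', 'david', 'test', 'check'])

# True if at least one row repeats an earlier row
has_duplicates = bool(df1.duplicated().any())

# Summing a boolean Series counts its True values:
# rows that repeat an earlier row (first occurrences are not counted)
extra_count = int(df1.duplicated().sum())

# With keep=False, every row that shares its value with another row is counted
rows_involved = int(df1.duplicated(keep=False).sum())

# How often each repeated value occurs in column 0 (the default column label)
counts = df1[0].value_counts()
repeated = counts[counts > 1]

print(has_duplicates)
print(extra_count)
print(rows_involved)
print(repeated.to_dict())
```

Here extra_count is 1 because only the second 'david' repeats an earlier row, while rows_involved is 2 because both 'david' rows are part of the duplicate group.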