How to find duplicate values using pandas

In this tutorial, you will learn how to find duplicate values using pandas. The built-in duplicated() method makes this easy: it returns a boolean Series that marks each duplicate row as True.

 

Syntax: DataFrame.duplicated(subset=None, keep='first')

Parameters:

subset : column label or sequence of labels, optional. Only consider certain columns when identifying duplicates. By default, all columns are used.

keep : {'first', 'last', False}, default 'first'

'first' : Mark duplicates as True except for the first occurrence.

'last' : Mark duplicates as True except for the last occurrence.

False : Mark all duplicates as True.
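To make the two parameters concrete, here is a minimal sketch. The DataFrame and its "name" and "city" columns are hypothetical sample data, not part of the original program; only the duplicated() call and its subset and keep arguments come from the syntax above.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "name": ["ravi", "david", "raju", "david"],
    "city": ["pune", "delhi", "pune", "mumbai"],
})

# Default: all columns are compared, so no row counts as a duplicate here
all_cols = df.duplicated()

# subset: compare only the "name" column; the second "david" is flagged
by_name_first = df.duplicated(subset=["name"])

# keep="last": the earlier "david" is flagged, the last one is not
by_name_last = df.duplicated(subset=["name"], keep="last")

# keep=False: every row whose name repeats is flagged
by_name_all = df.duplicated(subset=["name"], keep=False)

print(by_name_all)
```

With this data, only the keep=False call marks both "david" rows, which is why that option is the one to use when you want to see every occurrence.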

With duplicated() you can check whether duplicate values are present, count how many there are, and print them.

# Python pandas program to find duplicate values

import pandas as pd

df1 = pd.DataFrame(['ravi', 'david', 'raju', 'david', 'test', 'check'])
df2 = pd.DataFrame(['1', '2', '3', '4', '5', '2'])  # to test

# True marks a row that repeats an earlier row
print("Checking duplicate values")
print(df1.duplicated())
print(df2.duplicated())

# keep=False selects every occurrence of a duplicated row
print("Printing duplicate values")
print(df1[df1.duplicated(keep=False)])
print(df2[df2.duplicated(keep=False)])

Output:

Checking duplicate values
0    False
1    False
2    False
3     True
4    False
5    False
dtype: bool
0    False
1    False
2    False
3    False
4    False
5     True
dtype: bool
Printing duplicate values
       0
1  david
3  david
   0
1  2
5  2
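The program above checks for and prints duplicates but does not count them. One possible way to count, sketched here under the assumption that summing the boolean mask is acceptable (True counts as 1), reuses df1 from the program; the variable names are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame(['ravi', 'david', 'raju', 'david', 'test', 'check'])

# Boolean mask: True for each row that repeats an earlier row
mask = df1.duplicated()

# Are any duplicates present at all?
has_duplicates = bool(mask.any())

# Number of repeated rows (first occurrences are not counted)
repeat_count = int(mask.sum())

# Number of rows involved in duplication, counting every occurrence
involved_count = int(df1.duplicated(keep=False).sum())

print(has_duplicates, repeat_count, involved_count)  # prints: True 1 2
```

The two counts differ because the default keep='first' leaves the first "david" unmarked, while keep=False marks both.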

 

admin
