Pandas standard deviation [Complete Guide]: DataFrames, Series and groupby with examples

In this tutorial, you will learn how to write a program to calculate the standard deviation in pandas.

Pandas has a built-in std() method that you can use for this. You can calculate the standard deviation for an entire DataFrame or for a single column.

Standard Deviation on DataFrames:

Syntax: DataFrame.std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)

axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA.

level : int or level name, default None

If the axis is a MultiIndex (hierarchical), compute along a particular level, collapsing into a Series. This parameter was removed in pandas 2.0; use groupby(level=...) instead.

ddof : int, default 1

Delta Degrees of Freedom. The divisor used in calculations is N – ddof, where N represents the number of elements.

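numeric_only : boolean, default None

Include only float, int and boolean columns. If None, pandas attempts to use everything, then falls back to numeric data only. Not implemented for Series. In pandas 2.0 and later the default is False, so non-numeric columns (such as text) raise a TypeError unless you pass numeric_only=True.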
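To see axis, skipna and ddof in action, here is a small sketch on a made-up DataFrame. The column names `x` and `y` and the missing value are hypothetical illustration data, not part of the example that follows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in column 'y'
df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0],
                   'y': [2.0, np.nan, 6.0, 8.0]})

# Defaults: axis=0 (one result per column), skipna=True, ddof=1
col_std = df.std()

# skipna=False: a column that contains NaN gives NaN
col_std_keep_na = df.std(skipna=False)

# ddof=0: divide by N instead of N - 1 (population standard deviation)
col_std_pop = df.std(ddof=0)

# axis=1: one result per row; a row with a single valid value gives NaN when ddof=1
row_std = df.std(axis=1)

print(col_std)
print(col_std_keep_na)
print(col_std_pop)
print(row_std)
```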

```python
# pandas standard deviation example
import pandas as pd

data = pd.DataFrame({
    'name': ['ravi', 'david', 'raju', 'david', 'kumar', 'teju'],
    'experience': [1, 2, 3, 4, 5, 2],
    'salary': [15000, 20000, 30000, 45389, 50000, 20000],
    'join_year': [2017, 2017, 2018, 2018, 2019, 2018]
})

# To calculate the standard deviation of every numeric column
# (numeric_only=True skips the text column 'name'; pandas 2.0+ raises a TypeError without it)
print(data.std(numeric_only=True))

# To calculate the standard deviation of a specific column
print(data['salary'].std())
```


```
experience        1.471960
salary        14572.550229
join_year         0.752773
dtype: float64
```
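These numbers follow directly from the formula in the ddof description: the square root of the summed squared deviations from the mean, divided by N - ddof. As a sanity check, this sketch recomputes the `experience` value by hand and compares it with pandas. The helper name `manual_std` is hypothetical, written only for this illustration.

```python
import math
import pandas as pd

def manual_std(values, ddof=1):
    """Standard deviation with divisor N - ddof (hypothetical helper for illustration)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - ddof))

experience = [1, 2, 3, 4, 5, 2]

by_hand = manual_std(experience)          # sample std, divisor N - 1
by_pandas = pd.Series(experience).std()   # pandas default ddof=1

print(round(by_hand, 6), round(by_pandas, 6))  # prints 1.47196 1.47196
```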

Standard Deviation on Series:

Syntax: Series.std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
Return sample standard deviation over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.
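For the series [1, 2, 3, 6], the mean is 3 and the squared deviations sum to 4 + 1 + 0 + 9 = 14, so ddof=1 gives sqrt(14/3) while ddof=0 gives sqrt(14/4). A short sketch comparing the two:

```python
import math
import pandas as pd

s = pd.Series([1, 2, 3, 6])

sample_std = s.std()            # default ddof=1: sqrt(14 / 3)
population_std = s.std(ddof=0)  # ddof=0: sqrt(14 / 4)

print(sample_std, population_std)
```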

axis : {index (0)}
skipna : boolean, default True
Exclude NA/null values. If all values are NA, the result will be NA.

level : int or level name, default None
If the axis is a MultiIndex (hierarchical), compute along a particular level, collapsing into a Series. Removed in pandas 2.0; use groupby(level=...) instead.

ddof : int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N – ddof, where N represents the number of elements.

numeric_only : boolean, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

Returns: std : scalar, or Series if level is specified
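Since pandas 2.0 no longer accepts `level`, per-level standard deviations come from groupby instead. A minimal sketch, assuming a hypothetical Series with a two-level MultiIndex (the `store` and `day` labels and the values are made up):

```python
import pandas as pd

# Hypothetical sales figures indexed by (store, day)
index = pd.MultiIndex.from_tuples(
    [('a', 1), ('a', 2), ('b', 1), ('b', 2)],
    names=['store', 'day'],
)
s = pd.Series([10.0, 14.0, 5.0, 5.0], index=index)

# Old API (pandas < 2.0): s.std(level='store')
# Current API: group by the index level, then take the std of each group
per_store = s.groupby(level='store').std()
print(per_store)
```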

```python
import pandas as pd

d = pd.Series([1, 2, 3, 6])

# To calculate the standard deviation of a Series
print(d.std())
```

Rolling standard deviation:

Here you will learn how to calculate a rolling standard deviation.

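Syntax (old API): pandas.rolling_std(arg, window, min_periods=None, freq=None, center=False, how=None, **kwargs)

pd.rolling_std was deprecated in pandas 0.18 and later removed. In current pandas, call rolling() on the Series or DataFrame and then std(): arg.rolling(window, min_periods=None, center=False).std(ddof=1). The window, min_periods, center and ddof parameters below carry over; freq and how do not.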

arg : Series, DataFrame

window : int

Size of the moving window. This is the number of observations used for calculating the statistic.

min_periods : int, default None

Minimum number of observations in window required to have a value (otherwise result is NA).

freq : string or DateOffset object, optional (default None)

Frequency to conform the data to before computing the statistic. Specified as a frequency string or DateOffset object.

center : boolean, default False

Set the labels at the center of the window.

how : string, default None

Method for down- or re-sampling.

ddof : int, default 1

Delta Degrees of Freedom. The divisor used in calculations is N – ddof, where N represents the number of elements.

Returns: y, an object of the same type as the input argument


By default, the result is set to the right edge of the window. This can be changed to the center of the window by setting center=True.
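A short sketch of window, min_periods and center using the current `Series.rolling()` API. It reuses the values from the rolling example in this section; the window size of 3 is an arbitrary choice for illustration.

```python
import pandas as pd

d = pd.Series([1, 5, 8, 4, 15, 6, 37, 8, 49])

# Right-aligned window of 3: the first two results are NaN (not enough observations)
right = d.rolling(window=3).std()

# min_periods=2: a result is produced as soon as 2 observations are available
early = d.rolling(window=3, min_periods=2).std()

# center=True: each result is labelled at the middle of its window
centered = d.rolling(window=3, center=True).std()

print(pd.DataFrame({'right': right, 'min_periods=2': early, 'center': centered}))
```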

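The freq keyword was used to conform time series data to a specified frequency by resampling the data. This was done with the default parameters of resample() (i.e. using the mean). Current pandas no longer accepts freq in rolling calculations, so resample the data yourself before calling rolling().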
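In current pandas, the same effect takes two explicit steps: resample, then roll. A minimal sketch, assuming a hypothetical irregular daily series (the dates and values are made up):

```python
import pandas as pd

# Hypothetical readings: two on Jan 1, none on Jan 3
s = pd.Series(
    [1.0, 3.0, 4.0, 8.0, 6.0],
    index=pd.to_datetime(['2024-01-01', '2024-01-01', '2024-01-02',
                          '2024-01-04', '2024-01-05']),
)

# Step 1: conform to a daily frequency using the mean (Jan 3 becomes NaN)
daily = s.resample('D').mean()

# Step 2: rolling standard deviation over 2 days on the resampled data
rolling_std = daily.rolling(2).std()
print(rolling_std)
```

Windows that touch the missing day produce NaN, because they hold fewer than the default min_periods (equal to the window size) valid observations.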

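```python
import pandas as pd

d = pd.Series([1, 5, 8, 4, 15, 6, 37, 8, 49])

# To calculate the rolling standard deviation over a window of 2 observations
# (old pandas: pd.rolling_std(d, 2); current pandas: d.rolling(2).std())
print(d.rolling(2).std())
```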

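Unbiased standard deviation and standard error (sem):

std() already divides by N - 1 by default (ddof=1), which is the usual sample standard deviation; strictly, it is the variance, not the standard deviation itself, that this N - 1 divisor makes unbiased. The sem() method returns a different quantity: the standard error of the mean, which is the standard deviation divided by the square root of N.

pandas.DataFrame.sem(): Return unbiased standard error of the mean over requested axis.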

Syntax: DataFrame.sem(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)

```python
import pandas as pd

d = pd.Series([1, 5, 8, 4, 15, 6, 37, 8, 49])

# To calculate the standard error of the mean
print(d.sem())
```
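To make the difference between std() and sem() concrete, this sketch recomputes the standard error from the standard deviation as std / sqrt(N), using the same series and the default ddof=1 for both:

```python
import math
import pandas as pd

d = pd.Series([1, 5, 8, 4, 15, 6, 37, 8, 49])

std = d.std()                          # sample standard deviation (ddof=1)
sem = d.sem()                          # standard error of the mean (ddof=1)
sem_by_hand = std / math.sqrt(len(d))  # std divided by sqrt(N)

print(std, sem, sem_by_hand)
```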



Pandas standard deviation with groupby:

You can calculate the standard deviation of each group with the GroupBy.std() method.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 4, 3, 4],
                   'B': [4, 3, 3, 4],
                   'C': [1, 2, 2, 1]})

# To calculate the standard deviation of B and C within each group of A
print(df.groupby(['A']).std())
```


```
          B         C
A
3  0.707107  0.707107
4  0.707107  0.707107
```
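The same idea works on the employee data from the first example. This sketch groups by `join_year` and takes the standard deviation of `salary` for each year. A group with a single row (2019 here) gives NaN, because with ddof=1 the divisor N - 1 is zero.

```python
import math
import pandas as pd

data = pd.DataFrame({
    'name': ['ravi', 'david', 'raju', 'david', 'kumar', 'teju'],
    'experience': [1, 2, 3, 4, 5, 2],
    'salary': [15000, 20000, 30000, 45389, 50000, 20000],
    'join_year': [2017, 2017, 2018, 2018, 2019, 2018],
})

# Standard deviation of salary within each join year
salary_std = data.groupby('join_year')['salary'].std()
print(salary_std)
```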



