Pandas standard deviation [Complete Guide] dataframes, series groupby with examples

In this tutorial, you will learn how to calculate the standard deviation in pandas.

Pandas has a built-in std() method for this. You can calculate the standard deviation for an entire DataFrame or for a single column.

Standard Deviation on Dataframes:

Syntax: DataFrame.std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)

Parameters:
axis : {index (0), columns (1)}
skipna : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

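level : int or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series. This keyword was deprecated in pandas 1.3 and removed in 2.0; on newer versions use groupby(level=...).std() instead.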

ddof : int, default 1

Delta Degrees of Freedom. The divisor used in calculations is N – ddof, where N represents the number of elements.

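numeric_only : boolean, default None

Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Since pandas 2.0 the default is False and non-numeric columns raise a TypeError, so pass numeric_only=True when the DataFrame contains string columns. Not implemented for Series.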

# pandas standard deviation example
import pandas as pd

data = pd.DataFrame({
    'name': ['ravi', 'david', 'raju', 'david', 'kumar', 'teju'],
    'experience': [1, 2, 3, 4, 5, 2],
    'salary': [15000, 20000, 30000, 45389, 50000, 20000],
    'join_year': [2017, 2017, 2018, 2018, 2019, 2018]
})

# Standard deviation of every numeric column
# (numeric_only=True skips the non-numeric 'name' column)
print(data.std(numeric_only=True))

# Standard deviation of a specific column
print(data['salary'].std())

Output:

experience        1.471960
salary        14572.550229
join_year         0.752773
dtype: float64
14572.550228654787
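
The ddof parameter described above decides whether you get the sample or the population standard deviation. The short sketch below is only an illustration: it reuses the salary values from the example, the variable names are made up for this demo, and it assumes NumPy is installed alongside pandas.

```python
import numpy as np
import pandas as pd

salary = pd.Series([15000, 20000, 30000, 45389, 50000, 20000])

# Sample standard deviation: divides by N - 1 (pandas default, ddof=1)
sample_std = salary.std()

# Population standard deviation: divides by N (ddof=0)
population_std = salary.std(ddof=0)

# NumPy uses ddof=0 by default, so it matches the population value
numpy_std = np.std(salary.to_numpy())

print(sample_std)
print(population_std)
print(numpy_std)
```

Because the defaults differ (ddof=1 in pandas, ddof=0 in NumPy), the two libraries give different numbers for the same data unless you set ddof explicitly.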

Standard Deviation on Series:

Syntax: Series.std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)

Return sample standard deviation over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:
axis : {index (0)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a scalar

ddof : int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N – ddof, where N represents the number of elements.

numeric_only : boolean, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

Returns:
std : scalar or Series (if level specified)

import pandas as pd

d = pd.Series([1, 2, 3, 6])

# Standard deviation of a Series
print(d.std())
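
To see what std() does with the N – ddof divisor and with the skipna parameter, here is a small sketch that repeats the calculation by hand on the same values. The names manual_std and with_nan are only illustrative and are not part of pandas.

```python
import math
import pandas as pd

d = pd.Series([1, 2, 3, 6])

# Manual calculation: sqrt(sum((x - mean)^2) / (N - ddof)) with ddof=1
mean = d.mean()
squared_diffs = (d - mean) ** 2
manual_std = math.sqrt(squared_diffs.sum() / (len(d) - 1))

print(manual_std)  # same value as d.std()
print(d.std())

# skipna=True (the default) ignores missing values
with_nan = pd.Series([1, 2, 3, 6, None])
print(with_nan.std())              # the missing value is skipped
print(with_nan.std(skipna=False))  # result is NaN
```

For these four values the sum of squared differences is 14, so the result is the square root of 14 / 3, about 2.160247.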

Rolling standard deviation:

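Here you will learn how to calculate a rolling standard deviation.

Syntax: Series.rolling(window, min_periods=None, center=False).std(ddof=1)

Older pandas versions exposed this as pandas.rolling_std(arg, window, min_periods=None, freq=None, center=False, how=None, **kwargs). That function was deprecated in pandas 0.18 and later removed. In the parameter list below, arg corresponds to the Series or DataFrame that .rolling() is called on.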

Parameters:
arg : Series, DataFrame

window : int

Size of the moving window. This is the number of observations used for calculating the statistic.

min_periods : int, default None

Minimum number of observations in window required to have a value (otherwise result is NA).

freq : string or DateOffset object, optional (default None)

Frequency to conform the data to before computing the statistic. Specified as a frequency string or DateOffset object. Legacy pandas.rolling_std only.

center : boolean, default False

Set the labels at the center of the window.

how : string, default None

Method for down- or re-sampling. Legacy pandas.rolling_std only.

ddof : int, default 1

Delta Degrees of Freedom. The divisor used in calculations is N – ddof, where N represents the number of elements.

Returns:
y : type of input argument

Notes

By default, the result is set to the right edge of the window. This can be changed to the center of the window by setting center=True.

The freq keyword is used to conform time series data to a specified frequency by resampling the data. This is done with the default parameters of resample() (i.e. using the mean).

import pandas as pd

d = pd.Series([1, 5, 8, 4, 15, 6, 37, 8, 49])

# Rolling standard deviation over a window of 2 observations
# (written as pd.rolling_std(d, 2) in old pandas versions)
print(d.rolling(window=2).std())
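
The window, min_periods and center parameters described above are easier to follow side by side. The sketch below uses the .rolling() accessor available in current pandas. The window size of 3 and the variable names are arbitrary choices for illustration.

```python
import pandas as pd

d = pd.Series([1, 5, 8, 4, 15, 6, 37, 8, 49])

# Window of 3 observations; the first two results are NaN because
# min_periods defaults to the window size for integer windows
rolling_default = d.rolling(window=3).std()

# min_periods=2 produces a value as soon as two observations are available
rolling_min2 = d.rolling(window=3, min_periods=2).std()

# center=True labels each result at the middle of its window
rolling_centered = d.rolling(window=3, center=True).std()

print(rolling_default)
print(rolling_min2)
print(rolling_centered)
```

With center=True the same window results appear one position earlier, and the missing values move to both ends of the Series.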

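Unbiased standard deviation and standard error of the mean:

std() already applies the N – 1 correction by default (ddof=1), which is what is usually meant by the unbiased (sample) standard deviation. The sem() function calculates a different statistic, the standard error of the mean, which is the sample standard deviation divided by the square root of N.

pandas.DataFrame.sem(): Return unbiased standard error of the mean over requested axis.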

Syntax: DataFrame.sem(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)

import pandas as pd

d = pd.Series([1, 5, 8, 4, 15, 6, 37, 8, 49])

# Standard error of the mean
print(d.sem())

Output:

5.57219729694
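
To make the difference between sem() and std() concrete, the following sketch checks that the standard error of the mean is the sample standard deviation divided by the square root of the number of observations. The name sem_by_hand is just an illustrative variable.

```python
import math
import pandas as pd

d = pd.Series([1, 5, 8, 4, 15, 6, 37, 8, 49])

std = d.std()  # sample standard deviation (ddof=1)
sem = d.sem()  # standard error of the mean (ddof=1)

# Same value computed from the standard deviation
sem_by_hand = std / math.sqrt(d.count())

print(std)
print(sem)
print(sem_by_hand)
```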

Standard deviation with groupby:

We can calculate the standard deviation of each group with the GroupBy.std() function.

import pandas as pd

df = pd.DataFrame({'A': [3, 4, 3, 4], 'B': [4, 3, 3, 4], 'C': [1, 2, 2, 1]})

# Standard deviation of each column within each group of 'A'
print(df.groupby(['A']).std())

Output:

          B         C
A                    
3  0.707107  0.707107
4  0.707107  0.707107
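
In practice you often want the grouped standard deviation for one column only, or together with other statistics. The sketch below shows a few common variations on the same data. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 4, 3, 4], 'B': [4, 3, 3, 4], 'C': [1, 2, 2, 1]})

# Standard deviation of a single column within each group
b_std = df.groupby('A')['B'].std()

# Several statistics at once for one column
c_stats = df.groupby('A')['C'].agg(['mean', 'std'])

# Population standard deviation per group (ddof=0)
pop_std = df.groupby('A').std(ddof=0)

print(b_std)
print(c_stats)
print(pop_std)
```

Each group here holds only two values, so the sample standard deviation (about 0.707107) and the population value (0.5) differ noticeably.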

 
