# Pandas standard deviation [Complete Guide]: DataFrames, Series, and groupby with examples

In this tutorial, you will learn how to calculate the standard deviation in pandas.

Pandas has a built-in function, std(), for this. You can calculate the standard deviation for an entire DataFrame or for a single column.

**Standard Deviation on Dataframes:**

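**Syntax:** DataFrame.std(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)

This is the signature in pandas 2.0 and later. Older releases also accepted a level argument and defaulted numeric_only to None.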

**Parameters:**

**axis** : {index (0), columns (1)}

**skipna** : boolean, default True

Exclude NA/null values. If an entire row/column is NA, the result will be NA

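**level** : int or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a Series. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0; in current versions, use groupby(level=...) followed by std() instead.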

**ddof** : int, default 1

Delta Degrees of Freedom. The divisor used in calculations is N – ddof, where N represents the number of elements.

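**numeric_only** : boolean, default None (False in pandas 2.0 and later)

Include only float, int, boolean columns. In older versions, None means pandas will attempt to use everything and then fall back to numeric data only. In pandas 2.0 and later, non-numeric columns are not dropped automatically, so pass numeric_only=True when the DataFrame contains them. Not implemented for Series.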
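The effect of axis, skipna, and ddof is easier to see on a tiny DataFrame. The following is a minimal sketch; the `scores` data and column names are made up for illustration and are not part of the tutorial's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
scores = pd.DataFrame({
    "math": [70.0, 80.0, 90.0, np.nan],
    "physics": [60.0, 65.0, 70.0, 75.0],
})

# axis=0 (default): one value per column; NaN is skipped by default.
col_std = scores.std()

# axis=1: one value per row, computed across the columns.
row_std = scores.std(axis=1)

# skipna=False: a NaN anywhere in a column makes that column's result NaN.
strict_std = scores.std(skipna=False)

# ddof=0 divides by N instead of N - 1 (population standard deviation).
pop_std = scores.std(ddof=0)

print(col_std, row_std, strict_std, pop_std, sep="\n\n")
```

Note that the last row contains only one non-missing value, so its row-wise result is NaN: with ddof=1 the divisor N - 1 would be zero.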


```
# pandas standard deviation example
import pandas as pd

data = pd.DataFrame({
    'name': ['ravi', 'david', 'raju', 'david', 'kumar', 'teju'],
    'experience': [1, 2, 3, 4, 5, 2],
    'salary': [15000, 20000, 30000, 45389, 50000, 20000],
    'join_year': [2017, 2017, 2018, 2018, 2019, 2018],
})

# Standard deviation of every numeric column.
# numeric_only=True skips the string column 'name', which would
# otherwise raise an error in pandas 2.0 and later.
print(data.std(numeric_only=True))

# Standard deviation of a specific column
print(data['salary'].std())
```


**Output:**

    experience        1.471960
    salary        14572.550229
    join_year         0.752773
    dtype: float64
    14572.550228654787

**Standard Deviation on Series:**

**Syntax:** Series.std(axis=None, skipna=True, ddof=1, numeric_only=False, **kwargs)

This is the signature in pandas 2.0 and later. Older releases also accepted a level argument and defaulted numeric_only to None.

Return sample standard deviation over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.
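To make the N-1 normalization concrete, the sketch below recomputes the standard deviation by hand and compares it with Series.std(). The variable names are illustrative only.

```python
import math

import pandas as pd

values = pd.Series([1, 2, 3, 6])

n = len(values)
mean = values.mean()
squared_deviations = ((values - mean) ** 2).sum()

# ddof=1 (default): divide by N - 1, the sample standard deviation.
sample_std = math.sqrt(squared_deviations / (n - 1))

# ddof=0: divide by N, the population standard deviation.
population_std = math.sqrt(squared_deviations / n)

print(sample_std, values.std())
print(population_std, values.std(ddof=0))
```

Each printed pair should agree up to floating-point rounding.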

**Parameters:**

**axis :** {index (0)}

**skipna :** boolean, default True

Exclude NA/null values. If all values in the Series are NA, the result will be NA.

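**level :** int or level name, default None

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a scalar. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0; in current versions, use groupby(level=...) followed by std() instead.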

**ddof :** int, default 1

Delta Degrees of Freedom. The divisor used in calculations is N – ddof, where N represents the number of elements.

**numeric_only :** boolean, default None (False in pandas 2.0 and later)

Include only float, int, boolean columns. This parameter exists for consistency with DataFrame.std and is not implemented for Series.

**Returns:**

std : scalar or Series (if level specified)


```
import pandas as pd

d = pd.Series([1, 2, 3, 6])

# Standard deviation of a Series
print(d.std())
```


**Rolling standard deviation:**

Here you will learn how to calculate a rolling standard deviation.

**Syntax:** Series.rolling(window, min_periods=None, center=False).std(ddof=1)

Older tutorials use the top-level function pandas.rolling_std(arg, window, min_periods=None, freq=None, center=False, how=None, **kwargs). That function was deprecated in pandas 0.18 and removed in later releases. In current pandas you call rolling() on the Series or DataFrame itself (the old arg parameter) and then call std() on the result.

**Parameters:**

**window :** int

Size of the moving window. This is the number of observations used for calculating the statistic.

**min_periods :** int, default None

Minimum number of observations in window required to have a value (otherwise result is NA). For an integer window, None means the window size.

**center :** boolean, default False

Set the labels at the center of the window.

**ddof :** int, default 1

Delta Degrees of Freedom, passed to std(). The divisor used in calculations is N – ddof, where N represents the number of elements.

**freq**, **how :** accepted only by the removed pandas.rolling_std function

These conformed time series data to a specified frequency by resampling it (by default using the mean) before computing the statistic. In current pandas, resample the data explicitly with resample() before calling rolling().

**Returns:**

y : type of input argument

**Notes**

By default, the result is set to the right edge of the window. This can be changed to the center of the window by setting center=True.

```
import pandas as pd

d = pd.Series([1, 5, 8, 4, 15, 6, 37, 8, 49])

# Rolling standard deviation over a window of 2 observations.
# The first value is NaN because its window holds only one observation.
print(d.rolling(window=2).std())
```
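The min_periods and center parameters described above change which positions receive a value. The following sketch, using the same Series with a window of 3, is one way to compare them; the variable names are illustrative.

```python
import pandas as pd

d = pd.Series([1, 5, 8, 4, 15, 6, 37, 8, 49])

# Default: label at the right edge of the window; the first two
# positions are NaN because fewer than 3 observations are available.
right = d.rolling(window=3).std()

# min_periods=2: a value is produced as soon as 2 observations exist.
partial = d.rolling(window=3, min_periods=2).std()

# center=True: the same window results, labelled at the window's middle,
# so the NaN values move to the first and last positions.
centered = d.rolling(window=3, center=True).std()

print(pd.DataFrame({"right": right, "partial": partial, "centered": centered}))
```

With center=True each value is the one from the default result shifted up by one position, since a window of 3 extends one observation to either side of its label.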


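### Unbiased standard deviation and standard error of the mean:

The std() function already applies the N-1 correction (ddof=1) by default, so it returns the sample standard deviation based on the unbiased variance estimate. A related but different statistic is the standard error of the mean, which you can calculate with the sem() function. It equals the sample standard deviation divided by the square root of the number of observations.

pandas.DataFrame.sem(): Return unbiased standard error of the mean over requested axis.

**Syntax:** DataFrame.sem(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)

This is the signature in pandas 2.0 and later. Older releases also accepted a level argument and defaulted numeric_only to None.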


```
import pandas as pd

d = pd.Series([1, 5, 8, 4, 15, 6, 37, 8, 49])

# Standard error of the mean (not the standard deviation itself)
print(d.sem())
```


**Output:**

5.57219729694
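The value above can be reproduced from std(), because the standard error of the mean is the sample standard deviation divided by the square root of the number of observations. A minimal sketch, with illustrative variable names:

```python
import math

import pandas as pd

d = pd.Series([1, 5, 8, 4, 15, 6, 37, 8, 49])

sample_std = d.std()                              # ddof=1 by default
sem_manual = sample_std / math.sqrt(d.count())    # count() ignores NaN

print(sample_std)
print(sem_manual, d.sem())
```

The two numbers on the second line should match, which shows that sem() reports a different quantity from std() rather than an alternative estimate of it.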

**pandas standard deviation groupby:**

We can calculate the standard deviation of each group by using the GroupBy.std function.


```
import pandas as pd

df = pd.DataFrame({'A': [3, 4, 3, 4], 'B': [4, 3, 3, 4], 'C': [1, 2, 2, 1]})

# Standard deviation of columns B and C within each group of A
print(df.groupby(['A']).std())
```


**Output:**

              B         C
    A
    3  0.707107  0.707107
    4  0.707107  0.707107
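Grouped standard deviations can also be restricted to one column, computed with a different ddof, or combined with other statistics. The sketch below reuses the same DataFrame; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4, 3, 4], "B": [4, 3, 3, 4], "C": [1, 2, 2, 1]})

# Sample standard deviation of column B within each group of A.
b_std = df.groupby("A")["B"].std()

# Population standard deviation (divide by N) within each group.
b_pop = df.groupby("A")["B"].std(ddof=0)

# Several statistics at once for the same column.
summary = df.groupby("A")["B"].agg(["mean", "std"])

print(b_std, b_pop, summary, sep="\n\n")
```

Each group here holds two values that differ by 1, so the sample standard deviation is about 0.707 and the population standard deviation is 0.5.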