How to create dummy variables using pandas in Python
In this tutorial, you will learn how to create dummy variables using pandas in Python.
We will use the get_dummies() function in pandas to generate dummy variables. First import the pandas library, then pass the data as an array-like object (such as a list), a Series, or a DataFrame.
Syntax: pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)
Parameters:
data : array-like, Series, or DataFrame
prefix : string, list of strings, or dict of strings, default None
String to prepend to the DataFrame column names. Pass a list with length equal to the number of columns being encoded when calling get_dummies on a DataFrame. Alternatively, prefix can be a dictionary mapping column names to prefixes.
prefix_sep : string, default '_'
Separator/delimiter to place between the prefix and the category value. A list or dictionary can also be passed, as with prefix.
dummy_na : bool, default False
Add a column to indicate NaNs. If False, NaNs are ignored.
columns : list-like, default None
Column names in the DataFrame to be encoded. If columns is None then all the columns with object or category dtype will be converted.
sparse : bool, default False
Whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False).
drop_first : bool, default False
Whether to get k-1 dummies out of k categorical levels by removing the first level.
Returns: dummies : DataFrame
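To illustrate a few of the parameters above, here is a minimal sketch that encodes a small DataFrame. The column names (color, size, price) and the prefixes (col, sz) are made up for this illustration and are not part of the tutorial's data. dtype=int is passed so that the dummies come out as 0/1 regardless of the installed pandas version.

```python
import pandas as pd

# Hypothetical data, invented for this illustration
df = pd.DataFrame({
    'color': ['red', 'blue', 'red'],
    'size': ['S', 'M', 'S'],
    'price': [10, 20, 15],
})

# columns: encode only the 'color' column; 'size' and 'price' are left as they are
only_color = pd.get_dummies(df, columns=['color'], dtype=int)
print(only_color)

# prefix / prefix_sep: encode both text columns, using a custom prefix
# per column and '-' instead of the default '_' separator
custom = pd.get_dummies(
    df,
    prefix={'color': 'col', 'size': 'sz'},
    prefix_sep='-',
    dtype=int,
)
print(custom)
```

In the first call the new columns are named color_blue and color_red. In the second they become col-blue, col-red, sz-M and sz-S, while the numeric price column passes through unchanged.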
Here is an example program that creates dummy variables with pandas.
import pandas as pd
# by using a list (array-like)
a = ['a','b','c','d']
print(pd.get_dummies(a))
# by using series
b = pd.Series(list('pandas'))
print(pd.get_dummies(b))
# by using dataframe
c = pd.DataFrame({'A': ['a', 'b', 'c','d','e']})
print(pd.get_dummies(c))
Output:
   a  b  c  d
0  1  0  0  0
1  0  1  0  0
2  0  0  1  0
3  0  0  0  1
   a  d  n  p  s
0  0  0  0  1  0
1  1  0  0  0  0
2  0  0  1  0  0
3  0  1  0  0  0
4  1  0  0  0  0
5  0  0  0  0  1
   A_a  A_b  A_c  A_d  A_e
0    1    0    0    0    0
1    0    1    0    0    0
2    0    0    1    0    0
3    0    0    0    1    0
4    0    0    0    0    1
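The output above shows 0/1 values. Recent pandas releases (2.0 and later) return True/False by default, so passing dtype=int is one way to get 0/1 columns back. The sketch below does that and also shows the effect of dummy_na and drop_first on a Series with a missing value. The sample data is hypothetical and chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical Series containing one missing value
s = pd.Series(['a', 'b', np.nan, 'a'])

# Default behaviour: the NaN row is ignored, so it is all zeros
plain = pd.get_dummies(s, dtype=int)
print(plain)

# dummy_na=True adds an extra indicator column for NaN
with_na = pd.get_dummies(s, dummy_na=True, dtype=int)
print(with_na)

# drop_first=True drops the first level ('a'), leaving k-1 columns
reduced = pd.get_dummies(s, drop_first=True, dtype=int)
print(reduced)
```

With drop_first=True, a row where every remaining dummy is 0 stands for the dropped level, although here it could also be the missing value, so combining it with dummy_na=True may be worth considering when the data contains NaNs.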