Python Pandas DataFrames | Create, Update, Delete, Sort, Merge, Append, etc.
Pandas DataFrames:
A DataFrame is a two-dimensional, table-like data structure with labeled rows and columns.
- It is one of the most widely used data structures in data science.
- We can perform arithmetic operations on its rows and columns.
- We can fill missing values using the mean, median, mode, or other functions.
- We can modify, update, and delete data in a DataFrame.
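As a quick illustration of the points above, here is a minimal sketch that fills a missing value with the column mean and then does arithmetic on two columns. The column names price, qty and total are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are only for illustration.
df = pd.DataFrame({"price": [10.0, 20.0, np.nan], "qty": [1, 2, 3]})

# Fill the missing price with the mean of the known prices (10 and 20).
df["price"] = df["price"].fillna(df["price"].mean())

# Column arithmetic: multiply two columns element by element.
df["total"] = df["price"] * df["qty"]
print(df)
```

The same pattern should work with median() or another aggregate in place of mean().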
Table of Contents
How to Create a DataFrame
How to Create a DataFrame from NumPy Arrays
How to Create a DataFrame from Series
How to Create a DataFrame from Dictionaries
How to Access Columns in a DataFrame
How to Add a New Column to a DataFrame
How to Select a Row in a DataFrame Using loc
How to Select a Row in a DataFrame Using iloc
How to Drop a Column from a DataFrame
How to Drop a Row from a DataFrame
How to Create a DataFrame
We can create a DataFrame from lists, NumPy ndarrays, dictionaries, and Series.
Example Code:
#creating an empty dataframe
import pandas as pd
d = pd.DataFrame()
print(d)
Output:
Empty DataFrame
Columns: []
Index: []
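Lists are mentioned above but not shown, so here is a minimal sketch of building a DataFrame from a list of lists. The names and values are hypothetical sample data; each inner list becomes one row, and the columns parameter supplies the column names.

```python
import pandas as pd

# Hypothetical sample rows: each inner list is one row of the table.
rows = [["Alice", 30], ["Bob", 25]]

# The columns parameter names the columns; without it they are numbered 0, 1, ...
df_from_lists = pd.DataFrame(rows, columns=["name", "age"])
print(df_from_lists)
```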
How to Create a DataFrame from NumPy Arrays:
We can create a DataFrame from a NumPy array. Here we generate random values with the np.random.randn function, which gives a DataFrame with 10 rows and 4 columns. Because the values are random, your numbers will differ from the output shown.
#example code
import numpy as np
import pandas as pd
# Build a 10x4 DataFrame of samples from the standard normal distribution
d = pd.DataFrame(np.random.randn(10, 4))
print(d)
print("Dataframe index:", d.index)
print("Dataframe values:", d.values)
#Output
          0         1         2         3
0  0.575585 -0.583294 -1.928788  0.713230
1  2.008588  0.364259 -1.348116  1.456031
2  0.346671 -0.311684  1.511084  0.455040
3  0.899499 -0.686555  2.261072  0.607372
4 -0.077698  0.886232 -1.470222  0.322135
5  0.705476  0.392107 -0.026671 -1.354415
6 -2.172006  1.087336  1.195780 -0.968893
7 -1.529784  0.160476  0.360925 -0.082739
8  0.881206  1.014199  0.091057  0.983505
9  0.566864  1.067850 -0.177817  0.022582
Dataframe index: RangeIndex(start=0, stop=10, step=1)
Dataframe values: [[ 0.57558485 -0.58329432 -1.92878794  0.71323012]
 [ 2.00858779  0.36425941 -1.34811578  1.45603071]
 [ 0.34667058 -0.31168377  1.51108393  0.45504035]
 [ 0.89949913 -0.68655523  2.26107211  0.60737179]
 [-0.07769807  0.88623198 -1.47022159  0.32213545]
 [ 0.70547551  0.392107   -0.026671   -1.3544151 ]
 [-2.1720057   1.08733579  1.19578049 -0.96889338]
 [-1.52978385  0.16047586  0.3609255  -0.08273863]
 [ 0.88120555  1.01419862  0.09105704  0.98350461]
 [ 0.56686351  1.0678498  -0.17781737  0.02258249]]
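The columns above are numbered 0 to 3 by default. As a variation, the sketch below names the columns and uses a seeded generator so the values repeat between runs. The column names w, x, y, z and the seed 0 are arbitrary choices for this example.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the random values reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)
values = rng.standard_normal((10, 4))

# Name the four columns instead of relying on the default 0..3 labels.
df_named = pd.DataFrame(values, columns=["w", "x", "y", "z"])
print(df_named.head(3))
print(df_named.shape)
```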
Creating a DataFrame from Dictionaries:
We can create a DataFrame from dictionaries. First create a list of dictionaries, one per row, then pass that list as the argument for the new DataFrame. The dictionary keys become the column names. If a dictionary is missing a key, pandas automatically fills that cell with NaN (a null value).
#example code
import numpy as np
import pandas as pd
# Each dictionary is one row; the second row has no 'c' key
dic = [{'a': 1, 'b': 2, 'c': 3}, {'a': 1, 'b': 2}]
d = pd.DataFrame(dic)
print(d)
print("Dataframe index:", d.index)
print("Dataframe values:", d.values)
Output:
   a  b    c
0  1  2  3.0
1  1  2  NaN
Dataframe index: RangeIndex(start=0, stop=2, step=1)
Dataframe values: [[ 1.  2.  3.]
 [ 1.  2. nan]]
Passing index values:
import numpy as np
import pandas as pd
dic = [{'a': 1, 'b': 2, 'c': 3}, {'a': 1, 'b': 2}]
# The index parameter replaces the default 0, 1, ... row labels
d = pd.DataFrame(dic, index=['Day1', 'Day2'])
print(d)
print("Dataframe index:", d.index)
print("Dataframe values:", d.values)
Output:
      a  b    c
Day1  1  2  3.0
Day2  1  2  NaN
Dataframe index: Index(['Day1', 'Day2'], dtype='object')
Dataframe values: [[ 1.  2.  3.]
 [ 1.  2. nan]]
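The examples above pass a list of dictionaries, one per row. A single dictionary of lists also works, where each key is a column name and each list holds that column's values. The sketch below rebuilds the same table that way. In this form every list must have the same length, so the missing value is written out as np.nan.

```python
import numpy as np
import pandas as pd

# Each key is a column; each list holds the values for that column.
data = {"a": [1, 1], "b": [2, 2], "c": [3, np.nan]}

df_from_dict = pd.DataFrame(data, index=["Day1", "Day2"])
print(df_from_dict)
```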
Creating a DataFrame from Series:
We can create a DataFrame from pandas Series. If the Series do not share the same index labels, pandas aligns them by label and automatically fills the missing entries with NaN.
#example code:
import numpy as np
import pandas as pd
series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c','d'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'one':series1, 'two':series2})
print(d)
Output:
   one  two
a    1  4.0
b    2  2.0
c    3  NaN
d    4  NaN
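To check how many null values the alignment created, one option is to count them per column with isna().sum(). This is a small sketch that reuses the same two Series as above.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'one': series1, 'two': series2})

# isna() marks each missing cell as True; sum() counts them per column.
missing_per_column = d.isna().sum()
print(missing_per_column)
```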
How to Access Column Data in a DataFrame:
We can access a single column of a DataFrame with the syntax below.
Syntax: dataframe['columnname']
Example code:
import numpy as np
import pandas as pd
series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c','d'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'one':series1, 'two':series2})
print(d['one'])
Output:
a    1
b    2
c    3
d    4
Name: one, dtype: int64
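A single column name returns a Series. To pick several columns at once, we can pass a list of names, which returns a DataFrame with the columns in the order given. A minimal sketch with the same data:

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'one': series1, 'two': series2})

# A list of column names (note the double brackets) returns a DataFrame.
subset = d[['two', 'one']]
print(subset)
```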
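How to Add a New Column to a DataFrame:
We can add a new column by assigning a Series to a new column name. Rows whose index label is missing from that Series get NaN.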
import numpy as np
import pandas as pd
series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c','d'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1':series1, 'series2':series2})
d['series3'] = pd.Series([5,6,7], index=['a','b','c'])
print(d)
Output:
   series1  series2  series3
a        1      4.0      5.0
b        2      2.0      6.0
c        3      NaN      7.0
d        4      NaN      NaN
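A new column can also be computed from existing columns. In the sketch below, the column name total is made up for illustration. Any row where one of the inputs is NaN gets NaN in the result.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1': series1, 'series2': series2})

# Add the two columns row by row; NaN in either input gives NaN in the sum.
d['total'] = d['series1'] + d['series2']
print(d)
```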
How to Select a Row in a DataFrame:
We can select a row with the loc indexer (by index label) or the iloc indexer (by integer position). Both use square brackets rather than parentheses.
Example code:
#using the loc indexer (label-based)
import numpy as np
import pandas as pd
series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c','d'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1':series1, 'series2':series2})
print(d.loc['d'])
Output:
series1    4.0
series2    NaN
Name: d, dtype: float64
#using the iloc indexer (position-based)
import numpy as np
import pandas as pd
series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c','d'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1':series1, 'series2':series2})
print(d.iloc[0])
Output:
series1    1.0
series2    4.0
Name: a, dtype: float64
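Both indexers also accept a second argument that picks a column, so we can read a single cell. A minimal sketch with the same data, where loc takes labels and iloc takes integer positions:

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1': series1, 'series2': series2})

# loc: row label 'b', column label 'series2'
by_label = d.loc['b', 'series2']

# iloc: second row (position 1), second column (position 1)
by_position = d.iloc[1, 1]

print(by_label, by_position)
```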
Slicing Rows:
We can slice rows by using the : operator with iloc. The end position is excluded, so 0:2 returns the first two rows.
import numpy as np
import pandas as pd
series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c','d'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1':series1, 'series2':series2})
print(d.iloc[0:2])
#output
   series1  series2
a        1      4.0
b        2      2.0
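Slicing also works with loc, with one difference worth knowing: a label slice includes its end label, while an iloc slice excludes its end position. The sketch below compares the two on the same data.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1': series1, 'series2': series2})

# iloc: positions 0 and 1 (position 2 is excluded)
first_two_by_position = d.iloc[0:2]

# loc: labels 'a' through 'b', with the end label included
first_two_by_label = d.loc['a':'b']

print(first_two_by_position.equals(first_two_by_label))
```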
How to Delete a Column in a DataFrame:
We can delete a column with the del statement or the pop method. Both change the DataFrame in place, and pop also returns the removed column.
Example code:
import numpy as np
import pandas as pd
series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c','d'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1':series1, 'series2':series2})
d['series3'] = pd.Series([5,6,7], index=['a','b','c'])
d.pop('series3')
del d['series1']
print(d)
Output:
   series2
a      4.0
b      2.0
c      NaN
d      NaN
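If we want to keep the original DataFrame unchanged, an alternative to del and pop is the drop method with the columns parameter, which returns a new DataFrame. A minimal sketch with the same data; the variable name trimmed is just for this example.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1': series1, 'series2': series2})

# drop returns a new DataFrame without the column; d itself is not modified.
trimmed = d.drop(columns=['series1'])

print(trimmed.columns.tolist())
print(d.columns.tolist())
```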
Removing a Row from a DataFrame:
We can remove a row from a DataFrame with the drop method. drop returns a new DataFrame, so assign the result back to keep the change.
Syntax: dataframe.drop('rowname')
Example code:
import numpy as np
import pandas as pd
series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c','d'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1':series1, 'series2':series2})
print(d)
d= d.drop('a')
print(d)
Output:
   series1  series2
a        1      4.0
b        2      2.0
c        3      NaN
d        4      NaN
   series1  series2
b        2      2.0
c        3      NaN
d        4      NaN
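The drop method also accepts a list of labels, so several rows can be removed in one call. A minimal sketch with the same data; the variable name remaining is just for this example.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1': series1, 'series2': series2})

# Pass a list of row labels to remove more than one row at once.
remaining = d.drop(['a', 'c'])
print(remaining)
```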