Python Pandas DataFrames | Create, Update, Delete, Sort, Merge, Append, etc.


Pandas DataFrames:

A DataFrame is a two-dimensional, table-like data structure with labeled rows and columns.

  • It is a widely used data structure in data science.
  • We can perform arithmetic operations on its rows and columns.
  • We can fill missing values using the mean, median, mode, and other functions.
  • We can modify, update, and delete data in a DataFrame.

Table of Contents

How to Create a DataFrame
How to Create a DataFrame from NumPy Arrays
How to Create a DataFrame from Dictionaries
How to Create a DataFrame from Series
How to Access Column Data in a DataFrame
How to Add a New Column to a DataFrame
How to Select a Row in a DataFrame Using loc
How to Select a Row in a DataFrame Using iloc
How to Delete a Column from a DataFrame
How to Remove a Row from a DataFrame

How to Create a DataFrame

We can create a DataFrame from lists, NumPy ndarrays, dictionaries, and Series.

Example Code:

# Creating an empty DataFrame
import pandas as pd

d = pd.DataFrame()
print(d)

Output:

Empty DataFrame
Columns: []
Index: []
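Lists are mentioned above as one possible source. As a minimal sketch, assuming each inner list is one row, a DataFrame can be built from a list of lists. The sample values and the column names `name` and `age` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: each inner list is one row.
rows = [['Alice', 30], ['Bob', 25]]

# The columns argument names the columns; without it pandas uses 0, 1, ...
d = pd.DataFrame(rows, columns=['name', 'age'])
print(d)
```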

How to Create a DataFrame Using NumPy Arrays:

We can create a DataFrame from a NumPy array. Here we use np.random.randn to generate an array of random values with 10 rows and 4 columns. Because the values are random, your output will differ from the output shown.

# Example code
import numpy as np
import pandas as pd

d = pd.DataFrame(np.random.randn(10, 4))
print(d)
print("DataFrame index:", d.index)
print("DataFrame values:", d.values)

Output:

          0         1         2         3
0  0.575585 -0.583294 -1.928788  0.713230
1  2.008588  0.364259 -1.348116  1.456031
2  0.346671 -0.311684  1.511084  0.455040
3  0.899499 -0.686555  2.261072  0.607372
4 -0.077698  0.886232 -1.470222  0.322135
5  0.705476  0.392107 -0.026671 -1.354415
6 -2.172006  1.087336  1.195780 -0.968893
7 -1.529784  0.160476  0.360925 -0.082739
8  0.881206  1.014199  0.091057  0.983505
9  0.566864  1.067850 -0.177817  0.022582
DataFrame index: RangeIndex(start=0, stop=10, step=1)
DataFrame values: [[ 0.57558485 -0.58329432 -1.92878794  0.71323012]
 [ 2.00858779  0.36425941 -1.34811578  1.45603071]
 [ 0.34667058 -0.31168377  1.51108393  0.45504035]
 [ 0.89949913 -0.68655523  2.26107211  0.60737179]
 [-0.07769807  0.88623198 -1.47022159  0.32213545]
 [ 0.70547551  0.392107   -0.026671   -1.3544151 ]
 [-2.1720057   1.08733579  1.19578049 -0.96889338]
 [-1.52978385  0.16047586  0.3609255  -0.08273863]
 [ 0.88120555  1.01419862  0.09105704  0.98350461]
 [ 0.56686351  1.0678498  -0.17781737  0.02258249]]
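The random example produces different numbers on every run and labels the columns 0 to 3. One possible variation, sketched here, seeds NumPy's random generator so the values are repeatable and passes explicit column names. The seed value and the names `w`, `x`, `y`, `z` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same values on every run.
rng = np.random.default_rng(seed=0)
values = rng.standard_normal((10, 4))

# Hypothetical column names; any four labels would work.
d = pd.DataFrame(values, columns=['w', 'x', 'y', 'z'])
print(d.head())
```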

Creating a DataFrame from Dictionaries:

We can create a DataFrame from a list of dictionaries. Each dictionary becomes one row, and its keys become the column names. If a dictionary lacks a key that appears in another dictionary, pandas fills the missing cell with NaN automatically.

# Example code
import pandas as pd

dic = [{'a': 1, 'b': 2, 'c': 3}, {'a': 1, 'b': 2}]
d = pd.DataFrame(dic)
print(d)
print("DataFrame index:", d.index)
print("DataFrame values:", d.values)

Output:

   a  b    c
0  1  2  3.0
1  1  2  NaN
DataFrame index: RangeIndex(start=0, stop=2, step=1)
DataFrame values: [[ 1.  2.  3.]
 [ 1.  2. nan]]

 

Passing index values:

import pandas as pd

dic = [{'a': 1, 'b': 2, 'c': 3}, {'a': 1, 'b': 2}]
# The index argument replaces the default 0, 1, ... row labels.
d = pd.DataFrame(dic, index=['Day1', 'Day2'])
print(d)
print("DataFrame index:", d.index)
print("DataFrame values:", d.values)

Output:

      a  b    c
Day1  1  2  3.0
Day2  1  2  NaN
DataFrame index: Index(['Day1', 'Day2'], dtype='object')
DataFrame values: [[ 1.  2.  3.]
 [ 1.  2. nan]]
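The examples in this section pass a list of dictionaries, one per row. A single dictionary of lists also works, with each key naming a column. The sketch below rebuilds the same table that way; the explicit NaN is needed because every list in this form must have the same length.

```python
import pandas as pd

# Each key is a column name; each list holds that column's values.
data = {'a': [1, 1], 'b': [2, 2], 'c': [3.0, float('nan')]}
d = pd.DataFrame(data, index=['Day1', 'Day2'])
print(d)
```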

Creating a DataFrame from Series:

We can create a DataFrame from pandas Series. The Series are aligned on their index labels. Where a Series has no value for a label, pandas fills the cell with NaN automatically.

# Example code
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'one': series1, 'two': series2})
print(d)

Output:

   one  two
a    1  4.0
b    2  2.0
c    3  NaN
d    4  NaN
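The introduction mentions filling missing values with the mean, median, or mode. As a rough sketch of one way to do that, the NaN cells in column two can be replaced with that column's mean using fillna. The variable name `filled` is only illustrative.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'one': series1, 'two': series2})

# mean() skips NaN by default, so this is the mean of 4 and 2.
# fillna returns a new DataFrame; d itself is not modified.
filled = d.fillna({'two': d['two'].mean()})
print(filled)
```

Swapping mean() for median() follows the same pattern.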

How to Access Column Data in a DataFrame:

We can select a single column by putting its name in square brackets. The result is a pandas Series.

Syntax:  dataframe['column_name']

Example code:

import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'one': series1, 'two': series2})
print(d['one'])

Output:

a    1
b    2
c    3
d    4
Name: one, dtype: int64
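To select more than one column, a list of names can be passed inside the brackets. The sketch below shows the difference in return type: a single name gives a Series, while a list of names gives a DataFrame. The variable names `single` and `multiple` are illustrative.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'one': series1, 'two': series2})

single = d['one']             # one name: returns a Series
multiple = d[['one', 'two']]  # list of names: returns a DataFrame

print(type(single).__name__)
print(type(multiple).__name__)
```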

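How to Add a New Column to a DataFrame:

We can add a column by assigning a Series to a new column name. Index labels that are missing from the new Series get NaN.

import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1': series1, 'series2': series2})
# Assigning to a new column name adds that column.
d['series3'] = pd.Series([5, 6, 7], index=['a', 'b', 'c'])
print(d)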

Output:

   series1  series2  series3
a        1      4.0      5.0
b        2      2.0      6.0
c        3      NaN      7.0
d        4      NaN      NaN
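A new column can also be computed from existing ones, which is one form of the column arithmetic mentioned in the introduction. In this sketch the column name `total` is a made-up label. Any row where either input is NaN also gets NaN in the result.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1': series1, 'series2': series2})

# Element-wise addition, aligned on the row labels.
d['total'] = d['series1'] + d['series2']
print(d)
```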

 

How to Select a Row in a DataFrame:

We can select a row by using the loc indexer, which looks up rows by label, or the iloc indexer, which looks up rows by integer position.

Example code:

# Using loc (label-based)
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1': series1, 'series2': series2})
print(d.loc['d'])

Output:

series1    4.0
series2    NaN
Name: d, dtype: float64


# Using iloc (position-based)
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1': series1, 'series2': series2})
print(d.iloc[0])

Output:

series1    1.0
series2    4.0
Name: a, dtype: float64
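Both indexers also accept a second argument that selects a column, so a single cell can be read directly. This is a small sketch of that usage; the variable names are illustrative.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1': series1, 'series2': series2})

by_label = d.loc['a', 'series2']  # row label, column label
by_position = d.iloc[0, 1]        # row position, column position

print(by_label, by_position)
```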

 

Slicing Rows:

We can slice rows by using the : operator with iloc. The end position is excluded, so 0:2 returns the first two rows.

import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1': series1, 'series2': series2})
print(d.iloc[0:2])

Output:

   series1  series2
a        1      4.0
b        2      2.0
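Slicing also works with loc, with one difference worth noting: a label slice includes its end label, while a position slice with iloc excludes its end position. The sketch below contrasts the two; the variable names are illustrative.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1': series1, 'series2': series2})

by_label = d.loc['a':'c']   # includes row 'c'
by_position = d.iloc[0:2]   # stops before position 2

print(list(by_label.index))
print(list(by_position.index))
```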

How to Delete a Column in a DataFrame:

We can delete a column by using the del statement or the pop method. Both modify the DataFrame in place, and pop also returns the removed column.

Example code:

import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1': series1, 'series2': series2})
d['series3'] = pd.Series([5, 6, 7], index=['a', 'b', 'c'])
d.pop('series3')   # removes series3 and returns it
del d['series1']   # removes series1
print(d)

Output:

   series2
a      4.0
b      2.0
c      NaN
d      NaN
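Another option, not used in the example above, is drop with the columns argument. Unlike del and pop, it returns a new DataFrame and leaves the original untouched. The sketch below contrasts it with pop; the variable names `trimmed` and `removed` are illustrative.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1': series1, 'series2': series2})

# drop returns a copy without the column; d still has both columns here.
trimmed = d.drop(columns=['series1'])

# pop removes the column from d in place and returns it as a Series.
removed = d.pop('series2')

print(list(trimmed.columns))
print(list(d.columns))
```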

 

Removing a Row from a DataFrame:

We can remove a row by using the drop method. It returns a new DataFrame, so the result must be assigned back to keep the change.

Syntax: dataframe.drop('row_label')

Example code:

import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1': series1, 'series2': series2})
print(d)
d = d.drop('a')   # drop the row labeled 'a'
print(d)

Output:

   series1  series2
a        1      4.0
b        2      2.0
c        3      NaN
d        4      NaN
   series1  series2
b        2      2.0
c        3      NaN
d        4      NaN
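drop also accepts a list of labels to remove several rows at once. A related method not covered above, dropna, removes every row that contains a missing value. Both are sketched here under the same sample data; the variable names are illustrative.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
series2 = pd.Series([4, 2], index=['a', 'b'])
d = pd.DataFrame({'series1': series1, 'series2': series2})

without_ab = d.drop(['a', 'b'])  # remove rows by label
complete = d.dropna()            # remove rows containing NaN

print(without_ab)
print(complete)
```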

 
