Operations On Dataframe - Part One

Aashina Arora

So far, we have learned many concepts in Pandas.

Now we will learn about various operations that we can perform on DataFrames. In this article, we cover four categories of them:

1. Binary Operations
2. Inspection Functions
3. Retrieve Head and Tail Rows
4. Iteration

Each of these categories provides operations that can be helpful in different kinds of data analysis. So, let's study them in depth.

1. Binary Operations

Binary means 'two': an operation performed between two operands is a binary operation. This includes addition, subtraction, multiplication and division. Since we are working with DataFrames here, the two operands are two DataFrames. Each operation combines the elements of the two DataFrames that share the same row and column labels, for example adding, subtracting or multiplying them.
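To make "elements that share the same labels" concrete, the sketch below uses two small made-up DataFrames whose labels only partly overlap. pandas pairs elements by row and column label, not by position, so with '+' any cell that exists in only one of the DataFrames comes out as NaN. The sketch also shows the fill_value parameter of add(). It treats a value that is missing on one side as 0, while a cell that is missing on both sides stays NaN. The names left, right, total and filled are only for illustration.

```python
import pandas as pd

# Made-up DataFrames: rows '0','1' vs '1','2' and columns A,B vs A,C
left = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['0', '1'])
right = pd.DataFrame({'A': [10, 20], 'C': [30, 40]}, index=['1', '2'])

# '+' aligns on labels first; only cell ('1', 'A') exists in both
total = left + right
print(total)

# add() with fill_value=0 treats a cell missing on one side as 0;
# cells missing on both sides, such as ('0', 'C'), remain NaN
filled = left.add(right, fill_value=0)
print(filled)
```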

+, add(), radd()

If two DataFrames are all numeric and we want to add them, we can use the '+' operator.

SYNTAX
dataFrame1+dataFrame2

We can also add two DataFrames with the method 'add()'.

SYNTAX
dataFrame1.add(dataFrame2)

There is also 'radd()', the reversed version of add(), which swaps the order of the operands: A.add(B) computes A+B, whereas A.radd(B) computes B+A. For addition this makes no difference to the result, but the order matters for subtraction and division.

SYNTAX
dataFrame1.radd(dataFrame2)
import pandas as pd

# Two numeric DataFrames with the same row and column labels
dict1 = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

df1 = pd.DataFrame(dict1, index=['0', '1', '2'])
print("This is df1:")
print(df1)
print('\n')

dict2 = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

df2 = pd.DataFrame(dict2, index=['0', '1', '2'])
print("This is df2:")
print(df2)
print('\n')

# Operator form: df1 + df2
df3 = df1 + df2
print("Using '+', This is df1+df2 :")
print("This is df3:")
print(df3)
print('\n')

# Method form: df2.add(df3) computes df2 + df3
df4 = df2.add(df3)
print("Using 'add()', This is df2+df3 :")
print("This is df4:")
print(df4)
print('\n')

# Reversed form: df3.radd(df4) computes df4 + df3
df5 = df3.radd(df4)
print("Using 'radd()', This is df4+df3:")
print(df5)

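As a quick check of the operand order described above, the following sketch confirms that '+', add() and radd() give the same result for addition. It reuses two columns of the example data; the variable names are chosen for illustration. It also shows that the methods accept a plain number as well as a DataFrame.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [85, 73, 98], 'B': [60, 80, 58]})
df2 = pd.DataFrame({'A': [8, 7, 9], 'B': [6, 8, 5]})

plus = df1 + df2         # operator form
added = df1.add(df2)     # method form: df1 + df2
radded = df1.radd(df2)   # reversed method form: df2 + df1

# Addition is commutative, so all three agree
print(plus.equals(added), plus.equals(radded))  # True True

# A scalar operand is added to every element
print(df1.add(5).equals(df1 + 5))  # True
```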

-, sub(), rsub()

If you want to subtract one DataFrame from another, you can use '-' or the method 'sub()'.

SYNTAX
dataFrame1-dataFrame2

SYNTAX
dataFrame1.sub(dataFrame2)

As with radd(), the reversed method swaps the operands: A.sub(B) computes A-B, while A.rsub(B) computes B-A.

SYNTAX
dataFrame1.rsub(dataFrame2)

For B-A, you can also write:

SYNTAX
dataFrame2-dataFrame1
import pandas as pd

dict1 = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

df1 = pd.DataFrame(dict1, index=['0', '1', '2'])
print("This is df1:")
print(df1)
print('\n')

dict2 = {'A': [8, 7, 9], 'B': [6, 8, 5], 'C': [9, 6, 7], 'D': [5, 8, 2]}

df2 = pd.DataFrame(dict2, index=['0', '1', '2'])
print("This is df2:")
print(df2)
print('\n')

# sub(): df1 - df2
df3 = df1.sub(df2)
print("Using 'sub()', This is df1-df2 :")
print(df3)
print('\n')

# rsub(): operands reversed, so this computes df2 - df1
df4 = df1.rsub(df2)
print("Using 'rsub()', This is df2-df1 :")
print(df4)

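The reversed methods are handiest when the other operand is a single number. This sketch assumes, for illustration only, that the columns hold scores out of 100. With that assumption, rsub(100) computes 100 minus each score (the points missing from a perfect score), while sub(100) computes each score minus 100. The names scores, below and missing are hypothetical.

```python
import pandas as pd

# Hypothetical scores out of 100
scores = pd.DataFrame({'A': [85, 73, 98], 'B': [60, 80, 58]})

below = scores.sub(100)     # scores - 100
missing = scores.rsub(100)  # 100 - scores
print(missing)

# rsub(100) is the same as writing 100 - scores
print(missing.equals(100 - scores))  # True
```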

*, mul(), rmul()

If you want to multiply two DataFrames, you can use '*' or the method 'mul()'.

SYNTAX
dataFrame1*dataFrame2

SYNTAX
dataFrame1.mul(dataFrame2)

'rmul()' works like radd(): A.rmul(B) computes B*A. Since multiplication is commutative, it gives the same result as mul().

SYNTAX
dataFrame1.rmul(dataFrame2)

import pandas as pd

dict1 = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}
df1 = pd.DataFrame(dict1, index=['0', '1', '2'])
print("This is df1:")
print(df1)
print('\n')

dict2 = {'A': [8, 7, 9], 'B': [6, 8, 5], 'C': [9, 6, 7], 'D': [5, 8, 2]}
df2 = pd.DataFrame(dict2, index=['0', '1', '2'])
print("This is df2:")
print(df2)
print('\n')

# Operator form: df1 * df2
df3 = df1 * df2
print("Using '*', This is df1*df2 :")
print("This is df3:")
print(df3)
print('\n')

# Method form: df2.mul(df3) computes df2 * df3
df4 = df2.mul(df3)
print("Using 'mul()', This is df2*df3 :")
print(df4)

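Since multiplication is commutative, mul() and rmul() agree, as the first check below shows. The second part is a sketch with a hypothetical per-column weight. When the other operand is a Series, axis='columns' tells mul() to match the Series labels against the DataFrame's column labels, so each column is multiplied by its own weight.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [85, 73, 98], 'B': [60, 80, 58]})
df2 = pd.DataFrame({'A': [8, 7, 9], 'B': [6, 8, 5]})

# mul() and rmul() give the same result for multiplication
print(df1.mul(df2).equals(df1.rmul(df2)))  # True

# Hypothetical weights: column A is doubled, column B is multiplied by 10
weights = pd.Series({'A': 2, 'B': 10})
weighted = df2.mul(weights, axis='columns')
print(weighted)
```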

/, div(), rdiv()

If you want to divide one DataFrame by another, you can use '/' or the method 'div()'.

SYNTAX
dataFrame1/dataFrame2

SYNTAX
dataFrame1.div(dataFrame2)

As with the other reversed methods, A.div(B) computes A/B, while A.rdiv(B) computes B/A.

SYNTAX
dataFrame1.rdiv(dataFrame2)

For B/A, you can also write:

SYNTAX
dataFrame2/dataFrame1
import pandas as pd

dict1 = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

df1 = pd.DataFrame(dict1, index=['0', '1', '2'])
print("This is df1:")
print(df1)
print('\n')

dict2 = {'A': [8, 7, 9], 'B': [6, 8, 5], 'C': [9, 6, 7], 'D': [5, 8, 2]}

df2 = pd.DataFrame(dict2, index=['0', '1', '2'])
print("This is df2:")
print(df2)
print('\n')

# Operator form: df1 / df2 (true division, so the result is float)
df3 = df1 / df2
print("Using '/', This is df1/df2 :")
print("This is df3:")
print(df3)
print('\n')

# Method form: df3.div(df2) computes df3 / df2
df4 = df3.div(df2)
print("Using 'div()', This is df3/df2 :")
print(df4)
print('\n')

# Reversed form: df1.rdiv(df2) computes df2 / df1
print("Using 'rdiv()', This is df2/df1 :")
df5 = df1.rdiv(df2)
print(df5)

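It is also worth knowing how division handles zeros. In this small made-up example, pandas does not raise an error. Dividing a non-zero number by 0 gives inf, and 0/0 gives NaN. The sketch also checks that den.rdiv(num) gives the same result as num/den. The names num, den and result are only for illustration.

```python
import pandas as pd

# Made-up numerators and denominators, including zeros
num = pd.DataFrame({'A': [6, 0, 5]})
den = pd.DataFrame({'A': [3, 0, 0]})

result = num.div(den)  # num / den
print(result)          # 2.0, NaN (0/0) and inf (5/0)

# rdiv() reverses the operands: den.rdiv(num) is num / den
print(den.rdiv(num).equals(result))  # True
```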

2. Inspection Functions

As the name suggests, these functions are used to inspect, or examine, a DataFrame. They give general information about a DataFrame and a detailed description of its contents.

There are two of them:

info()
describe()

Let us understand them briefly.

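info()

If you want to gather information about a particular DataFrame, such as how many rows and columns it has, the data type of each column and how much memory it uses, use the method 'info()'. Note that info() prints its summary directly and returns None, so there is no need to wrap it in print().

The output of info() consists of 7 parts:

Type – the class of the object, here pandas.core.frame.DataFrame
No. of rows – the number of rows (entries) and the range of row labels
No. of columns – the total number of columns
Description of all the columns – one line per column, with its position and name
Non-Null Count – the number of non-missing values in each column
Data Type – the data type of each column, followed by a summary of how many columns have each type
Memory Usage – the approximate memory used by the DataFrame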
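import pandas as pd

# 'data' is used instead of 'dict' so that Python's built-in dict is not shadowed
data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

df = pd.DataFrame(data, index=['0', '1', '2'])
print(df)
print("\n")
# info() prints the summary itself and returns None, so print() is not needed
df.info()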

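If you need the pieces of the info() summary as values rather than printed text, pandas exposes most of them as attributes and methods. The sketch below uses the same data as the example. One caveat: memory_usage(deep=True) also measures the string row labels in full, so its total may differ from the figure that info() prints by default.

```python
import pandas as pd

data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}
df = pd.DataFrame(data, index=['0', '1', '2'])

print(df.shape)          # (number of rows, number of columns)
print(df.dtypes)         # data type of each column
print(df.notna().sum())  # non-null count per column

# Bytes used per column plus the index, summed into one total
total_bytes = df.memory_usage(deep=True).sum()
print(total_bytes)
```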

describe()

If you want a statistical description of a particular DataFrame, such as the mean, the standard deviation and the count of non-NA values, use the method 'describe()'.

For numeric columns, the output of describe() consists of 8 parts:

Count of non-NA values in each column
Mean of each column
Standard deviation of each column
Minimum value in each column
25th percentile (25%) of each column
50th percentile (50%), i.e. the median, of each column
75th percentile (75%) of each column
Maximum value in each column
import pandas as pd

data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

df = pd.DataFrame(data, index=['0', '1', '2'])
print(df)
print("\n")
# describe() returns a new DataFrame of summary statistics, so it is printed
print(df.describe())

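Two details of describe() can be checked directly. First, the '50%' row is the median of each column. Second, the percentiles parameter lets you request other percentiles, and the median is always included. The percentiles 0.1 and 0.9 below are an arbitrary choice for illustration.

```python
import pandas as pd

data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}
df = pd.DataFrame(data, index=['0', '1', '2'])

stats = df.describe()
# The '50%' row matches median()
print(stats.loc['50%'].tolist())  # [85.0, 60.0, 74.0, 92.0]
print(df.median().tolist())       # [85.0, 60.0, 74.0, 92.0]

# Custom percentiles: the rows become 10%, 50% and 90%
custom = df.describe(percentiles=[0.1, 0.9])
print(custom)
```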

3. Retrieve Head and Tail Rows

If you want to display the top 5 rows of a DataFrame, use 'head()'.
If you want to display the bottom 5 rows of a DataFrame, use 'tail()'.
If you want a different number of rows, pass it as an argument; for example, 'head(7)' displays the top 7 rows.
The default number of rows for both head() and tail() is 5.

SYNTAX

dataFrame.head()
dataFrame.tail()
dataFrame.head(7)

import pandas as pd

# 12 rows of sample values in columns A to E
data = {'A': [85, 73, 98, 59, 27, 78, 99, 36, 58, 24, 25, 32],
        'B': [60, 80, 58, 78, 52, 54, 89, 63, 54, 87, 52, 65],
        'C': [90, 60, 74, 69, 98, 74, 23, 65, 45, 78, 98, 98],
        'D': [55, 27, 92, 56, 78, 88, 78, 89, 23, 45, 54, 34],
        'E': [91, 12, 98, 63, 98, 97, 45, 96, 91, 32, 65, 76]}

df = pd.DataFrame(data, index=['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11'])
print(df)
print("\n")

print("Using head():", "\n", df.head())  # first 5 rows
print("\n")
print("Using tail():", "\n", df.tail())  # last 5 rows
print("\n")
print("Top 7 rows:", "\n", df.head(7))   # first 7 rows

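head() and tail() behave like positional slicing with iloc. The following sketch checks this on a made-up 12-row DataFrame. It also shows two edge cases. A negative n in head() returns all rows except the last n. Asking for more rows than exist simply returns the whole DataFrame.

```python
import pandas as pd

# Made-up 12-row DataFrame with string row labels '0' to '11'
df = pd.DataFrame({'A': range(12)}, index=[str(i) for i in range(12)])

# head(n) is the first n rows, tail(n) the last n rows
print(df.head(7).equals(df.iloc[:7]))   # True
print(df.tail(3).equals(df.iloc[-3:]))  # True

# Negative n: every row except the last 2
print(len(df.head(-2)))  # 10

# More rows than exist: the whole DataFrame
print(len(df.head(50)))  # 12
```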

4. Iteration

Sometimes you want to look at the items of each row or each column separately. In these scenarios, we use iteration.

If you want to go through the rows one at a time and see the items of every row separately, use 'iterrows()'.
iterrows() iterates over the DataFrame row-wise.
Each horizontal subset is returned in the form (row_index, row_series), where row_series is a Series holding the column names and values of that row.
import pandas as pd

data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

df = pd.DataFrame(data, index=['0', '1', '2'])
print(df)
print("\n")

# iterrows() yields one (row label, row Series) pair per row
for row, row_series in df.iterrows():
    print("Row Index:", row)
    print("Column Names and Values:", "\n", row_series, "\n")

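Because iterrows() returns each row as a Series, and a Series holds a single data type, values from columns of different types can be converted along the way. The sketch below uses made-up columns to show an integer column coming back as floats inside the loop. For this reason, and for speed, column-wise operations are usually preferred over iterrows() when they can do the job.

```python
import pandas as pd

# Made-up mixed types: 'A' holds integers, 'B' holds floats
df = pd.DataFrame({'A': [1, 2], 'B': [0.5, 1.5]})

for label, row in df.iterrows():
    # Each row is one Series, so its integer 'A' value is shown as a float
    print(label, row['A'], row.dtype)

# In the DataFrame itself, column 'A' keeps its integer type
print(df['A'].dtype)
```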

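If you want to go through the columns one at a time and see the items of every column separately, use 'items()'. Older pandas versions also offered 'iteritems()' for this purpose; it was deprecated in pandas 1.5 and removed in pandas 2.0.
items() iterates over the DataFrame column-wise.
Each vertical subset is returned in the form (column_label, column_series), where column_series is a Series holding the row labels and values of that column.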
import pandas as pd

data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

df = pd.DataFrame(data, index=['0', '1', '2'])
print(df)
print("\n")

# items() yields one (column label, column Series) pair per column
for col, col_series in df.items():
    print("Column Index:", col)
    print("\n")
    # enumerate() numbers the values by position, starting at 0
    for i, val in enumerate(col_series):
        print("At Row", i, ":", val)

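A typical use of items() is to compute something per column. This sketch builds a dictionary of column totals with items(), then compares it with df.sum(), which computes the same totals without an explicit loop. The name totals is only for illustration.

```python
import pandas as pd

data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}
df = pd.DataFrame(data, index=['0', '1', '2'])

# items() yields (column label, column Series); sum each column
totals = {col: col_series.sum() for col, col_series in df.items()}
print(totals)

# The same totals, computed column-wise by pandas
print(totals == df.sum().to_dict())  # True
```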

SUMMARY

In this article, Operations on DataFrame - Part One, we covered Binary Operations, Inspection Functions, Retrieving Head and Tail Rows, and Iteration.

My next article will be Part 2 of the same topic, where we will continue with more operations on DataFrames: Combining DataFrames and Aggregation Functions.

Feedback or queries related to this article are most welcome.

Thanks for reading!!