
# Operations On Dataframe - Part One


So far, we have learned many concepts in Pandas. Now we will learn about the various operations that we can perform on dataFrames. In this part, we cover four categories:

1. Binary Operations
2. Inspection Functions
3. Retrieve Head and Tail Rows
4. Iteration

Each of these categories is useful in a different kind of data analysis, so let’s study them in depth.

## Binary Operations

Binary means ‘two’: an operation performed between two operands is a binary operation. This includes addition, subtraction, multiplication and division. Since we are working with dataFrames here, these operations are performed between two dataFrames, for example adding, subtracting or multiplying the elements of two dataFrames. Pandas applies the operation element by element, pairing up the values that share the same row label and column label.

Addition (‘+’, add(), radd())

• If two dataFrames are entirely numeric and we want to add them, we use ‘+’.

SYNTAX
dataFrame1+dataFrame2

• For the addition of two dataFrames we can also use the method ‘add()’.

SYNTAX
dataFrame1.add(dataFrame2)

• You can also use ‘radd()’ (reverse add). It works like add() but swaps the operands: dataFrame1.add(dataFrame2) computes dataFrame1+dataFrame2, while dataFrame1.radd(dataFrame2) computes dataFrame2+dataFrame1. For addition the result is the same either way, but the distinction matters for subtraction and division.

SYNTAX
dataFrame1.radd(dataFrame2)

```python
import pandas as pd

dict1 = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

df1 = pd.DataFrame(dict1, index=['0', '1', '2'])
print("This is df1:")
print(df1)
print('\n')

dict2 = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

df2 = pd.DataFrame(dict2, index=['0', '1', '2'])
print("This is df2:")
print(df2)
print('\n')

# '+' adds the values that share the same row and column labels
df3 = df1 + df2
print("Using '+', This is df1+df2 :")
print("This is df3:")
print(df3)
print('\n')

# add() is the method form of '+': df2.add(df3) computes df2+df3
df4 = df2.add(df3)
print("Using 'add()', This is df2+df3 :")
print("This is df4:")
print(df4)
print('\n')

# radd() swaps the operands: df3.radd(df4) computes df4+df3
df5 = df3.radd(df4)
print("Using 'radd()', This is df4+df3:")
print(df5)
```
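In the example above both dataFrames have identical row and column labels. The sketch below, using two small made-up dataFrames `left` and `right`, shows what happens when the labels only partly overlap: pandas aligns the operands on their labels, so with ‘+’ any cell present in only one of them becomes NaN, while add() and radd() accept a `fill_value` argument that stands in for the missing side (a cell missing from both sides still stays NaN).

```python
import pandas as pd

# Two small, made-up dataFrames whose labels only partly overlap
left = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['0', '1'])
right = pd.DataFrame({'A': [10, 20], 'C': [30, 40]}, index=['1', '2'])

# '+' aligns on labels: only cell ('1', 'A') exists in both frames
plus_result = left + right
print(plus_result)

# add() with fill_value treats a cell missing from one side as 0
add_result = left.add(right, fill_value=0)
print(add_result)

# radd() swaps the operands; for addition the values are identical
print(left.radd(right, fill_value=0).equals(add_result))
```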

Subtraction (‘-’, sub(), rsub())
• If you want to subtract one dataFrame from another, you can use ‘-’ or the method ‘sub()’.

SYNTAX
dataFrame1-dataFrame2

SYNTAX
dataFrame1.sub(dataFrame2)
• dataFrame1.sub(dataFrame2) computes dataFrame1-dataFrame2 (A-B). If you want B-A instead, use ‘rsub()’, which swaps the operands.

SYNTAX
dataFrame1.rsub(dataFrame2)

• For B-A, you can also use,

SYNTAX
dataFrame2-dataFrame1
```python
import pandas as pd

dict1 = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

df1 = pd.DataFrame(dict1, index=['0', '1', '2'])
print("This is df1:")
print(df1)
print('\n')

dict2 = {'A': [8, 7, 9], 'B': [6, 8, 5], 'C': [9, 6, 7], 'D': [5, 8, 2]}

df2 = pd.DataFrame(dict2, index=['0', '1', '2'])

print("This is df2:")
print(df2)
print('\n')

df3 = df1.sub(df2)
print("Using 'sub()', This is df1-df2 :")
print(df3)
print('\n')

df4 = df1.rsub(df2)
print("Using 'rsub()', This is df2-df1 :")
print(df4)
```
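A quick way to convince yourself of the operand order is to compare the method forms with the operators. This is a minimal sketch with two made-up dataFrames `a` and `b`: a.rsub(b) matches b-a, which is simply a-b with every sign flipped.

```python
import pandas as pd

# Two made-up, all-numeric dataFrames with the same labels
a = pd.DataFrame({'A': [85, 73], 'B': [60, 80]})
b = pd.DataFrame({'A': [8, 7], 'B': [6, 8]})

forward = a.sub(b)    # a - b
reverse = a.rsub(b)   # b - a

print(forward)
print(reverse)

# Swapping the operands of '-' flips the sign of every element
print(reverse.equals(b - a))      # True
print(reverse.equals(-forward))   # True
```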

Multiplication (‘*’, mul(), rmul())

• If you want to multiply two dataFrames element by element, you can use ‘*’ or the method ‘mul()’.

SYNTAX
dataFrame1*dataFrame2

SYNTAX
dataFrame1.mul(dataFrame2)

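• ‘rmul()’ swaps the operands: dataFrame1.rmul(dataFrame2) computes dataFrame2*dataFrame1. Since multiplication is commutative, it gives the same result as mul().

SYNTAX
dataFrame1.rmul(dataFrame2)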
```python
import pandas as pd

dict1 = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

df1 = pd.DataFrame(dict1, index=['0', '1', '2'])
print("This is df1:")
print(df1)
print('\n')

dict2 = {'A': [8, 7, 9], 'B': [6, 8, 5], 'C': [9, 6, 7], 'D': [5, 8, 2]}

df2 = pd.DataFrame(dict2, index=['0', '1', '2'])
print("This is df2:")
print(df2)
print('\n')

# '*' multiplies the values that share the same row and column labels
df3 = df1 * df2
print("Using '*', This is df1*df2 :")
print("This is df3:")
print(df3)
print('\n')

# mul() is the method form of '*'
df4 = df2.mul(df3)
print("Using 'mul()', This is df2*df3 :")
print(df4)
```
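Two points about multiplication can be checked directly in a small sketch (`a` and `b` are made-up dataFrames): ‘*’ and mul() multiply the cells that share the same labels, which is element-wise multiplication rather than matrix multiplication, and rmul() gives the same result as mul() because the order of the operands does not matter.

```python
import pandas as pd

# Two made-up, all-numeric dataFrames with the same labels
a = pd.DataFrame({'A': [85, 73], 'B': [60, 80]})
b = pd.DataFrame({'A': [8, 7], 'B': [6, 8]})

product = a.mul(b)

# Each cell is the product of the two cells at the same position
print(product.loc[0, 'A'] == 85 * 8)   # True
# rmul() swaps the operands; multiplication is commutative, so nothing changes
print(a.rmul(b).equals(product))       # True
print((a * b).equals(product))         # True
```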

Division (‘/’, div(), rdiv())
• If you want to divide one dataFrame by another, you can use ‘/’ or the method ‘div()’.

SYNTAX
dataFrame1/dataFrame2

SYNTAX
dataFrame1.div(dataFrame2)
• dataFrame1.div(dataFrame2) computes dataFrame1/dataFrame2 (A/B). If you want B/A instead, use ‘rdiv()’, which swaps the operands.

SYNTAX
dataFrame1.rdiv(dataFrame2)
• For B/A, you can also use,

SYNTAX
dataFrame2/dataFrame1
```python
import pandas as pd

dict1 = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

df1 = pd.DataFrame(dict1, index=['0', '1', '2'])
print("This is df1:")
print(df1)
print('\n')

dict2 = {'A': [8, 7, 9], 'B': [6, 8, 5], 'C': [9, 6, 7], 'D': [5, 8, 2]}

df2 = pd.DataFrame(dict2, index=['0', '1', '2'])
print("This is df2:")
print(df2)
print('\n')

df3 = df1 / df2
print("Using '/', This is df1/df2 :")
print("This is df3:")
print(df3)
print('\n')

df4 = df3.div(df2)
print("Using 'div()', This is df3/df2 :")
print(df4)
print('\n')

print("Using 'rdiv()', This is df2/df1 :")
df5 = df1.rdiv(df2)
print(df5)
```
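A few details of division can be seen in the following sketch with made-up dataFrames `a` and `b`: ‘/’ always performs true division, so integer columns produce float results; a.rdiv(b) matches b/a; and dividing a non-zero value by zero yields inf instead of raising an error.

```python
import pandas as pd

# Two made-up, all-numeric dataFrames; note the 0 in column B of a
a = pd.DataFrame({'A': [85, 72], 'B': [60, 0]})
b = pd.DataFrame({'A': [5, 8], 'B': [6, 4]})

quotient = a.div(b)   # a / b
reverse = a.rdiv(b)   # b / a

# '/' always performs true division, so integer inputs give float results
print(quotient.dtypes)
print(reverse.equals(b / a))   # True
# Dividing a non-zero value by zero gives inf rather than raising an error
print(reverse.loc[1, 'B'])     # inf
```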

## Inspection Functions

As the name suggests, these functions are used to inspect, or examine, a dataFrame. We use them to gather information about a dataFrame or to get a detailed description of it.

There are two of them:
1. info()
2. describe()

Let us look at each briefly.

info()

If you want to gather information about a particular dataFrame, such as how many rows and columns it has, the data type of each column, and how much memory it uses, then use the method ‘info()’.

The info() method prints a summary in 7 parts:
1. Type – the class of the object, here a pandas DataFrame
2. Rows – the number of entries in the index, along with its first and last labels
3. Columns – the total number of columns
4. Column description – one line per column with its position and name
5. Data type – the dtype of each column, followed by a count of columns per dtype
6. Memory usage
7. Non-null count – the number of non-null values in each column
```python
import pandas as pd

data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

df = pd.DataFrame(data, index=['0', '1', '2'])
print(df)
print("\n")
# info() prints its report itself (and returns None), so it is not wrapped in print()
df.info()
```
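Note that info() prints its report directly and returns None, so wrapping it in print() would also print the word None. If you need the report as a string, for example to log it, info() accepts a `buf` argument. The sketch below assumes a small made-up dataFrame with one missing value, and uses df.count() to show the same non-null counts as a Series.

```python
import io

import pandas as pd

# Made-up data: column B has one missing value
data = {'A': [85, 73, 98], 'B': [60, 80, None], 'C': [90, 60, 74]}
df = pd.DataFrame(data, index=['0', '1', '2'])

# info() prints its report and returns None; passing a buffer captures the text
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()
print(report)

# The same non-null counts that info() reports, as a Series
print(df.count())
```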

describe()

If you want a statistical description of a particular dataFrame, such as the mean, the standard deviation and the count of non-NA values, then use the method ‘describe()’.
For numeric columns, the describe() method gives an output in 8 parts:
1. Count of non-NA values in each column
2. Mean of each column
3. Standard deviation of each column
4. Minimum value in each column
5. 25th percentile (25%) of each column
6. 50th percentile, i.e. the median (50%), of each column
7. 75th percentile (75%) of each column
8. Maximum value in each column
```python
import pandas as pd

data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

df = pd.DataFrame(data, index=['0', '1', '2'])
print(df)
print("\n")
print(df.describe())
```
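The percentile rows can be verified by hand. In this sketch, which reuses the data from the example above, column A sorted is [73, 85, 98]; by default describe() interpolates linearly between adjacent values, so the 25% value lies halfway between 73 and 85. The ‘std’ row is the sample standard deviation (divisor n-1), the same value that std() returns, and the `percentiles` argument lets you ask for other percentiles (the median is always included).

```python
import pandas as pd

data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}
df = pd.DataFrame(data, index=['0', '1', '2'])

summary = df.describe()

# Column A sorted is [73, 85, 98]; the quartiles are linearly interpolated
print(summary.loc['25%', 'A'])   # 79.0 (halfway between 73 and 85)
print(summary.loc['50%', 'A'])   # 85.0 (the median)
print(summary.loc['75%', 'A'])   # 91.5 (halfway between 85 and 98)

# 'std' is the sample standard deviation, the same as Series.std()
print(summary.loc['std', 'A'], df['A'].std())

# Other percentiles can be requested explicitly
print(df.describe(percentiles=[0.1, 0.9]).index.tolist())
```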

## Retrieve Head and Tail Rows
• If you want to display the top 5 rows of a dataFrame, use ‘head()’.
• If you want to display the bottom 5 rows of a dataFrame, use ‘tail()’.
• To display a different number of rows, pass it as an argument; for example, ‘head(7)’ displays the top 7 rows.
• Both head() and tail() display 5 rows by default.

SYNTAX
dataFrame.head(n)

SYNTAX
dataFrame.tail(n)
```python
import pandas as pd

data = {'A': [85, 73, 98, 59, 27, 78, 99, 36, 58, 24, 25, 32],
        'B': [60, 80, 58, 78, 52, 54, 89, 63, 54, 87, 52, 65],
        'C': [90, 60, 74, 69, 98, 74, 23, 65, 45, 78, 98, 98],
        'D': [55, 27, 92, 56, 78, 88, 78, 89, 23, 45, 54, 34],
        'E': [91, 12, 98, 63, 98, 97, 45, 96, 91, 32, 65, 76]
        }

df = pd.DataFrame(data, index=['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11'])
print(df)
print("\n")

# head() shows the first 5 rows by default
print("Using head():", "\n", df.head())
print("\n")
# tail() shows the last 5 rows by default
print("Using tail():", "\n", df.tail())
print("\n")
```
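head() and tail() select rows by position, not by label, so they behave like slicing with iloc. This sketch uses a made-up 12-row dataFrame to check that, and also shows that a negative n drops rows instead: head(-4) keeps everything except the last 4 rows, and tail(-4) keeps everything except the first 4.

```python
import pandas as pd

# A made-up 12-row dataFrame, similar in size to the one above
df = pd.DataFrame({'A': range(12), 'B': range(100, 112)})

first_three = df.head(3)
last_two = df.tail(2)

# head(n) and tail(n) are positional, just like slicing with iloc
print(first_three.equals(df.iloc[:3]))   # True
print(last_two.equals(df.iloc[-2:]))     # True

# A negative n keeps everything except the last (head) or first (tail) |n| rows
print(len(df.head(-4)))                   # 8
print(len(df.tail(-4)))                   # 8
```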

## Iteration

Sometimes you may want to look at the items of each row or each column separately. In such scenarios, we use iteration.
• If you want to separate all the rows, or see the items of every row separately, use ‘iterrows()’.
• iterrows() iterates over the dataFrame row-wise.
• Each horizontal subset is returned as a tuple of the form (row_index, columnNames_and_values), where the second element is a Series holding that row’s column names and values.
```python
import pandas as pd

data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

df = pd.DataFrame(data, index=['0', '1', '2'])
print(df)
print("\n")

for row, row_series in df.iterrows():
    print("Row Index:", row)
    print("Columns Names and Values:", "\n", row_series, "\n")
```
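A typical use of iterrows() is to compute something per row. The sketch below, reusing the data from the example above, adds up each row’s values in a loop and then compares the result with df.sum(axis=1), which does the same work without an explicit Python loop and is usually the faster choice on large dataFrames. Keep in mind that iterrows() hands you each row as a Series, so columns of different types may be converted to a common type inside the loop.

```python
import pandas as pd

data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}
df = pd.DataFrame(data, index=['0', '1', '2'])

# Build a dictionary of row totals by visiting one row at a time
row_totals = {}
for row_index, row_series in df.iterrows():
    row_totals[row_index] = row_series.sum()

print(row_totals)

# The same result without an explicit loop
print(df.sum(axis=1).to_dict() == row_totals)   # True
```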

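• If you want to separate all the columns, or see the items of every column separately, use ‘items()’. Older tutorials use ‘iteritems()’, which did the same thing but was deprecated in pandas 1.5 and removed in pandas 2.0.
• items() iterates over the dataFrame column-wise.
• Each vertical subset is returned as a tuple of the form (column_index, rowNames_and_values), where the second element is a Series holding that column’s row labels and values.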
```python
import pandas as pd

data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}

df = pd.DataFrame(data, index=['0', '1', '2'])
print(df)
print("\n")

# items() yields (column label, column Series) pairs
for col, col_series in df.items():
    print("Column Index:", col)
    print("\n")
    # Walk through the column's values by position
    for i, val in enumerate(col_series):
        print("At Row", i, ":", val)
```
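Column-wise iteration works the same way. As a sketch reusing the data from the example above, the loop below records, for each column, the row label of its largest value using the Series method idxmax(), and then compares the result with df.idxmax(), which does the same in a single call.

```python
import pandas as pd

data = {'A': [85, 73, 98], 'B': [60, 80, 58], 'C': [90, 60, 74], 'D': [95, 87, 92]}
df = pd.DataFrame(data, index=['0', '1', '2'])

# For each column, record the row label where its largest value sits
best_rows = {}
for col, col_series in df.items():
    best_rows[col] = col_series.idxmax()

print(best_rows)   # {'A': '2', 'B': '1', 'C': '0', 'D': '0'}

# The same result in one call, without a Python-level loop
print(df.idxmax().to_dict() == best_rows)   # True
```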

## SUMMARY

In this article, we covered a few operations on dataFrames: Binary Operations, Inspection Functions, Retrieving Head and Tail Rows, and Iteration. Hence the title, Operations on DataFrame - Part One.
My next article will be Part Two of the same topic, where we will continue with more operations on dataFrames: Combining DataFrames and Aggregation Functions.