Operations On Dataframe - Part Two

Introduction 

 
So far, we have learned many concepts in Pandas:
In my previous article, which was Part 1 of operations on DataFrame, we learned about a few operations, such as Binary Operations, Inspection Functions, Retrieve Head and Tail Rows, and Iteration.
Operations On Dataframe
 
Now we are left with Combining DataFrames and Aggregation Functions. So, let’s study them in-depth now:
 

Combining DataFrames

 
We learned about various combining methods:
  • combine_first()
  • concat()
  • merge()
combine_first()
 
As the name suggests -combine_first, it combines 2 dataFrames using the Patching method.
 
If in a dataFrame, a certain value is missing and the corresponding cell of another dataFrame has valid data, then Patching comes into the role. It patches the value in the first dataFrame; Hence combine_first.
 
SYNTAX
 
dataFrame1.combine_first(dataFrame2)
  1. import pandas as pd      
  2.       
  3. dict1= {'A':[85,73,None], 'B':[60,None,58], 'C':[90,60,74], 'D': [95,87,92] }      
  4.       
  5. df1=pd.DataFrame(dict1,index=['0','1','2'])      
  6. print("This is df1:")      
  7. print(df1)      
  8. print('\n')      
  9.       
  10. dict2= {'A':[85,73,98], 'B':[60,80,58], 'C':[None,60,74], 'D': [95,87,None] }      
  11.       
  12. df2=pd.DataFrame(dict2,index=['0','1','2'])      
  13. print("This is df2:")      
  14. print(df2)      
  15. print('\n')      
  16.       
  17. print("Using combine_first()")    
  18. print("Combined Dataframe:")    
  19. print(df1.combine_first(df2))   
OUTPUT
 
Operations On Dataframe
 
concat()
 
This method concatenates the dataFrames irrespective of its values, even if the values are missing it concatenates the dataFrames and gives the output as a single dataFrame.
 
SYNTAX
 
concat([dataFrame1,dataFrame2]) 
  1. import pandas as pd      
  2.       
  3. dict1= {'A':[85,73,None], 'B':[60,None,58], 'C':[90,60,74], 'D': [95,87,92] }      
  4.       
  5. df1=pd.DataFrame(dict1,index=['0','1','2'])      
  6. print("This is df1:")      
  7. print(df1)      
  8. print('\n')      
  9.       
  10. dict2= {'A':[85,73,98], 'B':[60,80,58], 'C':[None,60,74], 'D': [95,87,None] }      
  11.       
  12. df2=pd.DataFrame(dict2,index=['0','1','2'])      
  13. print("This is df2:")      
  14. print(df2)      
  15. print('\n')      
  16.       
  17. print("Using concat()")      
  18. print("Combined Dataframe:")    
  19. print(pd.concat([df1,df2]))    
OUTPUT
 
Operations On Dataframe
 
merge()
 
If two different dataFrames have a common column (same concept as the foreign key in Database), then if we merge those two DataFrames we get all columns of dataFrame1 and dataFrame2 in a single dataFrame as an output.
 
If there is no common column in 2 dataFrames, then merge() cannot be used, it gives an error.
 
SYNTAX
 
merge[dataFrame1,dataFrame2)
  1. import pandas as pd    
  2.     
  3. dict1= {'ID':[85,73,99], 'Name':["Alpha","Beta","Gamma"], 'Course':[909,606,741], 'Marks': [95,87,92] }    
  4.     
  5. df1=pd.DataFrame(dict1,index=['0','1','2'])    
  6. print("This is df1:")    
  7. print(df1)    
  8. print('\n')    
  9.     
  10. dict2= {'ID':[85,73,99], 'Course_ID':[6000,8000,5800] }    
  11.     
  12. df2=pd.DataFrame(dict2,index=['0','1','2'])    
  13.     
  14. print("This is df2:")    
  15. print(df2)    
  16. print('\n')    
  17.     
  18. print("Using merge()")        
  19. print("Combined Dataframe:")    
  20. print(pd.merge(df1,df2))   
OUTPUT
 
Operations On Dataframe
 

Aggregation Functions

 
Now we will learn about the various aggregation functions:
  • min()
  • max()
  • count()
  • sum()
  • mad()
  • quantile()
min()
  • As the name suggests, this method fetches out the minimum values within a dataFrame.
  • It contains 3 parameters:
    1. axis- It can take either 0 or 1. ‘0’ stand for column-wise separation and ‘1’ for row-wise. If you don’t mention any axis, default is ‘0’.
    2. skipna- It can be either ‘true’ or ‘false’. If you don’t want to skip NA values then set skipna=’false’. Default value is ‘true’.
    3. numeric_only - It can be either ‘true’ or ‘false’. If you don’t separation with only numeric values then set numeric_only=’false’. Default value is ‘true’
  • If you want column-wise minimum values, use: dataFrame.min(axis=1)ordataFrame.min(axis=’columns’)
  • If you want row-wise minimum values, use: dataFrame.min()ordataFrame.min(axis=0)ordataFrame.min(axis=’index’)
SYNTAX
 
dataFrame.min(axis=0, skipna=’true’,numeric_only=’true’)
  1. import pandas as pd    
  2.     
  3. dict= {'English':[85,73,98], 'Math':[60,80,58], 'Science':[90,60,74], 'French': [95,87,92] }    
  4.     
  5. df=pd.DataFrame(dict,index=['2018','2019','2020'])    
  6. print(df)    
  7. print('\n')    
  8.     
  9. print("Minimun row wise:")    
  10. print(df.min())    
  11. print('\n')    
  12.     
  13. print("Minimun column wise:")    
  14. print(df.min(axis=1))    
OUTPUT
 
Operations On Dataframe
 
max()
  • As the name suggests, this method fetches out the maximum values within a dataFrame.
  • It also contains 3 parameters(same as min())

    SYNTAX
    dataFrame.max(axis=0, skipna=’true’,numeric_only=’true’)

  • If you want column-wise maximum values, use: dataFrame.max(axis=1)ordataFrame.min(axis=’columns’)
  • If you want row-wise maximum values, use: dataFrame.max(axis=0)ordataFrame.max() ordataFrame.max(axis=’index’)
    1. import pandas as pd    
    2.     
    3. dict= {'English':[85,73,98], 'Math':[60,80,58], 'Science':[90,60,74], 'French': [95,87,92] }    
    4.     
    5. df=pd.DataFrame(dict,index=['2018','2019','2020'])    
    6.     
    7. print(df)    
    8. print('\n')    
    9.     
    10. print("Maximum row wise:")    
    11. print(df.max())    
    12. print('\n')    
    13.     
    14. print("Maximum column wise:")    
    15. print(df.max(axis=1))    
OUTPUT
 
Operations On Dataframe
 
count()
  • As the name suggests, this method fetches the count of entries in the dataFrame.
  • It also contains 3 parameters (same as min())

    SYNTAX
    dataFrame.count(axis=0, skipna=’true’,numeric_only=’true’)

  • If you want column-wise count, use: dataFrame.count(1)ordataFrame.count(axis=’columns’)
  • If you want row-wise count, use: dataFrame.count(0)ordataFrame.count()ordataFrame.count(axis=’index’)
    1. import pandas as pd    
    2.     
    3. dict= {'English':[85,73,98], 'Math':[60,80,58], 'Science':[90,60,74], 'French': [95,87,92] }    
    4.     
    5. df=pd.DataFrame(dict,index=['2018','2019','2020'])    
    6. print(df)    
    7. print('\n')    
    8.     
    9. print("Row-wise count:")    
    10. print(df.count())    
    11. print('\n')    
    12.     
    13. print("Column-wise count:")    
    14. print(df.count(1))   
OUTPUT
 
Operations On Dataframe
 
sum()
  • As the name suggests, this method fetches the sum of entries in the dataFrame.
  • It also contains 3 parameters (same as min())

    SYNTAX
    dataFrame.sum(axis=0, skipna=’true’,numeric_only=’true’)

  • If you want column-wise sum, use: dataFrame.sum(1)ordataFrame.sum(axis=’columns’)
  • If you want row-wise sum, use: dataFrame.sum(0)ordataFrame.sum()ordataFrame.sum(axis=’index’)
    1. import pandas as pd    
    2.     
    3. dict= {'English':[85,73,98], 'Math':[60,80,58], 'Science':[90,60,74], 'French': [95,87,92] }    
    4.     
    5. df=pd.DataFrame(dict,index=['2018','2019','2020'])    
    6. print(df)    
    7. print('\n')    
    8.     
    9. print("Row-wise sum:")    
    10. print(df.sum())    
    11. print('\n')    
    12.     
    13. print("Column-wise sum:")    
    14. print(df.sum(1))   
OUTPUT
 
Operations On Dataframe
 
mad()
  • MAD stand for Mean Absolute Deviation.
  • It also has 3 parameters (axis, skipna, numeric_only), you can refer to min() for details of these parameters.

    SYNTAX
    dataFrame.mad(axis=0, skipna=’true’,numeric_only=’true’)

  • If you want column-wise mean absolute deviation, use: dataFrame.mad(axis=1)ordataFrame.mad(axis=’columns’)
  • If you want row-wise mean absolute deviation, use: dataFrame.mad(axis=0)ordataFrame.mad()ordataFrame.mad(axis=’index’)
    1. import pandas as pd    
    2.     
    3. dict= {'English':[85,73,98], 'Math':[60,80,58], 'Science':[90,60,74], 'French': [95,87,92] }    
    4.     
    5. df=pd.DataFrame(dict,index=['2018','2019','2020'])    
    6. print(df)    
    7. print('\n')    
    8.     
    9. print("Row-wise:")    
    10. print(df.mad())    
    11. print('\n')    
    12.     
    13. print("Column-wise:")    
    14. print(df.mad(1))    
OUTPUT
 
Operations On Dataframe
 
quantile()
  • Quantiles are the points in a distribution that relate to the rank order of values in that distribution.
  • In short, the range of probability distribution into continuous intervals with equal probabilities.
  • The quantile of the value is the fraction of observations less than or equal to the values.
  • quantile() has 4 parameters:
    • q- This parameter determines the percentage of quantile to be done to a dataFrame.
    • axis(0 or 1)
    • numeric_only(true or false)
    • interpolation – It is the construction of new data points within the range of a discrete set of known data points.
  • q is a parameter with range [0.25,0.75,0.5,1.0], the default value is 0.5(50% of quantile). And remember that 0 < q <= 1
  • If q is an array, then a dataFrame is returned.
  • If q is float, then a series is returned.
  • interpolationparameter can be used with these values: [‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’]

    SYNTAX
    dataFrame.quantile(q=0.5, axis=0, numeric_only=’true’, interpolation=’linear’)

  • If you want column-wise quantile, use: dataFrame.quantile(axis=1) or dataFrame.quantile(axis=’columns’)
  • If you want row-wise mean absolute deviation, use: dataFrame.quantile(axis=0) or dataFrame.quantile() or dataFrame.quantile(axis=’index’)
    1. import pandas as pd      
    2.       
    3. dict= {'English':[85,73,98], 'Math':[60,80,58], 'Science':[90,60,74], 'French': [95,87,92] }      
    4.       
    5. df=pd.DataFrame(dict,index=['2018','2019','2020'])      
    6. print(df)      
    7. print('\n')      
    8.       
    9. print("Row-wise:")      
    10. print(df.quantile())      
    11. print('\n')      
    12.       
    13. print("Column-wise:")      
    14. print(df.quantile(1))      
    15. print('\n')      
    16.       
    17. print("Quantile with 0.25 value(Row-wise):")      
    18. print(df.quantile(q=0.25,axis=0,numeric_only='true'))     
OUTPUT
 
Operations On Dataframe
 

Summary

 
In this article, we discussed various methods of Combining Dataframes, and we also learned about Aggregation Functions. I hope this will help readers understand how to use and implement Pandas. Now you can play with data, create your own dataframes, and start analyses.
 
This is the end of my PANDAS Series.
 
Feedback or queries related to this article are most welcome.
 
Thanks for reading!!


Similar Articles