Data Processing Using Python & Pandas

Introduction

 
Python has become one of the most popular dynamic programming languages, alongside Ruby and Perl. It is well suited to data analysis, scientific computing, and data visualization, and it is an excellent language for building data-centric applications, thanks to libraries such as NumPy, Pandas, and Matplotlib.
 
Pandas provides well-designed data structures and functions that make working with data fast and easy. In this article, I will explain how to create a DataFrame and how to handle data with it using the Pandas library. The DataFrame is well suited to data analysis.
 
Installation Required
  1. Python 3.5 or later must be installed.
  2. pip3 must be installed and upgraded to the latest version:
     
    Go to command prompt → Type Command → pip3 install --upgrade pip
     
  3. Install Pandas:
     
    Go to command prompt → Type Command → pip3 install pandas
     
  4. Install Jupyter Notebook (it lets you write and run Python and Pandas code interactively):
     
    Go to command prompt → Type Command → pip3 install jupyter
     
  5. Open Jupyter notebook:
     
    Go to command prompt → Type Command → jupyter notebook
     
  6. Jupyter Notebook will open in your browser.
 
In Jupyter, create a new Python notebook. You need to import Pandas first, so add the import below:
 
import pandas as pd
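 
A quick way to confirm that the installation and import worked is to print the installed version. This is only a sanity check, and the version number you see will depend on your environment:

```python
import pandas as pd

# Print the installed Pandas version to confirm the import succeeded.
print(pd.__version__)
```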
 

Prepare Data

 
Series
 
Series is a core data structure of the Pandas library. It is a one-dimensional labeled array, similar to a list or a single column in a table. The example below creates four Series and combines them into a DataFrame:
    # Each purchase is a Series: one-dimensional, labeled by field name.
    purchase_1 = pd.Series({
        'Name': 'Chris',
        'Item Purchased': 'Pencil',
        'Cost': 22.50
    })
    purchase_2 = pd.Series({
        'Name': 'Ram',
        'Item Purchased': 'Book',
        'Cost': 220.50
    })
    purchase_3 = pd.Series({
        'Name': 'Mohan',
        'Item Purchased': 'Pen',
        'Cost': 22.50
    })
    purchase_4 = pd.Series({
        'Name': 'Gulam',
        'Item Purchased': 'Diary',
        'Cost': 22.50
    })
    # Combine the four Series into a DataFrame, one row per store.
    df = pd.DataFrame([purchase_1, purchase_2, purchase_3, purchase_4], index=['Store 1', 'Store 2', 'Store 3', 'Store 4'])
    df.head()
Here, each pd.Series holds one purchase as a one-dimensional labeled object, and pd.DataFrame combines the Series, one per row, into a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes. df.head() displays the first five rows.
 
Press Ctrl + Enter in the Jupyter notebook to run the cell. The output is a table with one row per store and the columns Name, Item Purchased, and Cost.
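 
To make the difference between a Series and a DataFrame concrete, the minimal sketch below builds one of each and inspects its dimensions. It uses a single purchase only, and the variable name purchase is chosen here for illustration:

```python
import pandas as pd

# A Series is one-dimensional: values labeled by an index.
purchase = pd.Series({"Name": "Chris", "Item Purchased": "Pencil", "Cost": 22.50})
print(purchase.ndim)         # 1
print(list(purchase.index))  # ['Name', 'Item Purchased', 'Cost']

# A DataFrame is two-dimensional: the Series labels become column names,
# and the index argument supplies the row label.
df = pd.DataFrame([purchase], index=["Store 1"])
print(df.ndim)   # 2
print(df.shape)  # (1, 3)
```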
 
Fetch the row for Store 1:
    df.loc['Store 1']
Press Ctrl + Enter in the Jupyter notebook to run the cell. The output is the Store 1 row, returned as a Series.
 
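As a rough illustration of row lookup by label, the sketch below rebuilds the same store data from a dictionary (a shortcut assumed here so the example runs on its own) and shows that .loc returns a Series for a single label and a DataFrame for a list of labels:

```python
import pandas as pd

# Same data as the article, built from a dict for brevity.
df = pd.DataFrame(
    {
        "Name": ["Chris", "Ram", "Mohan", "Gulam"],
        "Item Purchased": ["Pencil", "Book", "Pen", "Diary"],
        "Cost": [22.50, 220.50, 22.50, 22.50],
    },
    index=["Store 1", "Store 2", "Store 3", "Store 4"],
)

# A single label returns that row as a Series.
row = df.loc["Store 1"]
print(type(row).__name__)  # Series
print(row["Name"])         # Chris

# A list of labels returns a DataFrame containing those rows.
subset = df.loc[["Store 1", "Store 3"]]
print(subset.shape)  # (2, 3)
```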
Get all data for "Item Purchased":
    df['Item Purchased']
The output is a Series of the purchased items, indexed by store.
 
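Column selection can be sketched the same way. The example below again assumes the store data is rebuilt from a dictionary; selecting one column name gives a Series, while a list of column names gives a DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Chris", "Ram", "Mohan", "Gulam"],
        "Item Purchased": ["Pencil", "Book", "Pen", "Diary"],
        "Cost": [22.50, 220.50, 22.50, 22.50],
    },
    index=["Store 1", "Store 2", "Store 3", "Store 4"],
)

# One column name selects a single column as a Series.
items = df["Item Purchased"]
print(items.tolist())  # ['Pencil', 'Book', 'Pen', 'Diary']

# A list of column names selects several columns as a DataFrame.
name_cost = df[["Name", "Cost"]]
print(name_cost.shape)  # (4, 2)
```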
Get the cost of Store 1:
    df.loc['Store 1', 'Cost']
Output
 
22.5
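 
For a single cell, passing the row and column labels together to .loc is one option; .at is an alternative accessor intended for scalar lookups. The sketch below, which assumes the same store data rebuilt from a dictionary, shows both returning the same value:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Chris", "Ram", "Mohan", "Gulam"],
        "Item Purchased": ["Pencil", "Book", "Pen", "Diary"],
        "Cost": [22.50, 220.50, 22.50, 22.50],
    },
    index=["Store 1", "Store 2", "Store 3", "Store 4"],
)

# Row label and column label together select one cell.
cost = df.loc["Store 1", "Cost"]
print(cost)  # 22.5

# .at performs the same scalar lookup.
same_cost = df.at["Store 1", "Cost"]
print(cost == same_cost)  # True
```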
 
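Show columns as rows (transpose):
    df.T
The output is the transposed table, with Name, Item Purchased, and Cost as rows and the stores as columns.
 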
Get cost data for all stores:
    df.T.loc['Cost']
The output lists the cost for each store.
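 
Looking up a row of the transposed frame yields the same values as selecting the column directly, which the sketch below checks. The data is again rebuilt from a dictionary for the sake of a standalone example. Because the transposed frame mixes text and numbers in each column, the transposed result will typically not keep the numeric dtype, so selecting the column directly is usually the simpler choice:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Chris", "Ram", "Mohan", "Gulam"],
        "Item Purchased": ["Pencil", "Book", "Pen", "Diary"],
        "Cost": [22.50, 220.50, 22.50, 22.50],
    },
    index=["Store 1", "Store 2", "Store 3", "Store 4"],
)

# Transposing swaps rows and columns: 4 x 3 becomes 3 x 4.
print(df.T.shape)  # (3, 4)

# The 'Cost' row of the transpose holds the same values as the 'Cost' column.
via_transpose = df.T.loc["Cost"]
direct = df["Cost"]
print(via_transpose.tolist() == direct.tolist())  # True
```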
 
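Drop Store 1
    df.drop('Store 1')
The output shows the remaining rows, Store 2 through Store 4. Note that drop returns a new DataFrame and leaves df itself unchanged.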
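 
The sketch below illustrates that drop does not modify the original DataFrame unless the result is assigned back. It assumes the same store data rebuilt from a dictionary, and the name without_store_1 is chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Chris", "Ram", "Mohan", "Gulam"],
        "Item Purchased": ["Pencil", "Book", "Pen", "Diary"],
        "Cost": [22.50, 220.50, 22.50, 22.50],
    },
    index=["Store 1", "Store 2", "Store 3", "Store 4"],
)

# drop returns a new DataFrame without the given row label.
without_store_1 = df.drop("Store 1")
print(list(without_store_1.index))  # ['Store 2', 'Store 3', 'Store 4']

# The original DataFrame still contains Store 1.
print("Store 1" in df.index)  # True
```

To make the removal permanent, assign the result back, for example df = df.drop('Store 1').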
 
Multiply Cost by 0.8
    df['Cost'] *= 0.8
    df
The output shows the DataFrame with every Cost value reduced to 80% of its original value.
 
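As a small worked example of the in-place multiplication, the sketch below applies the 0.8 factor to the same store data (rebuilt from a dictionary) and rounds the result to two decimals to hide floating-point noise. The rounding step is an addition made here for display purposes:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Chris", "Ram", "Mohan", "Gulam"],
        "Item Purchased": ["Pencil", "Book", "Pen", "Diary"],
        "Cost": [22.50, 220.50, 22.50, 22.50],
    },
    index=["Store 1", "Store 2", "Store 3", "Store 4"],
)

# The operation is applied to every value in the column at once.
df["Cost"] *= 0.8

# Round for display, since 0.8 has no exact binary representation.
print(df["Cost"].round(2).tolist())
```

Unlike drop, this assignment changes df itself, so running the cell twice applies the factor twice.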
 

Conclusion

 
The Pandas library in Python is well suited to data analysis and preparation. Jupyter is also a convenient environment for writing and executing code and displaying the results.
 
Please find the attached Python code for more details.

