Data Processing Using Python & Pandas

Gul Md Ershad

Introduction

Python has become one of the most popular dynamic programming languages, alongside Ruby and Perl. It is well suited to data analysis, scientific computing, and data visualization, which makes it an excellent language for building data-centric applications. Much of that strength comes from libraries such as NumPy, Pandas, and Matplotlib.

Pandas provides well-designed data structures and functions that make working with tabular data fast and easy. This article explains how to create a DataFrame and how to work with its data using the Pandas library, which is a good starting point for data analysis.

Installation Required

Python 3.5 or later must be installed.
pip3 must be installed and up to date:

Go to command prompt → Type Command → pip3 install --upgrade pip
Install Pandas:

Go to command prompt → Type Command → pip3 install pandas
Install Jupyter Notebook (it lets you write and run Python and Pandas code from the browser):

Go to command prompt → Type Command → pip3 install jupyter
Open Jupyter notebook:

Go to command prompt → Type Command → jupyter notebook
This opens Jupyter Notebook in your browser.

Here, create a new Python notebook. Pandas must be imported before it can be used, so start with the following import:

import pandas as pd
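Before going further, it can help to confirm that the import works in the notebook. The check below is only a sketch; the version it prints depends on what pip installed on your machine.

```python
import pandas as pd

# If this cell runs without an ImportError, Pandas is available to the notebook.
version = pd.__version__
print(version)
```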

Prepare Data

Series

A Series is one of the core data structures of the Pandas library. It is a one-dimensional labeled object, similar to an array, a list, or a single column of a table. Below is an example of the code:

purchase_1 = pd.Series({
    'Name': 'Chris',
    'Item Purchased': 'Pencil',
    'Cost': 22.50
})
purchase_2 = pd.Series({
    'Name': 'Ram',
    'Item Purchased': 'Book',
    'Cost': 220.50
})
purchase_3 = pd.Series({
    'Name': 'Mohan',
    'Item Purchased': 'Pen',
    'Cost': 22.50
})
purchase_4 = pd.Series({
    'Name': 'Gulam',
    'Item Purchased': 'Diary',
    'Cost': 22.50
})
# Each Series becomes one row; the index labels the rows by store.
df = pd.DataFrame(
    [purchase_1, purchase_2, purchase_3, purchase_4],
    index=['Store 1', 'Store 2', 'Store 3', 'Store 4']
)
df.head()

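Here, each pd.Series call creates a one-dimensional labeled object that holds one purchase, and pd.DataFrame combines those Series as the rows of a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes. The index argument labels each row with its store, and df.head() displays the first five rows.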
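The relationship between the two structures can be checked directly. The following is a minimal sketch that uses only two of the purchases; the variable names (purchase, other, frame) are chosen here for illustration.

```python
import pandas as pd

# A Series is one-dimensional: each value is addressed by a label.
purchase = pd.Series({'Name': 'Chris', 'Item Purchased': 'Pencil', 'Cost': 22.50})
name = purchase['Name']          # label-based access to a single value
labels = list(purchase.index)    # the dictionary keys become the index

# A list of Series becomes one row per Series;
# the Series labels become the column names.
other = pd.Series({'Name': 'Ram', 'Item Purchased': 'Book', 'Cost': 220.50})
frame = pd.DataFrame([purchase, other], index=['Store 1', 'Store 2'])

shape = frame.shape              # (number of rows, number of columns)
columns = list(frame.columns)
print(shape, columns)
```

Building the frame from a list of Series is convenient when records arrive one at a time, as in this article.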

Press Ctrl + Enter in the Jupyter notebook to run the cell and see the output.

Fetch the row for Store 1

df.loc['Store 1']

Press Ctrl + Enter in the Jupyter notebook to run the cell and see the output.
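What df.loc returns for a row label depends on how many rows carry that label. The sketch below rebuilds a two-store version of the data (a shortened stand-in for the article's DataFrame) and adds a hypothetical frame named dup with a repeated label to show the difference.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Chris', 'Ram'],
     'Item Purchased': ['Pencil', 'Book'],
     'Cost': [22.50, 220.50]},
    index=['Store 1', 'Store 2'],
)

# A unique row label gives back a Series whose index is the column names.
row = df.loc['Store 1']
row_type = type(row).__name__
row_label = row.name

# Illustrative only: when a label is repeated, loc returns a DataFrame
# that contains every matching row.
dup = pd.DataFrame({'Cost': [1.0, 2.0]}, index=['Store 1', 'Store 1'])
rows = dup.loc['Store 1']
rows_type = type(rows).__name__
```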

Get all data for "Item Purchased":

df['Item Purchased']

Output
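Selecting a single column with square brackets returns a Series indexed by store, while passing a list of column names returns a DataFrame. A minimal sketch on a two-store stand-in for the article's data:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Chris', 'Ram'],
     'Item Purchased': ['Pencil', 'Book'],
     'Cost': [22.50, 220.50]},
    index=['Store 1', 'Store 2'],
)

# One column name: a Series that keeps the row labels.
items = df['Item Purchased']
item_values = list(items)
item_labels = list(items.index)

# A list of column names: a DataFrame with just those columns.
subset = df[['Name', 'Cost']]
subset_columns = list(subset.columns)
```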

Get the cost of Store 1:
df.loc['Store 1', 'Cost']

Output

22.5
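There is more than one way to read a single cell. The sketch below, again on a shortened stand-in for the article's data, compares df.loc with a row and column label, the scalar accessor df.at, and chained indexing. All three return the same value here; the single loc or at call is usually preferable because it does the lookup in one step.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Chris', 'Ram'], 'Cost': [22.50, 220.50]},
    index=['Store 1', 'Store 2'],
)

cost = df.loc['Store 1', 'Cost']       # row label and column label in one call
cost_at = df.at['Store 1', 'Cost']     # accessor meant for single values
cost_chained = df.loc['Store 1']['Cost']  # two steps: row first, then value
```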

Transpose the DataFrame so that columns become rows:

df.T

Output

Get cost data for all stores:

df.T.loc['Cost']
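Transposing and then selecting the 'Cost' row yields the same values as selecting the 'Cost' column directly. One difference is worth knowing: after transposing a frame that mixes text and numbers, each column holds mixed values, so the numeric type of the costs is lost. A small sketch on a two-store stand-in for the article's data:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Chris', 'Ram'],
     'Item Purchased': ['Pencil', 'Book'],
     'Cost': [22.50, 220.50]},
    index=['Store 1', 'Store 2'],
)

transposed_shape = df.T.shape          # rows and columns swap places

via_transpose = df.T.loc['Cost']       # the article's approach
direct = df['Cost']                    # same values, selected as a column

same_values = list(via_transpose) == list(direct)
direct_dtype = str(direct.dtype)
transpose_dtype = str(via_transpose.dtype)
print(direct_dtype, transpose_dtype)
```

For numeric work on the costs, selecting the column directly keeps the float type intact.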

Drop Store 1

df.drop('Store 1')
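Note that df.drop returns a new DataFrame and leaves df itself unchanged, so the result has to be assigned to a name to be kept. A minimal sketch, using a shortened stand-in for the article's data and illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Chris', 'Ram'], 'Cost': [22.50, 220.50]},
    index=['Store 1', 'Store 2'],
)

# drop returns a copy without the row; the original still has it.
remaining = df.drop('Store 1')
still_in_original = 'Store 1' in df.index

# The columns keyword drops a column instead of a row.
no_cost = remaining.drop(columns='Cost')

# Reassigning is one way to make the removal stick.
df = df.drop('Store 1')
```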

Multiply Cost by 0.8 (a 20% reduction)

df['Cost'] *= 0.8
df

Output
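The statement df['Cost'] *= 0.8 updates the column in place, so every later use of df sees the reduced values. The sketch below applies it to a two-store stand-in for the article's data and keeps a copy of the original costs for comparison; the rounding is only there to keep floating-point noise out of the comparison.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Chris', 'Ram'], 'Cost': [22.50, 220.50]},
    index=['Store 1', 'Store 2'],
)

original = df['Cost'].copy()   # keep the starting values for comparison

df['Cost'] *= 0.8              # scales every value in the column in place

ratio = (df['Cost'] / original).round(6).tolist()
new_costs = df['Cost'].round(2).tolist()
print(new_costs)
```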

Conclusion

The Pandas library in Python is well suited to data analysis and data manipulation. Jupyter, in turn, is a convenient environment for writing code, running it, and displaying the results.

Please find the attached Python code for more details.