Understanding Pandas With Examples

If you have started looking into data manipulation and data analysis, you have probably come across Pandas. If not, this article starts from the very basics, beginning with installation. Pandas is a library built on top of the Python programming language and is mainly used for data manipulation and data analysis. Its name is commonly expanded as "Python Data Analysis Library", and it also derives from the econometrics term "panel data". It is open-source, fast, and powerful.

Pandas Installation

Pandas can be installed from PyPI using pip with the following command:

pip install pandas
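
To confirm that the installation worked, one quick check is to import the library and print its version. This is only a minimal sketch; the variable name installed_version is illustrative and not part of Pandas.

```python
import pandas as pd

# __version__ holds the installed pandas release as a string
installed_version = pd.__version__
print(installed_version)
```

If the import fails with ModuleNotFoundError, pip most likely installed the package into a different Python environment than the one running the script.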

Features of Pandas

  • Supports reshaping and pivoting of data sets.
  • Supports data alignment.
  • Supports loading data into memory from different file formats.
  • Provides a fast and efficient DataFrame object.
  • Supports time series functionality.
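
Two of the features listed above, data alignment and reshaping, can be sketched in a few lines. The data below (labels, day names, city names, temperatures) is made up purely for illustration.

```python
import pandas as pd

# Data alignment: values are matched by index label, not by position
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])
total = s1 + s2  # labels found in only one Series become NaN

# Reshaping: turn long-format rows into a wide table (hypothetical data)
long_df = pd.DataFrame({
    'day': ['Mon', 'Mon', 'Tue', 'Tue'],
    'city': ['Oslo', 'Rome', 'Oslo', 'Rome'],
    'temp': [3, 14, 5, 16],
})
wide_df = long_df.pivot(index='day', columns='city', values='temp')

print(total)
print(wide_df)
```

In the sum, only the labels 'b' and 'c' exist in both Series, so only those two positions hold numbers. The pivoted table has one row per day and one column per city.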

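Pandas has historically offered the following three data structures:

  • Series - A Series is a one-dimensional labeled array, usually holding values of a single type.
  • DataFrame - A DataFrame is a two-dimensional labeled table whose columns can have different types.
  • Panel - A Panel is a three-dimensional labeled array. Panel was deprecated in pandas 0.20 and has been removed in later releases, so current code works with Series and DataFrame.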

Let us see how to create a Series from a list of scalar values.

import pandas as pd
# Program to create a Series from scalar values
# Numeric data input
data = [25, 63, 74, 60, 73, 41, 22]
# Creating a Series with default index values (0, 1, 2, ...)
i = pd.Series(data)
# Predefined index values
index = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
# Creating a Series with predefined index values
ink = pd.Series(data, index=index)
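
Once a Series has an index, its values can be read by position or by label. The following is a small sketch using the same numbers; the names scores, labels, and above_sixty are chosen here for illustration only.

```python
import pandas as pd

data = [25, 63, 74, 60, 73, 41, 22]
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
scores = pd.Series(data, index=labels)

first_value = scores.iloc[0]        # access by position
value_at_c = scores.loc['c']        # access by index label
above_sixty = scores[scores > 60]   # boolean filter keeps the labels

print(first_value, value_at_c)
print(above_sixty)
```

Using .iloc for positions and .loc for labels keeps the two kinds of lookup explicit, which helps when the index itself consists of integers.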

The following code creates a Series from a dictionary.

import pandas as pd 

# Create a dictionary (named data_dict to avoid shadowing the built-in dict)
data_dict = {'A': 10, 'B': 20, 'C': 30, 'D': 40, 'E': 50}
# Creating a Series from the dictionary; the keys become the index
ans = pd.Series(data_dict)
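
When a Series is built from a dictionary, the keys become the index labels. Passing an explicit index as well selects and orders the keys, and any label missing from the dictionary gets NaN. A minimal sketch, where the label 'Z' is a deliberately absent key used for illustration:

```python
import pandas as pd

values = {'A': 10, 'B': 20, 'C': 30, 'D': 40, 'E': 50}

# Keys become the index labels
ans = pd.Series(values)

# An explicit index picks keys; 'Z' is not in the dictionary, so it becomes NaN
subset = pd.Series(values, index=['A', 'C', 'Z'])

print(ans)
print(subset)
```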

The following code shows how to create a DataFrame from three Series.

# Create Dataframe of three series
import pandas as pd

# Define series 1
a = pd.Series([10, 23, 48, 96, 75, 34, 85])

# Define series 2
b = pd.Series([2.5, 3.5, 4.5, 5.5, 6.5, 7.5])

# Define series 3
c = pd.Series(['A', 'B', 'C', 'D', 'E'])	

# Define Data
Data ={'First':a, 'Second':b, 'Third':c}

# Create DataFrame using pd
dfseriescreate = pd.DataFrame(Data)			
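
The three Series in this example have different lengths (7, 6, and 5 values). Pandas aligns them on their default integer index and fills the gaps with NaN. The sketch below shows one way to check this; the names df and missing_per_column are illustrative.

```python
import pandas as pd

a = pd.Series([10, 23, 48, 96, 75, 34, 85])      # 7 values
b = pd.Series([2.5, 3.5, 4.5, 5.5, 6.5, 7.5])    # 6 values
c = pd.Series(['A', 'B', 'C', 'D', 'E'])         # 5 values

df = pd.DataFrame({'First': a, 'Second': b, 'Third': c})

# Count the NaN entries that were filled in for the shorter Series
missing_per_column = df.isna().sum()

print(df.shape)
print(missing_per_column)
```

The resulting DataFrame has as many rows as the longest Series, so 'Second' is missing one value and 'Third' is missing two.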

Creating a DataFrame from 2D arrays.

# Program to create DataFrame from 2D array

# Import Library
import pandas as pd

# Define 2d array 1
a =[[29, 50, 74], [96, 76, 45]]

# Define 2d array 2
b =[[27, 46, 12], [10, 63, 42]]

# Define Data: each key becomes a column name
c = {'first': a, 'second': b}

# Create DataFrame; each cell holds one inner list (one row of a 2D array)
dataframecreate = pd.DataFrame(c)
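
The dictionary approach above stores each inner list as a single cell. If the goal is instead for every number to occupy its own cell, a 2D list can be passed to pd.DataFrame directly. This is a sketch of that alternative; the column names 'x', 'y', and 'z' are assumptions made for illustration.

```python
import pandas as pd

rows = [[29, 50, 74], [96, 76, 45]]

# Each inner list becomes one row; column names here are hypothetical
table = pd.DataFrame(rows, columns=['x', 'y', 'z'])

print(table)
print(table.shape)
```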

Reading and Writing Different File Formats Using Pandas DataFrame


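import pandas as pd

# Reading a CSV file
df = pd.read_csv('data.csv')
# Writing to a CSV file (index=False leaves out the row index)
df.to_csv('data.csv', index=False)

# Reading an Excel file (needs an Excel engine such as openpyxl installed)
df = pd.read_excel('data.xlsx')
# Writing to an Excel file
df.to_excel('data.xlsx', index=False)

# Reading a JSON file
df = pd.read_json('data.json')
# Writing to a JSON file
df.to_json('data.json')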
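
Reading and writing can also be tried without any files on disk. The sketch below writes a small DataFrame with DataFrame.to_csv into an in-memory text buffer and reads it back with pd.read_csv. The sample data and the names buffer and restored are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'name': ['A', 'B', 'C'], 'score': [10, 20, 30]})

# Write CSV text into an in-memory buffer instead of a file
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV text back into a new DataFrame
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
```

Passing index=False keeps the row index out of the CSV text, so the restored DataFrame has the same two columns as the original.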

Pandas supports many other file formats; the ones above are only a few examples.

The examples above are meant to get a beginner started with Pandas. With more practice and further reading on each sub-topic, you can build a firmer command of the library.