How To Measure Central Tendency Using Pandas In Python - Data Science

Data science lets practitioners apply a range of mathematical operations to data in order to extract insight and reach a desired outcome. Python, with libraries such as Pandas, makes these operations straightforward.

In statistics, central tendency refers to the center of a distribution: a single value that summarizes where the data are concentrated. How widely the values are spread around that center is described separately, by measures of dispersion. There are three main measures of central tendency, each of which can be calculated using the Pandas library in Python, namely,

  • Mean
  • Median
  • Mode

The mean is the average of the observations, calculated by adding up all the values in the data and dividing the sum by the number of values. The mean is preferred when the data are roughly symmetric, for example normally distributed, because it is sensitive to extreme values.

Mean = x̄ = (∑x) / N, where ∑x is the sum of all observations and N is the number of observations.
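As a quick check of the formula, the following minimal sketch computes the mean by hand and compares it with the built-in pandas method. The marks are John's column from the dataset used later in this article; the variable names are illustrative only.

```python
import pandas as pd

# John's marks from the dataset used later in the article
marks = pd.Series([98, 87, 76, 88, 96])

# Mean by the formula: sum of the observations divided by their count
manual_mean = marks.sum() / len(marks)

# Mean using the built-in pandas method
pandas_mean = marks.mean()

print(manual_mean, pandas_mean)  # 89.0 89.0
```

Both approaches give the same value, since 445 / 5 = 89.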

The median is the middle value of a set of observations, found by sorting the data in ascending (or descending) order and taking the value in the middle. The median is best used when the data are skewed, because it is much less affected by extreme values than the mean.

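Median = ((n + 1) / 2)th observation of the sorted data if the total number of observations n is odd. If n is even, the median is the average of the (n / 2)th and (n / 2 + 1)th observations.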
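The rule can be sketched as a small function and compared with pandas. The helper name manual_median and the four-value list are assumptions made for this illustration; the five-value list is John's column from the dataset used later.

```python
import pandas as pd

def manual_median(values):
    """Median by sorting and picking the middle observation(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th observation, which is index n // 2 (0-based)
        return float(ordered[mid])
    # Even n: average of the (n / 2)th and (n / 2 + 1)th observations
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_marks = [98, 87, 76, 88, 96]   # five observations
even_marks = [98, 87, 76, 88]      # four observations (illustrative)

print(manual_median(odd_marks), pd.Series(odd_marks).median())    # 88.0 88.0
print(manual_median(even_marks), pd.Series(even_marks).median())  # 87.5 87.5
```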

The mode is the value that occurs most frequently in a dataset. A dataset can have more than one mode, and if every value occurs exactly once, there is no single mode.
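A short sketch of how this looks for a single pandas Series. The two series below are illustrative: one contains a repeated mark and the other does not. Note that when no value repeats, pandas does not report "no mode"; its mode() method returns every value, since all are tied for the highest frequency.

```python
import pandas as pd

repeated = pd.Series([98, 87, 87, 76, 88])  # 87 appears twice
unique = pd.Series([98, 87, 76, 88, 96])    # every value appears once

# value_counts() lists each value with its frequency, most frequent first
print(repeated.value_counts().head(1))

# mode() returns a Series because there can be more than one mode
print(repeated.mode().tolist())  # [87]

# With no repeated value, every value is returned (in sorted order)
print(unique.mode().tolist())    # [76, 87, 88, 96, 98]
```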

MEAN

Creating the dataset

import pandas as pd

# Creating the dataframe of students' marks
df = pd.DataFrame({"John - Marks": [98, 87, 76, 88, 96],
                   "Adam - Marks": [88, 52, 69, 79, 80],
                   "David - Marks": [90, 92, 71, 60, 64],
                   "Rahul - Marks": [88, 85, 79, 81, 91]})

# Printing the dataframe
df

The dataframe is created with pd.DataFrame and stored in the variable df. Evaluating df on its own line, as in a notebook or interactive session, displays its values as output.

Output

Calculating the mean of each column of the above dataset (axis=0 computes down the rows, giving one value per column):

df.mean(axis=0)

Output
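For reference, the sketch below rebuilds the same marks and shows what the axis argument changes. The interpretation of each row as one test is an assumption made for this example, since the rows are not labeled in the dataset.

```python
import pandas as pd

df = pd.DataFrame({"John - Marks": [98, 87, 76, 88, 96],
                   "Adam - Marks": [88, 52, 69, 79, 80],
                   "David - Marks": [90, 92, 71, 60, 64],
                   "Rahul - Marks": [88, 85, 79, 81, 91]})

# axis=0 (the default) averages down each column: one mean per student
column_means = df.mean(axis=0)
print(column_means)
# John 89.0, Adam 73.6, David 75.4, Rahul 84.8

# axis=1 averages across each row: one mean per row (assumed to be one test)
row_means = df.mean(axis=1)
print(row_means.tolist())  # [91.0, 79.0, 73.75, 77.0, 82.75]
```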

MEDIAN

Creating the dataset

import pandas as pd

# Creating the dataframe of students' marks
df = pd.DataFrame({"John - Marks": [98, 87, 76, 88, 96],
                   "Adam - Marks": [88, 52, 69, 79, 80],
                   "David - Marks": [90, 92, 71, 60, 64],
                   "Rahul - Marks": [88, 85, 79, 81, 91]})

# Printing the dataframe
df

This is the same dataframe as in the MEAN section: it is created with pd.DataFrame and stored in the variable df, and its values are displayed as output.

Output

Now we calculate the median of each column:

df.median(axis=0)
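As a sketch of what this call returns, the block below rebuilds the same marks, computes the column medians, and compares them with the means. The comparison is an addition for illustration: it shows how a single low mark (Adam's 52) pulls the mean below the median, which is why the median is preferred for skewed data.

```python
import pandas as pd

df = pd.DataFrame({"John - Marks": [98, 87, 76, 88, 96],
                   "Adam - Marks": [88, 52, 69, 79, 80],
                   "David - Marks": [90, 92, 71, 60, 64],
                   "Rahul - Marks": [88, 85, 79, 81, 91]})

# Median of each column: the middle of the five sorted marks
medians = df.median(axis=0)
print(medians.tolist())  # [88.0, 79.0, 71.0, 85.0]

# Difference between mean and median for each student;
# a large gap suggests the marks are skewed by extreme values
gap = (df.mean() - df.median()).round(1)
print(gap.tolist())  # [1.0, -5.4, 4.4, -0.2]
```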

MODE

We will now create a new dataset in which some marks repeat, so that each column has a mode

import pandas as pd

# Creating the dataframe of students' marks
df1 = pd.DataFrame({"John - Marks": [98, 87, 87, 76, 88],
                    "Adam - Marks": [88, 52, 69, 79, 79],
                    "David - Marks": [90, 92, 71, 71, 64],
                    "Rahul - Marks": [88, 85, 85, 81, 91]})

# Printing the dataframe
df1

The dataframe is created with pd.DataFrame and stored in the variable df1. Its values are then displayed as output.

Output 

Now we will find the mode of each column:

df1.mode()

Output
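The sketch below rebuilds df1 and shows the result of the call: each column has exactly one repeated mark, so the result is a single row. The second dataframe, tie_df, is hypothetical data added to illustrate what happens when columns have different numbers of modes: pandas returns one row per mode and pads the shorter columns with NaN.

```python
import pandas as pd

df1 = pd.DataFrame({"John - Marks": [98, 87, 87, 76, 88],
                    "Adam - Marks": [88, 52, 69, 79, 79],
                    "David - Marks": [90, 92, 71, 71, 64],
                    "Rahul - Marks": [88, 85, 85, 81, 91]})

# mode() returns a DataFrame; here each column has a single mode
modes = df1.mode()
print(modes.iloc[0].tolist())  # [87, 79, 71, 85]

# Hypothetical data: column A has two modes (1 and 2), column B has one (3)
tie_df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [3, 3, 4, 5]})
tie_modes = tie_df.mode()
print(tie_modes)  # two rows; the second row of B is NaN
```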