## Introduction

In our day-to-day lives, we often come across data collection and processing. However, the data we get is not always in the best shape.

Today, we'll be talking about the form of the data. It may be distorted or skewed, but if luck is on our side, it may already be sorted or normalized. Jokes aside, the data we receive comes in different structures, and to work with it we need to convert it into the form we want.

Before getting to that, let's consider the kinds of data we might receive. The type of data depends on the source it is extracted from, and it can be converted into the format we need through a process called data cleaning. We won't cover data cleaning here. Once the data is in the required format, we can dig deeper into it to extract the insights we need.

We start this process by measuring the central tendency.

## Central tendency

A measure of central tendency is a summary statistic that represents the center point of a dataset. It is a single value that describes the dataset by identifying the central position within it.

These measures indicate where the values in a distribution tend to fall, which is why they are also called measures of central location. They capture the tendency of data to cluster around middle values.

The three main measures of central tendency are the mean, the median, and the mode.

## Mean

The mean, or average, is one of the best-known measures of central tendency. It can be used with both continuous and discrete data; we discussed both kinds of data in the previous section of the article. The mean equals the sum of the data values divided by the number of values in the dataset.

There are several types of mean:

*Arithmetic mean*

The arithmetic mean is the familiar average of a set of numbers: their sum divided by how many there are, which gives a calculated "central" value.
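The calculation described above (sum of the values divided by their count) can be sketched directly in Python and cross-checked against the standard library's `statistics.mean`. This is a minimal illustration: the helper name `arithmetic_mean` and the sample values are assumptions, not part of the original article.

```python
from statistics import mean

def arithmetic_mean(values):
    """Sum of the values divided by how many values there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative sample
print(arithmetic_mean(data))     # 5.0 (40 / 8)
print(mean(data))                # same value from the standard library
```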

*Geometric mean*

The geometric mean is a special type of average in which we multiply the n numbers together and then take the nth root: the square root for two numbers, the cube root for three, and so on.

It gives us a way of finding a value in between widely different values.

It is useful when values combine multiplicatively, such as growth rates, or when we want to compare things measured on very different scales.
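A minimal sketch of the "multiply, then take the nth root" definition, assuming all values are positive (the geometric mean is not defined here for zero or negative values). The helper name `geo_mean` and the growth factors are hypothetical; `statistics.geometric_mean` and `math.prod` require Python 3.8 or later.

```python
import math
from statistics import geometric_mean

def geo_mean(values):
    """Multiply n positive values together, then take the nth root."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs one or more positive values")
    return math.prod(values) ** (1 / len(values))

# Two widely different values (illustrative).
pair = [1, 100]
print(geo_mean(pair))          # about 10.0, the square root of 1 * 100
print(sum(pair) / len(pair))   # 50.5, the arithmetic mean, for comparison

# Hypothetical yearly growth factors: +10%, +50%, -10%.
growth = [1.10, 1.50, 0.90]
print(geo_mean(growth))        # constant yearly factor giving the same overall growth
print(geometric_mean(growth))  # standard-library version
```

Multiplying everything first mirrors the definition in the text; for very long lists of large numbers, averaging logarithms instead avoids overflow.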

*Harmonic mean*

The harmonic mean is another kind of average and, together with the arithmetic and geometric means, one of the three Pythagorean means.

It is appropriate when we want the average of rates. It is the reciprocal of the arithmetic mean of the reciprocals of the observations.
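To illustrate "reciprocal of the mean of the reciprocals", here is a minimal sketch using a hypothetical trip that covers two equal distances, one at 60 km/h and one at 40 km/h. The true average speed is total distance over total time, which works out to 48 km/h, and the harmonic mean recovers it while the arithmetic mean does not. The helper `harm_mean` is illustrative and assumes positive values.

```python
from statistics import harmonic_mean

def harm_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs one or more positive values")
    mean_of_reciprocals = sum(1 / v for v in values) / len(values)
    return 1 / mean_of_reciprocals

# Hypothetical trip: equal distances driven at 60 km/h and at 40 km/h.
speeds = [60, 40]
print(harm_mean(speeds))          # about 48.0 km/h, the true average speed
print(sum(speeds) / len(speeds))  # 50.0, which overstates the average speed
print(harmonic_mean(speeds))      # standard-library equivalent
```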

## Median

The median is the middle value of a dataset that has been arranged in order of magnitude. It is less affected by outliers and skewed data than the mean.

When the dataset has an odd number of values, the median is simply the middle one. When it has an even number of values, the median is the average of the two middle values.
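The odd/even rule above can be sketched as follows, with `statistics.median` as a cross-check. The helper name `median_of` and the sample lists are illustrative; the last example shows how a single outlier drags the mean but leaves the median alone.

```python
from statistics import median

def median_of(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2     # even count: average of the two middle values

print(median_of([7, 1, 5]))       # 5
print(median_of([7, 1, 5, 3]))    # 4.0
print(median([7, 1, 5, 3]))       # 4.0 from the standard library

with_outlier = [1, 2, 3, 4, 1000]
print(median_of(with_outlier))                 # 3, unaffected by the outlier
print(sum(with_outlier) / len(with_outlier))   # 202.0, pulled up by the outlier
```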

## Mode

The mode is the most frequent value in the dataset.

On a bar chart or histogram, it corresponds to the tallest bar. It can be thought of as the most popular option.

It is the measure to use for categorical data, when we want to know the most common category.

It is problematic for continuous data, since no value is likely to occur more often than any other.
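A minimal sketch of the mode using `collections.Counter`, returning every value that ties for the highest count (a dataset can have more than one mode). The helper `modes` and the sample data are illustrative; `statistics.multimode` (Python 3.8+) gives the same result. The second example shows the problem with continuous data: when nothing repeats, every value ties.

```python
from collections import Counter
from statistics import multimode

def modes(values):
    """All values that share the highest frequency, in order of first appearance."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode requires at least one value")
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

colors = ["red", "blue", "red", "green", "blue", "red"]  # categorical data
print(modes(colors))       # ['red']
print(multimode(colors))   # ['red']

# Continuous measurements rarely repeat, so every value ties for the "mode".
heights = [170.2, 165.8, 181.4, 159.9]
print(modes(heights))      # [170.2, 165.8, 181.4, 159.9]
```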

This raises the question of which measure to use when. The following summary pairs each type of variable with the measure of central tendency that usually works best.

## Type of Variable - Best Measure of Central Tendency

- Nominal - Mode
- Ordinal - Median
- Interval/Ratio (not skewed) - Mean
- Interval/Ratio (skewed) - Median
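The summary above can be expressed as a small lookup. This is a hypothetical helper (`central_value` is not from the article), and the income figures are made up. One design choice is an assumption of this sketch: for ordinal data it uses `statistics.median_low`, which always returns an observed value instead of averaging two middle ranks. The skewed income example shows why the median is preferred there, since one large value pulls the mean far above most observations.

```python
from statistics import mean, median, median_low, mode

def central_value(values, level, skewed=False):
    """Choose a measure of central tendency following the summary table."""
    if level == "nominal":
        return "mode", mode(values)
    if level == "ordinal":
        return "median", median_low(values)  # returns an actual observed rank
    if level in ("interval", "ratio"):
        if skewed:
            return "median", median(values)
        return "mean", mean(values)
    raise ValueError(f"unknown level of measurement: {level!r}")

print(central_value(["bus", "car", "car", "bike"], "nominal"))  # ('mode', 'car')
print(central_value([1, 2, 2, 3, 4], "ordinal"))                # ('median', 2)

# Hypothetical right-skewed incomes, in thousands.
incomes = [30, 32, 35, 38, 40, 45, 400]
print(central_value(incomes, "ratio", skewed=True))  # ('median', 38)
print(central_value(incomes, "ratio"))               # mean of about 88.6, above six of the seven values
```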