Statistics For Artificial Intelligence And Data Science


Statistics is an essential prerequisite for Machine Learning and Data Science. Without a solid grasp of its foundations, solving real-world problems with Machine Learning is very difficult. For every Artificial Intelligence, Machine Learning, and Data Science enthusiast, Statistics is the groundwork for going deeper into these fields. A proper understanding of Statistics makes it much easier to implement regression, classification, and numerous other Machine Learning algorithms, and it is indispensable for analyzing data as a Data Scientist.
Statistics is a branch of mathematics that deals with collecting, analyzing, interpreting, and presenting data to give insight into what the data actually represents. It is applied in a wide range of fields, including finance, healthcare, technology, demographics, and business. With proper data, statistics can reveal details and patterns that would otherwise go unnoticed.
Central tendencies
As the name suggests, a measure of Central Tendency gives a single value that describes the central position within a given set of data, that is, the center of the data's distribution. Central Tendency can be measured in several ways depending on the context. The key measures are the Mean, Median, and Mode.


The Mean is often also called the Average. It is a very well-known measure of Central Tendency. The Mean can be used for both discrete and continuous data.
Discrete Data
Discrete data are individual values that can be counted as separate values. They can be presented graphically with a bar graph.
E.g., the number of readers viewing this article.
Continuous Data
Continuous data are values that can fall anywhere within a range.
E.g., the height of readers viewing this article.
Note that the height of readers viewing this article could range from 3 ft to 6 ft, or even more or less, so the data are continuous across a range. A continuous dataset contains values that are measured rather than counted, such as temperature, length, height, or width. These values need not be positive integers: they can be decimals and fractions.
Arithmetic Mean
It is calculated by adding all the values in a dataset and dividing the sum by the total number of values in the set.
E.g., for a dataset of values {100, 200, 300, 400, 500}
Here, the total number of values in the dataset is 5
Arithmetic Mean (X) = ∑ (all values in dataset) / total number of values in dataset
= (100+200+300+400+500)/5
= 300
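The same calculation can be sketched in a few lines of Python. This is only an illustration of the formula above; the variable names are chosen here for readability, and `statistics.mean` from the standard library is used as a cross-check.

```python
from statistics import mean

# The example dataset from the text.
values = [100, 200, 300, 400, 500]

# Arithmetic mean by hand: sum of the values divided by how many there are.
manual_mean = sum(values) / len(values)

# The standard library computes the same quantity.
library_mean = mean(values)

print(manual_mean)                  # 300.0
print(manual_mean == library_mean)  # True
```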
The median separates a given dataset into two parts: the higher half and the lower half. For a dataset with an odd number of values, it is the middle value of the ordered data.
E.g., for a dataset of values {1, 2, 3, 6, 7, 8, 9}
Out of the 7 values, the one that divides the dataset into equal halves of 3 lower and 3 higher values is the 4th value, i.e., 6.
For an unordered dataset, the data must first be sorted in ascending order before the median can be calculated.
For an odd number of values (n) in the dataset, the median is calculated by:
Median(x) = x_{(n+1)/2}
For an even number of values (n) in the dataset, the median is calculated by:
Median(x) = (x_{n/2} + x_{(n/2)+1}) / 2
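A minimal sketch of the odd and even rules in Python might look like the following. The function name `median` and the second, unordered dataset are made up for illustration; note that Python lists are indexed from 0, so the 1-based positions in the formulas shift down by one.

```python
def median(values):
    """Return the median using the odd/even rule described above."""
    ordered = sorted(values)  # the data must be in ascending order first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: average the values at 1-based positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 2, 3, 6, 7, 8, 9]))  # 6
# Hypothetical unordered dataset: sorted it is 1, 2, 3, 7, 8, 9.
print(median([9, 1, 8, 2, 7, 3]))     # 5.0
```

In practice, `statistics.median` from the standard library does the same job.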
The mode value is the value that appears the most frequently in a dataset. The mode can be the same as the Mean or Median value but it doesn’t necessarily have to be so.
E.g., For a given dataset of {5,6,7,7,8,9}
Mode = 7, because 7 is the most frequently appearing value in the dataset.
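One way to find the mode in Python is to count how often each value occurs, as in the sketch below. It returns a list rather than a single number, which is a design choice made here so that ties (several values sharing the highest count) are not silently dropped.

```python
from collections import Counter

# The example dataset from the text.
values = [5, 6, 7, 7, 8, 9]

# Count how many times each value appears.
counts = Counter(values)
highest = max(counts.values())

# Keep every value that reaches the highest count, so ties are preserved.
modes = [value for value, count in counts.items() if count == highest]

print(modes)  # [7]
```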
For more detail, you can watch this complimentary video by AI 42.
Dispersion is the degree to which the data in a dataset is scattered, stretched, or squeezed. Dispersion can be measured in a variety of ways, some of which are described below.
Variance is the average of the squared deviations of the data values from the mean of the dataset. It is the square of the Standard Deviation and is denoted by Var(X) or σ².
Standard deviation
As the name suggests, Standard Deviation measures how far the values in a dataset typically deviate from the mean. It captures the variation or dispersion of data within a dataset. Standard Deviation is abbreviated as SD and is denoted by the Greek letter sigma, i.e., σ.
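To make the relationship between Variance and Standard Deviation concrete, here is a small Python sketch that reuses the dataset {100, 200, 300, 400, 500} from the Arithmetic Mean example. One assumption to flag: the text does not say whether it means the population or the sample version, so the sketch computes the population variance (divide by n) and also shows the sample variance (divide by n - 1) for comparison.

```python
from statistics import pstdev, pvariance, variance

values = [100, 200, 300, 400, 500]
n = len(values)
mean = sum(values) / n

# Squared deviation of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in values]

# Population variance: average of the squared deviations (divide by n).
population_variance = sum(squared_deviations) / n

# Standard deviation is the square root of the variance.
population_sd = population_variance ** 0.5

# Sample variance divides by n - 1 instead of n.
sample_variance = sum(squared_deviations) / (n - 1)

print(population_variance)      # 20000.0
print(round(population_sd, 2))  # 141.42
print(sample_variance)          # 25000.0

# Cross-check against the standard library.
print(population_variance == pvariance(values))  # True
print(sample_variance == variance(values))       # True
```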


A Probability Distribution is a function that describes every possible value a random variable can take in a particular experiment, together with how likely each value is.
Continuous Distribution
A Continuous Distribution describes the probabilities of a continuous random variable across a range of values in a particular experiment. Only a range of values has a non-zero probability; the probability of a continuous random variable equaling any single exact value is always 0. The probability of a range is represented by the area under the curve over that range.
Discrete Distribution
A Discrete Distribution describes the probability of occurrence of each value of a discrete random variable. In a discrete probability distribution, every possible value of the discrete random variable has a non-zero probability. Hence, a discrete probability distribution is usually presented in tabular form.
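As an illustration of the tabular form, the sketch below builds the distribution for one roll of a fair six-sided die. The die is a hypothetical experiment chosen here, not one taken from the article; `Fraction` is used so the probabilities stay exact.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die.
# Each face is a possible value with the same non-zero probability.
distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# Present the distribution as a small table of value and probability.
print("value  probability")
for face, probability in distribution.items():
    print(f"{face:>5}  {probability}")

# The probabilities of all possible values add up to 1.
total = sum(distribution.values())
print(total)  # 1

# Probability of an event = sum of the probabilities of its values.
p_even = sum(p for face, p in distribution.items() if face % 2 == 0)
print(p_even)  # 1/2
```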
Normal Distribution
It is also known as the Gaussian Distribution. This probability distribution is symmetric about the mean, with data points occurring more frequently near the mean value than far from it. On a graph, it appears as a Bell Curve.
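These properties can be checked numerically with `statistics.NormalDist` from the Python standard library (Python 3.8+). The sketch assumes a standard normal distribution (mean 0, standard deviation 1), picked purely for illustration. It also shows the point made under Continuous Distribution: a range has a probability equal to an area under the curve, while a single exact value has probability 0.

```python
from statistics import NormalDist

# Assumed for illustration: a standard normal (mean 0, standard deviation 1).
dist = NormalDist(mu=0, sigma=1)

# Probability of a range = area under the curve between its endpoints.
within_one_sd = dist.cdf(1) - dist.cdf(-1)
print(round(within_one_sd, 4))  # 0.6827

# A single exact value is a range of zero width, so its probability is 0.
single_point = dist.cdf(1) - dist.cdf(1)
print(single_point)  # 0.0

# Symmetry about the mean: the two tails have the same area.
left_tail = dist.cdf(-1)
right_tail = 1 - dist.cdf(1)
print(abs(left_tail - right_tail) < 1e-12)  # True

# The density is highest at the mean and falls off on either side.
print(dist.pdf(0) > dist.pdf(1) > dist.pdf(2))  # True
```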
Covariance gives insight into how two random variables vary together. A positive covariance means that the two variables tend to move in the same direction; a negative covariance means that they tend to move in opposite directions.
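A minimal sketch of the covariance calculation is shown below. The function name `sample_covariance` and the three small datasets are invented for illustration, and the sketch uses the sample form (dividing by n - 1), which is an assumption since the text does not specify one.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum the products of paired deviations from each mean.
    products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return products / (n - 1)


# Hypothetical data, chosen so the arithmetic is easy to follow.
xs = [1, 2, 3, 4, 5]
ys_rising = [2, 4, 6, 8, 10]   # moves in the same direction as xs
ys_falling = [10, 8, 6, 4, 2]  # moves in the opposite direction

print(sample_covariance(xs, ys_rising))   # 5.0
print(sample_covariance(xs, ys_falling))  # -5.0
```

On Python 3.10 or newer, `statistics.covariance` provides the same sample covariance.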
Correlation measures the strength and direction of the linear relationship between two random variables. Correlation does not necessarily mean cause and effect; it simply describes the relationship without explaining its cause. It is measured by the correlation coefficient, denoted by r, which is unitless and ranges from -1 to +1.
  • A positive r value means positive correlation: the values of both variables tend to increase together.
  • A negative r value means negative correlation: the values of the variables tend to move in opposite directions, i.e., one increases as the other decreases.
The closer r is to 0, the weaker the linear relationship between the variables.
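The three cases can be sketched with a small Pearson correlation function. The function name `pearson_r` and the datasets are hypothetical. The last dataset is deliberately curved: its y values depend perfectly on x, yet r is 0, which shows that r only captures linear relationships.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of paired and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data, chosen so the results are exact.
xs = [1, 2, 3, 4, 5]

print(pearson_r(xs, [2, 4, 6, 8, 10]))  # 1.0  (both increase together)
print(pearson_r(xs, [10, 8, 6, 4, 2]))  # -1.0 (one rises as the other falls)
print(pearson_r(xs, [4, 1, 0, 1, 4]))   # 0.0  (curved, no linear relationship)
```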


Foundational knowledge of Statistics, namely Central Tendencies (Mean, Median, and Mode), Dispersion (Variance, Standard Deviation), Probability Distributions (Continuous, Discrete), Covariance, and Correlation, is key to getting deeper insight from data. Being able to transform raw observations, i.e., data, into knowledge helps Machine Learning Engineers, Data Scientists, and Analysts become better at what they do. Learning these basics helps Data Scientists and Analysts draw meaningful conclusions from even small samples of data, and it helps Machine Learning Engineers clean and prepare data, set up transformation pipelines, select their models, and fine-tune them.