Importance Of Probability In Machine Learning And Data Science

Among the many branches of mathematics, probability plays a significant role in both Artificial Intelligence and Data Science. Today, we’ll cover the basics of what probability really is, along with the theorems and real-world examples that show where and how these tools are used.
Read the previous article, Statistics For Artificial Intelligence And Data Science, to understand the foundation of Statistics that is used by Machine Learning Engineers and Data Scientists.


Probability can be defined as a measure of how likely something is to occur or happen. Every time we need to express the chance of some outcome or event occurring, we talk in terms of probability.
What is the chance that head or tail shows up when we flip a coin? This is probability.

What is the chance that 2 shows up when we roll a die? This can be explained by probability.
When all outcomes are equally likely, the probability of the occurrence of an event is calculated as follows:
Probability of Event = Number of ways it can happen / Total number of outcomes
For a coin having two sides, the probability that head shows up would be,
Probability of Head = Number of ways head can happen / Total number of outcomes
There are two possible outcomes, head (H) and tail (T), and each of them can happen in exactly one way.
Example using a Coin,
Probability of Head, i.e. P(H) = 1/2
Probability of Tail, i.e. P(T) = 1/2
Example using a Die,
Similarly, a die has 6 sides, numbered 1, 2, 3, 4, 5, and 6, and each side is equally likely to show up.
Probability of occurrence of 2, i.e. P(2) = 1/6
Probability of occurrence of 1, i.e. P(Rolling 1) = 1/6 ≈ 16.7%
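The counting formula above can be sketched in a few lines of Python. This is only an illustration: the helper name probability and the extra "even number" event are assumptions made for the example, and the formula only applies when every outcome is equally likely.

```python
from fractions import Fraction

def probability(is_favorable, outcomes):
    """Favorable outcomes divided by all (equally likely) outcomes."""
    outcomes = list(outcomes)
    favorable = sum(1 for outcome in outcomes if is_favorable(outcome))
    return Fraction(favorable, len(outcomes))

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

p_head = probability(lambda o: o == "H", coin)    # one way out of two
p_two = probability(lambda o: o == 2, die)        # one way out of six
p_even = probability(lambda o: o % 2 == 0, die)   # three ways out of six

print(p_head, p_two, p_even)
```

Using Fraction keeps the results exact (1/2, 1/6, 1/2) instead of rounding them to decimals.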
Understanding Likelihood
In statistics, Likelihood is not Probability, although the two are used as synonyms in regular speech. To a statistician, treating them as the same thing is simply wrong. Probability measures the chance that a specific event or outcome occurs under a fixed distribution. Likelihood works in the other direction: the observed outcome is held fixed, and likelihood measures how well a given distribution, or a given choice of its parameters, explains that outcome. Choosing the distribution whose likelihood is highest means choosing the model under which the observed outcome is most probable.
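To make the distinction concrete, here is a minimal sketch that holds some observed data fixed and compares how well different coin models explain it. The observation (7 heads in 10 flips) and the three candidate values for the probability of head are assumptions invented for the example.

```python
from math import comb

def binomial_likelihood(p_head, heads, flips):
    """Likelihood of a coin model p_head, given the observed heads in flips."""
    return comb(flips, heads) * p_head**heads * (1 - p_head)**(flips - heads)

# Assumed observation: the data stays fixed while the model varies.
heads, flips = 7, 10

candidates = [0.3, 0.5, 0.7]
likelihoods = {p: binomial_likelihood(p, heads, flips) for p in candidates}
best = max(likelihoods, key=likelihoods.get)

for p, value in likelihoods.items():
    print(f"p_head={p}: likelihood={value:.4f}")
print("Best candidate:", best)
```

Among these candidates, the model with a 0.7 chance of head explains the data best. Note that the likelihoods of the different candidates do not need to add up to 1, which is one practical sign that likelihood is not a probability over the models.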

Probabilities in real life

There are different ways to calculate the probability for the same problem.
Theoretical Probability
Theoretical Probability is calculated by reasoning about the possible outcomes, without running any experiment. For a fair coin or die, it gives the exact probability of each outcome, which is the value we expect experiments to approach.
Experimental Probability
Experimental Probability is calculated by repeating an experiment multiple times and observing the results: the number of times the event occurred divided by the number of trials. This is a different approach from Theoretical Probability. Anyone can perform the experiment and calculate the probability.
Law of Large Numbers
The Law of Large Numbers describes how, when an experiment is repeated many times, the results tend towards the expected value. The average of the outcomes obtained from a large number of independent experiments gets closer to the expected value as the number of experiments increases. The larger the number of experiments, the closer the experimental probability tends to be to the theoretical one.
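A small simulation can illustrate this. The sketch below estimates the experimental probability of rolling a 2 for different numbers of rolls and compares it with the theoretical value of 1/6. The function name, the seed, and the roll counts are arbitrary choices made for the illustration.

```python
import random

def experimental_probability(target, rolls, rng):
    """Fraction of simulated die rolls that show the target face."""
    hits = sum(1 for _ in range(rolls) if rng.randint(1, 6) == target)
    return hits / rolls

rng = random.Random(42)  # fixed seed so the run is repeatable
theoretical = 1 / 6

for rolls in (10, 1_000, 100_000):
    estimate = experimental_probability(2, rolls, rng)
    print(f"{rolls:>7} rolls: estimate={estimate:.4f}, "
          f"difference={abs(estimate - theoretical):.4f}")
```

With only 10 rolls the estimate can be far from 1/6, while with 100,000 rolls it is typically within a fraction of a percentage point.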

Conditional Probability

Conditional Probability is the probability of an event occurring given that one or more other events have occurred.
E.g. Let us suppose, Event A – You will read this article today.
Event B – You will drink a beverage today.
The conditional probability looks at these two events in relation to each other. P(A|B) is the probability that you read this article today given that you drink a beverage today. It is calculated as P(A|B) = P(A and B) / P(B), where P(A and B) is the probability of both events happening, i.e. you drink a beverage and read this article today.
For another instance, let us suppose, Event A – It will rain today
Event B – You need to go out today
The conditional probability P(B|A) is the probability that you need to go out given that it is raining today. Combined with the chance of rain, it gives P(A and B) = P(A) * P(B|A), the probability that you need to go outside while it is raining. This could predict the probability of you needing to carry an umbrella today.
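Since the events above have no agreed numbers attached, here is a sketch of the same idea using two dice, where every outcome can be listed. The events chosen (A: the sum is 8, B: the first die is even) are assumptions picked for the illustration, and the code applies the standard definition P(A|B) = P(A and B) / P(B).

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event over the two-dice sample space."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def is_a(o):
    return o[0] + o[1] == 8   # Event A: the sum is 8

def is_b(o):
    return o[0] % 2 == 0      # Event B: the first die is even

p_a = prob(is_a)
p_b = prob(is_b)
p_a_and_b = prob(lambda o: is_a(o) and is_b(o))
p_a_given_b = p_a_and_b / p_b

print("P(A) =", p_a)
print("P(A and B) =", p_a_and_b)
print("P(A|B) =", p_a_given_b)
```

Here P(A) is 5/36 but P(A|B) is 1/6, so knowing that the first die is even changes the probability of the sum being 8.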
Independent Event
An independent event is an event whose occurrence has no relationship with the occurrence of any other event, i.e. its occurrence doesn’t affect the probability of any other event happening.
E.g. Each roll of a die is an independent event. Whether the die rolls 5 or 6 or any other value, a prior roll of 5 has nothing to do with the roll that follows.
Dependent Event
Dependent Events are a set of events in which the occurrence of one changes the probability of the others. The probability of occurrence of one event depends upon the occurrence of the other event. Thus, we call them dependent.
E.g. There are 100 M&M’s in a jar, which is a mixture of 8 colors. When you take out one red M&M and do not put it back, this changes the probability of drawing each color on the next draw. The next outcome is dependent upon the prior one.
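The jar example can be put into numbers with a short sketch. The text does not say how many of the 100 M&M’s are red, so the count of 20 below is purely an assumption for illustration. The candy is drawn without being put back.

```python
from fractions import Fraction

total = 100   # M&M's in the jar, from the example
red = 20      # assumed number of red ones (not given in the example)

# First draw: nothing has been removed yet.
p_first_red = Fraction(red, total)

# Second draw depends on what the first draw removed from the jar.
p_second_red_after_red = Fraction(red - 1, total - 1)
p_second_red_after_other = Fraction(red, total - 1)

print("P(first is red) =", p_first_red)
print("P(second is red | first was red) =", p_second_red_after_red)
print("P(second is red | first was not red) =", p_second_red_after_other)
```

The second-draw probability differs depending on the first draw, which is exactly what makes the two draws dependent. For an independent event such as a die roll, the probability would stay at 1/6 no matter what was rolled before.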


A Probability Distribution can be defined as a function that gives the probability of every possible value that a random variable can take within a given range for a particular experiment.
Continuous Distribution
A Continuous Distribution describes the probabilities of a continuous random variable, which can take any value within a given range. Only a range of values has a non-zero probability. In a continuous distribution, the probability of a continuous random variable equaling any single exact value is always 0. The probability of a range is represented by the area under the curve over that range.
Discrete Distribution
A Discrete Distribution gives the probability of occurrence of every value of a discrete random variable. In a discrete probability distribution, every possible value of the discrete random variable has a non-zero probability, and these probabilities add up to 1. Hence, a discrete probability distribution is often represented in tabular form.
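As an illustration of a continuous distribution, the sketch below uses the normal distribution, which the text does not name and is chosen here only as a familiar example. The probability of a range is the difference of the cumulative distribution function at its two ends, computed with math.erf from the standard library.

```python
from math import erf, sqrt

def normal_cdf(x, mean=0.0, std=1.0):
    """P(X <= x) for a normally distributed random variable X."""
    return 0.5 * (1.0 + erf((x - mean) / (std * sqrt(2.0))))

def interval_probability(low, high, mean=0.0, std=1.0):
    """P(low <= X <= high): the area under the curve between low and high."""
    return normal_cdf(high, mean, std) - normal_cdf(low, mean, std)

p_within_one_std = interval_probability(-1.0, 1.0)
p_exact_point = interval_probability(0.5, 0.5)

print(f"P(-1 <= X <= 1) = {p_within_one_std:.4f}")
print(f"P(X == 0.5)     = {p_exact_point}")
```

The range from -1 to 1 has a probability of roughly 0.68, while the single point 0.5 has probability 0 because a range of zero width has no area under the curve.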
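A discrete distribution in tabular form can be sketched as follows. The sum of two dice is used as the random variable, which is an example chosen for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 two-dice outcomes produce each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability of each possible sum, from 2 to 12.
distribution = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

print("Sum  Probability")
for total, p in distribution.items():
    print(f"{total:>3}  {p}")

print("Total probability:", sum(distribution.values()))
```

Every possible sum has a non-zero probability, and the probabilities in the table add up to 1.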

Bayes’ Theorem

Bayes’ Theorem gives a method to calculate conditional probability. The theorem is named after the 18th-century British mathematician Thomas Bayes, who formulated it. Recall that conditional probability is the probability of an event occurring given that one or more other events have occurred. This mathematical formula has been widely used in Machine Learning for modeling hypotheses, classification, and optimization.
For two events A and B, Bayes’ Theorem states,
P(A|B) = P(B|A) P(A) / P(B)
E.g. Suppose we have an algorithm that looks at images and gives a verdict on whether the patient has cancer. Let 0.04 be the probability that an image shows cancer and 0.96 the probability that it does not. For the 0.04 cancer case, let 0.8 be the probability of a positive result (true positive) and 0.2 the probability of a negative result (false negative).
For the 0.96 no-cancer case, let 0.05 be the probability of a positive result (false positive) and 0.95 the probability of a negative result (true negative). Then, using Bayes’ Theorem, we have,
P(Cancer|Positive) = True Positives / (True Positives + False Positives)
= (0.04*0.8) / ((0.04*0.8) + (0.96*0.05))
= 0.032 / 0.080
= 0.40
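The same calculation can be checked with a short sketch that evaluates the expression using the numbers from the example. The function and parameter names are assumptions chosen for readability.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(condition | positive result) using Bayes' Theorem."""
    true_positives = prior * true_positive_rate
    false_positives = (1 - prior) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Numbers from the example: 4% of images show cancer, the algorithm flags
# 80% of cancer images and 5% of non-cancer images as positive.
p_cancer_given_positive = posterior(
    prior=0.04, true_positive_rate=0.8, false_positive_rate=0.05
)

print(f"P(Cancer | Positive) = {p_cancer_given_positive:.2f}")
```

With these numbers, 0.032 / (0.032 + 0.048) evaluates to 0.40. Even with a fairly accurate test, most positive results come from the much larger group of patients without cancer, which is why the result is well below the 0.8 true positive rate.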
Examples of Bayes’ Theorem used in practice in machine learning and data science,
  • Decision Trees for choosing the best possible option
  • Confusion matrix
  • Building machine learning algorithms and evaluating them

Why do we need this mathematics for ML and Data Science?

Machine Learning is a subset of Artificial Intelligence (AI). Though AI and Data Science are two different fields, a lot overlaps between the two. We need to understand the mathematics behind the models we use for AI and Data Science. The mathematics we’ve learned so far, Statistics and Probability, is widely used in Data Science. Similarly, Linear Algebra, Calculus, and Probability are heavily used in Artificial Intelligence, for example to fit a Linear Regression model. Basic Algebra acts as the backbone for all these different areas of mathematics, which are then used by Artificial Intelligence and Data Science.

How much statistics is necessary to know on an everyday basis?

Probabilities help us judge whether a problem can be solved with the tools, resources, and data we have. Engineers make decisions with the information they have, and thus statistics and probability are important. They are used throughout a project, from its inception, to figure out the ways to solve the problem, through to its completion, when the solution is found.


Today, we learned about probability. We learned what probabilities are, how they are used in real life, and how conditional probability works for independent and dependent events. We also worked through Bayes’ theorem and understood its significance. One of the key things to take note of is the difference between Probability and Likelihood. A lot of engineers mistake likelihood for probability itself, so everyone working in the field should learn these foundational tools to help create solutions efficiently.

Read the next article to learn about T-SQL for Data Science
