# Statistical Inference For Machine Learning

AI is a multidisciplinary field that requires a range of skills in statistics, mathematics, predictive modeling, and business analysis. An AI professional should be comfortable building the necessary algorithms, working with various data sources, and asking the right questions to find the right answers.

This article helps lay out the canvas on which the rest of the modules are built.

Statistical inference is the branch of statistics concerned with using probability concepts to deal with uncertainty in decision-making. The process involves selecting and using a sample statistic to draw inferences about a population parameter, based on a subset of the population: the sample drawn from it.

Statistical inference deals with two classes of situations:
• Hypothesis Testing
• Estimation
Hypothesis testing means testing some hypothesis about the parent population from which the sample is drawn. Estimation means using statistics obtained from the sample as estimates of the unknown parameters of that population.

Hypothesis testing begins with an assumption called a hypothesis. According to Prof. M. Hamburg, a hypothesis in statistics is simply a quantitative statement about a population. For example, a coin may be tossed 200 times, giving heads 80 times and tails 120 times; we may then test the hypothesis that the coin is unbiased.
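The coin example can be checked numerically. The sketch below is one possible approach, not a method given in the source: it computes an exact two-sided binomial p-value, treating every outcome that is no more probable than the observed one as "at least as extreme". The function name `binomial_two_sided_p` is made up for illustration.

```python
from math import comb

def binomial_two_sided_p(heads, tosses, p=0.5):
    """Exact two-sided p-value for observing `heads` in `tosses` flips.

    Assumption: 'at least as extreme' means 'no more probable than the
    observed count' under the hypothesized head probability `p`.
    """
    # Probability of every possible head count under the null hypothesis.
    pmf = [comb(tosses, k) * p**k * (1 - p) ** (tosses - k)
           for k in range(tosses + 1)]
    observed = pmf[heads]
    # Small tolerance so that float rounding does not drop ties.
    return sum(q for q in pmf if q <= observed * (1 + 1e-9))

# The example from the text: 80 heads and 120 tails in 200 tosses.
p_value = binomial_two_sided_p(80, 200)
print(f"p-value for 80 heads in 200 tosses: {p_value:.4f}")
```

A small p-value means that a result this lopsided would be rare if the coin were really unbiased.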

## Procedure of Testing Hypothesis

### Set up a Hypothesis

The first step in hypothesis testing is to set up a hypothesis about a population parameter. We then collect sample data, compute sample statistics, and use this information to decide how likely it is that the hypothesized population parameter is correct. To test its validity, we find the difference between the hypothesized value and the actual value of the sample mean, and then judge whether the difference is significant. The smaller the difference, the greater the likelihood that our hypothesized value for the mean is correct. The larger the difference, the smaller the likelihood.
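One common way to judge whether such a difference is large is to divide it by the standard error of the sample mean. The sketch below assumes this standardization, which the source does not spell out, and uses made-up IQ scores; the name `standardized_difference` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def standardized_difference(sample, hypothesized_mean):
    """Difference between sample mean and hypothesized mean, in standard errors."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 divisor
    return (mean(sample) - hypothesized_mean) / standard_error

# Hypothetical IQ scores, invented purely for illustration.
iq_scores = [104, 110, 98, 107, 112, 101, 109, 105]

for mu0 in (100, 105):
    stat = standardized_difference(iq_scores, mu0)
    print(f"hypothesized mean {mu0}: standardized difference = {stat:.2f}")
```

A hypothesized mean close to the sample mean gives a statistic near zero, while a distant one gives a large absolute value, matching the rule of thumb above.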

A hypothesis is stated in two forms:
• Null Hypothesis
• Alternative Hypothesis
The null hypothesis is a very useful tool for testing the significance of a difference. For example, if we want to find out whether a particular medicine is effective in curing fever, we take as the null hypothesis that the medicine is not effective in curing fever. Rejecting the null hypothesis indicates that the differences are statistically significant; accepting it indicates that the differences may be due to chance.

The alternative hypothesis, set against the null hypothesis, may represent a whole range of values rather than a single point.

Mathematically, the null hypothesis is denoted H0 and the alternative hypothesis Hα or H1.

For example, a test of whether a certain class of people has a mean IQ different from 100 may define the following null and alternative hypotheses.

H0: µ = 100 (null hypothesis)
Hα: µ ≠ 100 (alternative hypothesis)

Or, when testing the difference between the mean IQ of two groups, we may set up the null hypothesis that the two groups have equal means (µ1 − µ2 = 0) and the alternative hypothesis that their means are not equal (µ1 − µ2 ≠ 0), i.e.

H0: µ1 − µ2 = 0 (null hypothesis)
Hα: µ1 − µ2 ≠ 0 (alternative hypothesis)
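For the two-group case, a test statistic measures how far the observed difference in sample means lies from the hypothesized difference of zero. The sketch below uses Welch's t statistic, which is one standard choice and not one named in the source. The two groups of scores are made up, and `welch_statistic` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, variance

def welch_statistic(group_a, group_b):
    """Welch's t statistic for H0: mu1 - mu2 = 0 (unequal variances allowed)."""
    # Standard error of the difference between the two sample means.
    standard_error = sqrt(variance(group_a) / len(group_a)
                          + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical IQ scores for two groups, invented for illustration.
group_1 = [102, 98, 110, 105, 99, 107]
group_2 = [95, 101, 97, 93, 100, 96]

print(f"Welch statistic: {welch_statistic(group_1, group_2):.2f}")
```

A value near zero is consistent with equal means; a large absolute value counts as evidence against H0.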

### Set up a Suitable Significance Level

The next step is to test the validity of H0 against Hα at a certain level of significance. The confidence with which a researcher rejects or accepts a null hypothesis depends on the significance level adopted.

The significance level is generally expressed as a percentage, for example 5%, and is the probability of rejecting the null hypothesis when it is true. When testing at the 5% level, the statistician runs the risk that, in the long run, the wrong decision will be made about 5% of the time: a true hypothesis will be rejected on about 5 out of every 100 occasions.

Testing at the 1% level reduces the chance of making a false judgement, but some risk remains: on about 1 out of every 100 occasions, the wrong decision will still be made.
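The long-run reading of the significance level can be illustrated by simulation. The sketch below repeatedly draws samples from a population where the null hypothesis really is true and counts how often a two-sided z-test rejects it. The population values (mean 100, standard deviation 15), the sample size, and the function name `false_rejection_rate` are assumptions made for illustration only.

```python
import random
from math import sqrt
from statistics import NormalDist

def false_rejection_rate(alpha, trials=10_000, n=30,
                         mu0=100.0, sigma=15.0, seed=0):
    """Fraction of trials in which a true H0: mu = mu0 is rejected at level alpha."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # Sample from a population in which H0 holds exactly.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p_value < alpha:
            rejections += 1
    return rejections / trials

rate_5 = false_rejection_rate(0.05)
rate_1 = false_rejection_rate(0.01)
print(f"rejections of a true H0 at 5%: {rate_5:.3f}")
print(f"rejections of a true H0 at 1%: {rate_1:.3f}")
```

Under these assumptions, the two rates should land close to 0.05 and 0.01 respectively, up to simulation noise.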

The following figure illustrates how to interpret a 5% level of significance. Note that 2.5% of the area under the curve is located in each tail.
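The split into two 2.5% tails can be reproduced numerically. The sketch below assumes the test statistic follows a standard normal distribution, which the source does not state explicitly, and uses Python's `statistics.NormalDist` to find the cut-off points. The helper name `two_tailed_critical_values` is hypothetical.

```python
from statistics import NormalDist

def two_tailed_critical_values(alpha):
    """Cut-off points leaving alpha / 2 of the area in each tail."""
    upper = NormalDist().inv_cdf(1 - alpha / 2)
    return -upper, upper

lower, upper = two_tailed_critical_values(0.05)
left_tail_area = NormalDist().cdf(lower)
right_tail_area = 1 - NormalDist().cdf(upper)

print(f"critical values: {lower:.2f} and {upper:.2f}")
print(f"area in each tail: {left_tail_area:.3f}, {right_tail_area:.3f}")
```

A test statistic falling outside the two critical values leads to rejection of the null hypothesis at the 5% level.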

## Two Types of Errors in Testing of Hypothesis

When a statistical hypothesis is tested, there are four possible outcomes:
• If the hypothesis is true but our test rejects it, the error is called a Type I error.
• If the hypothesis is false but our test accepts it, the error is called a Type II error.
• If the hypothesis is true and our test accepts it, the decision is correct.
• If the hypothesis is false and our test rejects it, the decision is again correct.
In any experiment, a Type I error is committed by rejecting the null hypothesis when it is true. The probability of committing a Type I error is denoted α, where

α = P(Type I error)
= P(rejecting H0 | H0 is true)

On the other hand, a Type II error is committed by not rejecting the null hypothesis when it is false. The probability of committing a Type II error is denoted β, where

β = P(Type II error)
= P(not rejecting, i.e. accepting, H0 | H0 is false)

| | Accept H0 | Reject H0 |
|---|---|---|
| H0 is true | Correct decision | Type I error |
| H0 is false | Type II error | Correct decision |

The aim when testing a hypothesis is to reduce both types of error (Type I and Type II), but with a fixed sample size it is not practically possible to control both simultaneously. The probability of making one type of error can only be reduced if we are willing to increase the probability of making the other, so to get a low β we have to put up with a high α. Because accepting a false hypothesis (Type II error) is considered more dangerous than rejecting a correct one (Type I error), we keep the probability of committing a Type I error at a fixed level, called the level of significance and denoted by α, and then try to keep β as small as possible. In statistical tests, a level of significance fixed at 5% means that the probability of accepting a true hypothesis is 95%.
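The trade-off between the two error probabilities can be made concrete for a simple case. The sketch below assumes a two-sided z-test of H0: µ = 100 with a known standard deviation of 15 and a true mean of 105. All of these numbers, and the function name `type_ii_error`, are invented for illustration. It computes β analytically for several values of α, then for a larger sample.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(alpha, true_mean, null_mean, sigma, n):
    """Probability of accepting H0: mu = null_mean when the mean is true_mean."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How far the true mean sits from the null mean, in standard errors.
    shift = (true_mean - null_mean) / (sigma / sqrt(n))
    # Chance that the z statistic still lands inside the acceptance region.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

# Fixed sample size: lowering alpha raises beta.
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, true_mean=105, null_mean=100, sigma=15, n=30)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")

# Larger sample at the same alpha: beta falls.
beta_large_n = type_ii_error(0.05, true_mean=105, null_mean=100, sigma=15, n=120)
print(f"alpha = 0.05, n = 120 -> beta = {beta_large_n:.3f}")
```

In this setting, shrinking α at a fixed sample size widens the acceptance region and so increases β. Only a larger sample lowers β without raising α.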