## Supervised learning

In supervised learning, every example in the training data comes with the right answer. Supervised learning problems fall into two types.

**Regression:** Predict a real-valued output.

**Classification:** Predict a discrete-valued output.

## Linear regression

Linear regression is an approach that models the relationship between an independent variable x and a dependent variable y as a straight line.

Our goal is to find the line that best fits the data. The best fit is the line for which the error between its predictions and the actual values is smallest, which makes our predictions more accurate.

Using this line we can predict a value that is not in the data set. The better the line fits the training data, the more accurate such predictions tend to be.

If we plot house prices against house size in square feet (feet²), we can use the best-fit line to predict the price of a house of any size.

I am using a house price example to explain this.

### Terms used most frequently

$m$ = number of training examples

$x$ = input variable / feature

$y$ = output variable / target

$(x, y)$ = a single training example (one row)

$(x^{(i)}, y^{(i)})$ = the $i$-th training example; $i$ is the row number, not a power

For example, $(x^{(2)}, y^{(2)}) = (1406, 232)$
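As a small illustration of this notation, here is a sketch of how a training set could be stored in Python. Only the second row, (1406, 232), comes from the text above; the other rows and the helper name `example` are made up for illustration.

```python
# Hypothetical training set of (size in square feet, price) pairs.
# Only the second row comes from the text; the others are invented.
training_set = [
    (2000, 400),  # hypothetical
    (1406, 232),  # from the text: the 2nd training example
    (1500, 300),  # hypothetical
]

m = len(training_set)  # m = number of training examples


def example(i):
    """Return the i-th training example, counting rows from 1 as in the text."""
    return training_set[i - 1]  # Python lists are 0-indexed


x2, y2 = example(2)
print(m, x2, y2)
```

The text counts rows from 1, while Python lists count from 0, which is why the helper subtracts one.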

Our learning machine should take an input and produce an estimated output:

**Input** x → **Estimated output** for y

## What is the hypothesis?

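$h_\theta(x) = \theta_0 + \theta_1 x$ is the equation of the line. $\theta_0$ and $\theta_1$ are the parameters: $\theta_0$ is the intercept and $\theta_1$ is the slope, so together they determine where the line lies.

So we have to find the $\theta_0$ and $\theta_1$ that give the best line for our training set, that is, values for which $h_\theta(x)$ is close to $y$.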
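The hypothesis can be sketched as a one-line Python function. The function name and the parameter values below are assumptions chosen only to show how each parameter changes the prediction.

```python
def hypothesis(theta0, theta1, x):
    """Straight-line prediction h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


# Hypothetical parameter values, not fitted to any data.
print(hypothesis(0.0, 1.0, 3.0))  # intercept 0, slope 1
print(hypothesis(2.0, 0.5, 3.0))  # intercept 2 shifts the line up, slope 0.5 flattens it
```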

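This means we have to measure the error between the predicted output and the actual output, and make it as small as possible.

In other words, $h_\theta(x) - y$ should be small (minimal).

For a single training example, the error is $h_\theta(x^{(i)}) - y^{(i)}$.

For the regression problem we use the squared error function, so that positive and negative differences do not cancel each other out. Over all $m$ training examples it can be written as

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

$J(\theta)$ is called the cost function.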
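A minimal sketch of the cost function in Python, assuming the usual squared error form (the sum of squared errors divided by 2m). The function names and the sample data are hypothetical.

```python
def hypothesis(theta0, theta1, x):
    """Straight-line prediction h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared error cost: sum of squared errors divided by 2m."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        error = hypothesis(theta0, theta1, x) - y
        total += error ** 2
    return total / (2 * m)


# Hypothetical data lying exactly on the line y = 2x + 1
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]
print(cost(1.0, 2.0, xs, ys))  # parameters match the line, so the cost is zero
print(cost(0.0, 2.0, xs, ys))  # every prediction is off by 1
```

A lower cost means the line passes closer to the training examples.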

Let’s understand the cost function

| x | y |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |

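To draw the cost function in 2D, we fix $\theta_0 = 0$ and vary only $\theta_1$. First suppose $\theta_1 = 1$.

Then $h_\theta(x) = x$, and

$J(\theta) = \frac{1}{2m}\left(0^2 + 0^2 + 0^2\right) = 0$, since every predicted output equals the actual output.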

Now let us take $\theta_1 = 0.5$, again with $\theta_0 = 0$.

$J(\theta) = \frac{1}{2m}\left((0.5-1)^2 + (1-2)^2 + (1.5-3)^2\right) = \frac{3.5}{6} \approx 0.58$

Each value of $\theta_1$ gives a different line on the left graph and a different value of $J(\theta)$.

At the circled point the error is at its minimum, that is, the cost function is minimized.
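The two calculations above can be repeated for a range of slopes to trace out the cost curve. This is only a sketch: it uses the three data points from the table, keeps θ0 fixed at 0, and the grid of candidate slopes is an arbitrary choice.

```python
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]


def cost_theta1(theta1):
    """Cost J for the line h(x) = theta1 * x (theta0 fixed at 0)."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Hypothetical grid of candidate slopes from 0.0 to 2.0 in steps of 0.25
candidates = [i * 0.25 for i in range(9)]

for t in candidates:
    print(f"theta1 = {t:.2f}  J = {cost_theta1(t):.2f}")

# The candidate with the lowest cost is the bottom of the curve
best = min(candidates, key=cost_theta1)
print("lowest cost at theta1 =", best)
```

The printed values fall as θ1 approaches 1 and rise again on the other side, which is the curve whose lowest point we are looking for.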

If we vary both parameters, the cost function becomes a bowl-shaped surface, and we display it using contour plots. A contour plot is made of rings, and the error is the same at every point on the same ring. Our goal is to move towards the bottom of the bowl, inside the smallest ring, where the error is minimal.

For this purpose, we use an algorithm that is called gradient descent. It minimizes our cost function.
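For reference, the standard form of the gradient descent update for two parameters is sketched below. The symbol $\alpha$ (the learning rate) and the $:=$ assignment notation are assumptions borrowed from the usual presentation of the algorithm.

$$\text{repeat until convergence:} \quad \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad \text{for } j = 0 \text{ and } j = 1$$

Both parameters are updated simultaneously: the two new values are computed from the old ones first, and only then assigned.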

Now it's time to select a learning rate $\alpha$. It should not be too small, because that slows the algorithm down, and it should not be too large, because the updates may then overshoot the convergence point and fail to converge.
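The effect of the learning rate can be seen in a small experiment. The sketch below assumes the three data points (1, 1), (2, 2), (3, 3), keeps θ0 fixed at 0, and runs 20 updates of θ1 starting from 0. The three learning rates are hypothetical values picked to show each behaviour.

```python
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]


def gradient_theta1(theta1):
    """Derivative of J with respect to theta1, with theta0 fixed at 0."""
    m = len(xs)
    return sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m


def run(alpha, steps=20, theta1=0.0):
    """Apply `steps` gradient descent updates and return the final theta1."""
    for _ in range(steps):
        theta1 = theta1 - alpha * gradient_theta1(theta1)
    return theta1


too_small = run(alpha=0.01)  # moves towards 1 but is still far away
reasonable = run(alpha=0.2)  # ends very close to the minimum at theta1 = 1
too_large = run(alpha=0.5)   # every step overshoots and the error keeps growing

print(too_small, reasonable, too_large)
```

Which values count as too small or too large depends on the data, so these numbers only apply to this toy example.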

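Taking the partial derivatives of the cost function and substituting them into the gradient descent update, it becomes:

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$

$$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$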

The algorithm works as shown in the following image.

The above image is taken from Andrew Ng's Machine Learning.
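Putting the pieces together, here is a minimal sketch of gradient descent for linear regression with one feature. It assumes the standard partial derivatives of the squared error cost; the function name, the default learning rate, and the iteration count are illustrative choices, and the data are the three points (1, 1), (2, 2), (3, 3).

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=5000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction errors for every training example, using the current parameters
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed from the old values
        theta0 = theta0 - alpha * grad0
        theta1 = theta1 - alpha * grad1
    return theta0, theta1


xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)
```

On this data the parameters approach θ0 = 0 and θ1 = 1, the line that passes through all three points.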