It's Linear Regression ⏩


Linear Regression is a widely used statistical and machine learning method. The earliest form of regression was the method of least squares. The term "regression" was coined by Sir Francis Galton in the 1880s, based on the biological phenomenon of the heights of descendants regressing toward the mean relative to their tall ancestors.
Linear Regression is a regression problem (not a classification problem) and belongs to supervised learning. Typical applications include:
• Analyze the effectiveness of marketing, pricing, and promotions on the sales of a product.
• Forecast sales by analyzing the company's monthly sales over the past few years.
• Predict house prices as a function of house size.
• Estimate causal relationships between parameters in biological systems.
NOTE: With more than one input feature (X), we no longer fit a line but a plane (or, in higher dimensions, a hyperplane). The equation generalizes from simple linear regression to multiple linear regression as follows: Y(X) = p0 + p1*X1 + p2*X2 + ... + pn*Xn
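The multiple regression equation above can be evaluated directly with NumPy. A minimal sketch, where the coefficients p0..p3 and the feature values are arbitrary illustrative numbers, not taken from any real dataset:

```python
import numpy as np

# Hypothetical coefficients: p0 is the intercept, p holds p1..p3
p0 = 1.0
p = np.array([2.0, -0.5, 3.0])

# One observation with three features X1, X2, X3
x = np.array([4.0, 2.0, 1.0])

# Multiple linear regression prediction: Y = p0 + p1*X1 + p2*X2 + p3*X3
y = p0 + np.dot(p, x)
print(y)  # 1.0 + 8.0 - 1.0 + 3.0 = 11.0
```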
• Residuals: The differences between the observed values and the fitted values produced by a model. To get the best-fitted line in linear regression, we minimize the vertical distances between the data points and the fitted line.
• Bias: An error arising from wrong assumptions in the learning algorithm. High bias can cause an algorithm to miss relevant relations between the features and the target output.
• Variance: An error arising from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data rather than the underlying relationship.
• Sweet spot: For any model, the level of complexity at which an increase in bias is balanced by the reduction in variance. If complexity exceeds the sweet spot, we are over-fitting the model; if it falls short of the sweet spot, we are under-fitting it.
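The residuals and the under-/over-fitting trade-off can be sketched with synthetic data. This is only an illustration: the sine-shaped data, the noise level, and the polynomial degrees (1, 3, 10) are all arbitrary assumptions, and polynomial fitting stands in for "models of increasing complexity":

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)  # noisy nonlinear data

# Hold out every other point as a validation set
x_train, y_train = x[::2], y[::2]
x_val, y_val = x[1::2], y[1::2]

val_errors = {}
for degree in (1, 3, 10):
    coeffs = np.polyfit(x_train, y_train, degree)       # fit on training points
    residuals = y_val - np.polyval(coeffs, x_val)       # observed - fitted
    val_errors[degree] = np.mean(residuals ** 2)        # mean squared residual
    print(degree, round(val_errors[degree], 4))
```

Degree 1 under-fits (high bias: a straight line cannot follow the sine curve), while a high degree tends to chase the noise (high variance); the sweet spot lies in between.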
1) Import the necessary libraries:
   import pandas as pd
   import numpy as np
   import matplotlib.pyplot as plt
   import seaborn as sns
and, for importing datasets:
   from sklearn import datasets
2) Separate the dataset into two arrays, X and y, holding the selected features and the target values respectively.
3) Now split the data into training and testing sets: X_train, X_test, y_train, y_test.
Sklearn already provides the train_test_split method for this task; it is imported from the model_selection module:
   from sklearn.model_selection import train_test_split
4) Now we need a linear regression model to train on our dataset; for that we import linear_model:
   from sklearn import linear_model
# Linear regression is part of the linear_model family.
5) Create an instance of the LinearRegression model and store it in a variable, say lm:
   lm = linear_model.LinearRegression()
6) Now fit the model on the training set:
   lm.fit(X_train, y_train)
7) To make predictions, use the test set and store the result in a variable for later evaluation:
   pred = lm.predict(X_test)
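The steps above can be put together into one runnable sketch. The diabetes dataset, the 80/20 split, and the fixed random_state are assumptions chosen for illustration; any regression dataset would do:

```python
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split

# 1)-2) Load a built-in regression dataset as feature array X and target y
X, y = datasets.load_diabetes(return_X_y=True)

# 3) Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 4)-5) Create an instance of the linear regression model
lm = linear_model.LinearRegression()

# 6) Fit the model on the training set
lm.fit(X_train, y_train)

# 7) Make predictions on the test set
pred = lm.predict(X_test)

print(pred.shape)                           # one prediction per test row
print(round(lm.score(X_test, y_test), 2))   # R^2 score on the test set
```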