# Machine Learning: Logistic Regression

## Introduction

In the previous chapter, we studied Linear Regression.

In this chapter, we will learn Logistic Regression.

Note: a concept is usually easier to understand when you can relate it to yourself or your own life, so try to connect each idea in this chapter to everyday human experience.

## What is Logistic Regression?

Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables. Logistic regression models the probabilities for classification problems with two possible outcomes. It’s an extension of the linear regression model for classification problems.

## What is it used for?

Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval, or ratio-level independent variables.

## Difference between Linear and Logistic Regression

| Basis for comparison | Linear regression | Logistic regression |
|---|---|---|
| Basic | The data is modeled using a straight line. | The log-odds of an event is modeled as a linear function of a combination of predictor variables. |
| Linear relationship between the dependent and independent variables | Is required. | Is not required. |
| Independent variables | Could be correlated with each other (especially in multiple linear regression). | Should not be correlated with each other (no multicollinearity). |
| Outcome | The outcome (dependent variable) is continuous. It can have any one of an infinite number of possible values. | The outcome (dependent variable) has only a limited number of possible values. |
| Dependent variable | Used when the response variable is continuous, for instance weight, height, number of hours, etc. | Used when the response variable is categorical, for instance yes/no, true/false, red/green/blue, 1st/2nd/3rd/4th, etc. |
| Equation | Gives an equation of the form $Y = MX + C$, that is, an equation of degree 1. | Gives an equation of the form $Y = \frac{1}{1 + e^{-X}}$. |
| Coefficient interpretation | Quite straightforward: holding all other variables constant, a unit increase in this variable is expected to increase or decrease the dependent variable by a fixed amount. | Depends on the family (binomial, Poisson, etc.) and link (log, logit, inverse-log, etc.) you use; the interpretation differs for each. |
| Error minimization technique | Uses the ordinary least squares method to minimize the errors and arrive at the best possible fit. | Uses the maximum likelihood method to arrive at the solution. |

## Why is Logistic Regression called so?

The meaning of the term regression is very simple: any process that attempts to find relationships between variables is called regression. Logistic regression is a regression because it finds relationships between variables. It is logistic because it uses a logistic function as a link function. Hence the full name.

## What is the goal of Logistic Regression?

The goal of logistic regression is to correctly predict the category of outcome for individual cases using the most parsimonious model. To accomplish this goal, a model is created that includes all predictor variables that are useful in predicting the response variable. In other words, the goal of logistic regression is to find the best-fitting (yet biologically reasonable) model to describe the relationship between the dichotomous characteristic of interest (the dependent, response, or outcome variable) and a set of independent (predictor or explanatory) variables.

## Types of Logistic Regression

1. Binary Logistic Regression
The categorical response has only two possible outcomes. Example: Spam or Not Spam

2. Multinomial Logistic Regression
Three or more categories without ordering. Example: Predicting which food is preferred more (Veg, Non-Veg, Vegan)

3. Ordinal Logistic Regression
Three or more categories with ordering. Example: Movie rating from 1 to 5

## Key Terms

### 1. Logit

In statistics, the logit function, or log-odds, is the logarithm of the odds $p/(1 - p)$, where $p$ is a probability. It maps probability values from $(0, 1)$ to $(-\infty, +\infty)$. It is the inverse of the sigmoidal "logistic" function or logistic transform used in mathematics, especially in statistics.

In deep learning, the term logits layer is popularly used for the last neuron layer of neural networks used for classification tasks, which produces raw prediction values as real numbers ranging from $-\infty$ to $+\infty$.
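
As a minimal sketch of the definition above, the log-odds can be computed directly from a probability. The function name `logit` and the sample probabilities are chosen here for illustration only.

```python
import math


def logit(p: float) -> float:
    """Return the log-odds ln(p / (1 - p)) of a probability p, with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


# Illustrative probabilities only
for p in (0.1, 0.5, 0.9):
    print(f"p={p}: logit={logit(p):+.4f}")
```

Probabilities below 0.5 give negative log-odds, 0.5 gives zero, and probabilities above 0.5 give positive log-odds.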

### 2. Logistic Function

The logistic function is a sigmoid function that takes any real input $t$ ($t \in \mathbb{R}$) and outputs a value between zero and one; for the logit, this is interpreted as taking input log-odds and having output probability. The standard logistic function is defined as follows:

$$\sigma(t) = \frac{e^{t}}{e^{t} + 1} = \frac{1}{1 + e^{-t}}$$
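
The standard logistic function can be sketched in a few lines of Python. The branch on the sign of `t` is an implementation choice made here, not part of the definition; it avoids an overflow in `math.exp` for large negative inputs.

```python
import math


def sigmoid(t: float) -> float:
    """Standard logistic function 1 / (1 + e^(-t)), written to avoid overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    # For negative t, use the equivalent form e^t / (1 + e^t)
    e = math.exp(t)
    return e / (1.0 + e)


# Illustrative inputs only
for t in (-5.0, 0.0, 5.0):
    print(f"t={t:+.1f}: sigmoid={sigmoid(t):.4f}")
```

Whatever real number goes in, the result stays between 0 and 1, and an input of 0 maps to exactly 0.5.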

### 3. Inverse of Logistic Function

We can now define the logit (log odds) function as the inverse $g = \sigma^{-1}$ of the standard logistic function. Writing $t = \beta_0 + \beta_1 x$ for a single predictor $x$, so that $p(x) = \sigma(t) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$, it is easy to see that it satisfies:

$$g(p(x)) = \sigma^{-1}(p(x)) = \operatorname{logit} p(x) = \ln\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x$$

and equivalently, after exponentiating both sides we have the odds:

$$\frac{p(x)}{1 - p(x)} = e^{\beta_0 + \beta_1 x}$$

where,
• $g$ is the logit function. The equation for $g(p(x))$ illustrates that the logit (i.e., log-odds or natural logarithm of the odds) is equivalent to the linear regression expression.
• $\ln$ denotes the natural logarithm.
• The formula for $p(x)$ illustrates that the probability of the dependent variable for a given case is equal to the value of the logistic function of the linear regression expression. This is important as it shows that the value of the linear regression expression can vary from negative to positive infinity and yet, after transformation, the resulting probability $p(x)$ ranges between 0 and 1.
• $\beta_0$ is the intercept from the linear regression equation (the value of the criterion when the predictor is equal to zero).
• $\beta_1 x$ is the regression coefficient multiplied by some value of the predictor.
• base $e$ denotes the exponential function.
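
The inverse relationship described above can be checked numerically. This is a small sketch, assuming an intercept and a regression coefficient named `beta0` and `beta1` whose values are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical intercept and regression coefficient, for illustration only
beta0, beta1 = -3.0, 1.5


def p(x: float) -> float:
    """Probability: logistic function of the linear expression beta0 + beta1 * x."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


def g(prob: float) -> float:
    """Logit (natural log of the odds) of a probability."""
    return math.log(prob / (1.0 - prob))


for x in (0.0, 2.0, 4.0):
    linear = beta0 + beta1 * x
    print(f"x={x}: linear={linear:+.3f}  p(x)={p(x):.4f}  g(p(x))={g(p(x)):+.3f}")
```

For each `x`, applying the logit to the probability recovers the value of the linear expression, up to floating-point rounding.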

### 4. Odds

The odds of the dependent variable equaling a case (given some linear combination $x$ of the predictors) is equivalent to the exponential function of the linear regression expression. This illustrates how the logit serves as a link function between the probability and the linear regression expression. Given that the logit ranges between negative and positive infinity, it provides an adequate criterion upon which to conduct linear regression, and the logit is easily converted back into the odds.

So we define the odds of the dependent variable equaling a case (given some linear combination $x$ of the predictors) as follows:

$$\text{odds} = e^{\beta_0 + \beta_1 x}$$
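
As a rough illustration, the odds can be computed from the linear expression and converted back to a probability. The names `beta0` and `beta1` and their values are hypothetical.

```python
import math

# Hypothetical intercept and regression coefficient, for illustration only
beta0, beta1 = -3.0, 1.5


def odds(x: float) -> float:
    """Odds of the event: exponential of the linear regression expression."""
    return math.exp(beta0 + beta1 * x)


def probability_from_odds(o: float) -> float:
    """Convert odds back to a probability: p = odds / (1 + odds)."""
    return o / (1.0 + o)


for x in (1.0, 2.0, 3.0):
    o = odds(x)
    print(f"x={x}: odds={o:.4f}  probability={probability_from_odds(o):.4f}")
```

With these values the linear expression is zero at `x = 2`, so the odds there are 1 (even odds) and the probability is 0.5.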

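### 5. Odds Ratio

For a continuous independent variable, the odds ratio can be defined as:

$$\mathrm{OR} = \frac{\operatorname{odds}(x + 1)}{\operatorname{odds}(x)} = \frac{e^{\beta_0 + \beta_1 (x + 1)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1}$$

This exponential relationship provides an interpretation for $\beta_1$: the odds multiply by $e^{\beta_1}$ for every 1-unit increase in $x$.

For a binary independent variable, the odds ratio is defined as $\frac{ad}{bc}$, where a, b, c, and d are cells in a 2×2 contingency table.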
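
Both forms of the odds ratio can be sketched in Python. The coefficients and the table counts below are hypothetical, and the cell layout (a and b in the first row, c and d in the second) is an assumption made for this example.

```python
import math

# Hypothetical intercept and regression coefficient, for illustration only
beta0, beta1 = -3.0, 1.5


def odds(x: float) -> float:
    """Odds of the event for a continuous predictor x."""
    return math.exp(beta0 + beta1 * x)


# Continuous predictor: ratio of the odds one unit apart
ratio = odds(3.0) / odds(2.0)
print(f"odds ratio for a 1-unit increase: {ratio:.4f}")
print(f"e^beta1:                          {math.exp(beta1):.4f}")


def odds_ratio_2x2(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio (a * d) / (b * c) for a 2x2 table assumed to be laid out as
    [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Hypothetical counts, for illustration only
print(f"2x2 odds ratio: {odds_ratio_2x2(20, 10, 5, 15):.1f}")
```

The ratio for the continuous predictor does not depend on the starting value of `x`; it always equals the exponential of the coefficient.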

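### 6. Multiple Explanatory Variables

If there are multiple explanatory variables, the above expression $\beta_0 + \beta_1 x$ can be revised to

$$\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_m x_m = \beta_0 + \sum_{i=1}^{m} \beta_i x_i$$

Then, when this is used in the equation relating the log odds of success to the values of the predictors, the linear regression will be a multiple regression with $m$ explanators; the parameters $\beta_j$ for all $j = 0, 1, 2, \dots, m$ are all estimated.

Again, the more traditional equations are:

$$\log_b\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_m x_m$$

and

$$p = \frac{1}{1 + b^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_m x_m)}}$$

where usually $b = e$.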
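
With several explanatory variables, the probability can be computed from a list of coefficients. This is a minimal sketch taking the base to be e; the function name `predict_probability` and all numeric values are hypothetical.

```python
import math
from typing import Sequence


def predict_probability(betas: Sequence[float], xs: Sequence[float]) -> float:
    """Probability for one case.

    betas[0] is the intercept; betas[1:] are paired with the predictors xs.
    """
    if len(betas) != len(xs) + 1:
        raise ValueError("expected one intercept plus one coefficient per predictor")
    linear = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients and predictor values, for illustration only
betas = [-1.0, 0.8, -0.5, 0.3]
xs = [2.0, 1.0, 3.0]
probability = predict_probability(betas, xs)
print(f"probability: {probability:.4f}")
```

Here the linear expression evaluates to about 1.0, so the probability is about 0.73.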

## Logistic Regression

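Logistic regression produces a logistic curve, which is limited to values between 0 and 1. Logistic regression is similar to linear regression, but the curve is constructed using the natural logarithm of the "odds" of the target variable, rather than the probability. Moreover, the predictors do not have to be normally distributed or have equal variance in each group.

The logistic regression equation is given by

$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} \quad\text{or, in terms of the odds,}\quad \frac{p}{1 - p} = e^{\beta_0 + \beta_1 x}$$

Taking the natural log of both sides of the odds form, we get

$$\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x$$

So far we have seen the equation for one variable. When there is more than one variable, the equation becomes

$$\log_b\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_m x_m$$

where usually $b = e$, or, equivalently,

$$p = \frac{1}{1 + b^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_m x_m)}}$$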

Let me use an example to explain when to use logistic regression:

Linear regression fails when the target has predefined bounds, because it may predict values outside those bounds. For example, take the housing price prediction from the last chapter: linear regression may predict a price that is too high to be practically possible, or too low, such as a negative price.
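
To make the bounds problem concrete for a 0/1 target, the sketch below fits a straight line by ordinary least squares to a tiny made-up dataset and compares it with a logistic curve. The data points and the curve centred at 3.5 are hypothetical and serve only as an illustration.

```python
import math

# Made-up one-feature dataset with a binary (0/1) target
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

# Ordinary least squares fit of a straight line y = intercept + slope * x
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x


def linear_prediction(x: float) -> float:
    return intercept + slope * x


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


# Compare the line with a hypothetical logistic curve centred at x = 3.5
for x in (0.0, 3.5, 7.0):
    print(f"x={x}: line={linear_prediction(x):+.2f}  logistic={sigmoid(x - 3.5):.2f}")
```

At the extremes the straight line leaves the range from 0 to 1, so its output cannot be read as a probability, while the logistic curve stays inside that range.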

In binary classification there are only two possible outcomes, but the input data need not be distributed uniformly, and class '0' data points are often found on the class '1' side of the decision boundary. Because the sigmoid function is a curve, the chance of getting a good fit increases, which helps with this problem.

## Logistic Regression Example

Let's take the example of the Iris dataset, which you can import directly from the sklearn dataset repository. Feel free to use any dataset; there are some very good datasets available on Kaggle and with Google Colab.

### 1. Using SKLearn

```python
# IPython/Jupyter magic: render plots inline in the notebook
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
```
In the above code, we import the libraries that we will be using.
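```python
# Load the Iris dataset bundled with scikit-learn
iris = datasets.load_iris()
# Keep only the first two features (sepal length and sepal width)
X = iris.data[:, :2]
# Binary target: 0 for the first class, 1 for the other two classes
y = (iris.target != 0) * 1
```
Here we load and pre-process the data so that it can be fed to the model.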
```python
plt.figure(figsize=(10, 6))
plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='0')
plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='1')
plt.legend();
```
The above code visualizes the imported data.

```python
model = LogisticRegression(C=1e2)
# IPython/Jupyter magic: time the training call
%time model.fit(X, y)
print(model.intercept_, model.coef_, model.n_iter_)
```
In the above code, we time the training process using `%time` and then print the model parameters.

The output that I got for training is:
```
CPU times: user 2.45 ms, sys: 1.06 ms, total: 3.51 ms
Wall time: 1.76 ms
model.intercept_: [-33.08987216]
model.coef_: [[ 14.75218964 -14.87575477]]
model.n_iter_: [12]
```
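
To see how these parameters turn into a prediction, the sketch below plugs the intercept and coefficients printed above into the logistic function by hand. The sample point (4.9, 3.0) is assumed here to be the sepal length and sepal width of `X[1]`; the variable names are chosen for illustration.

```python
import math

# Parameters copied from the training output above
intercept = -33.08987216
coef = [14.75218964, -14.87575477]

# Assumed feature values of the sample X[1]: sepal length, sepal width
sample = [4.9, 3.0]

# Linear expression followed by the logistic function
z = intercept + sum(c * v for c, v in zip(coef, sample))
probability_class_1 = 1.0 / (1.0 + math.exp(-z))
predicted_class = int(probability_class_1 >= 0.5)

print(f"z = {z:.4f}")
print(f"P(class 1) = {probability_class_1:.4f}")
print(f"predicted class = {predicted_class}")
```

The linear expression is strongly negative for this point, so the probability of class 1 is close to zero and the predicted class is 0.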
```python
plt.figure(figsize=(10, 6))
plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='0')
plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='1')
plt.legend()
# Build a grid covering the range of both features
x1_min, x1_max = X[:, 0].min(), X[:, 0].max()
x2_min, x2_max = X[:, 1].min(), X[:, 1].max()
xx1, xx2 = np.meshgrid(np.linspace(x1_min, x1_max), np.linspace(x2_min, x2_max))
grid = np.c_[xx1.ravel(), xx2.ravel()]
# Probability of class 1 at every grid point
probs = model.predict_proba(grid)[:, 1].reshape(xx1.shape)
# The 0.5 contour is the decision boundary
plt.contour(xx1, xx2, probs, [0.5], linewidths=1, colors='black');
```
The above code lets us visualize the decision boundary with respect to the input data.

```python
pred = model.predict(X[1:2])
print(pred)
```
In the above code, we predict the class of the sample X[1:2]. The predicted class is [0], which is correct.

LR_Sklearn.py
```python
# IPython/Jupyter magic: render plots inline in the notebook
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

# Load the Iris dataset bundled with scikit-learn
iris = datasets.load_iris()

# First two features only; binary target (class 0 vs. the rest)
X = iris.data[:, :2]
y = (iris.target != 0) * 1

plt.figure(figsize=(10, 6))
plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='0')
plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='1')
plt.legend();

model = LogisticRegression(C=1e2)
%time model.fit(X, y)
print(model.intercept_, model.coef_, model.n_iter_)

plt.figure(figsize=(10, 6))
plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='0')
plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='1')
plt.legend()
x1_min, x1_max = X[:, 0].min(), X[:, 0].max()
x2_min, x2_max = X[:, 1].min(), X[:, 1].max()
xx1, xx2 = np.meshgrid(np.linspace(x1_min, x1_max), np.linspace(x2_min, x2_max))
grid = np.c_[xx1.ravel(), xx2.ravel()]
# Probability of class 1 on the grid; the 0.5 contour is the decision boundary
probs = model.predict_proba(grid)[:, 1].reshape(xx1.shape)
plt.contour(xx1, xx2, probs, [0.5], linewidths=1, colors='black');
```

### 2. Using Numpy

```python
# IPython/Jupyter magic: render plots inline in the notebook
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
```
In the above code, we import the libraries that we will be using.
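```python
# Load the Iris dataset bundled with scikit-learn
iris = datasets.load_iris()
# Keep only the first two features (sepal length and sepal width)
X = iris.data[:, :2]
# Binary target: 0 for the first class, 1 for the other two classes
y = (iris.target != 0) * 1
```
Here we load and pre-process the data so that it can be fed to the model.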
```python
plt.figure(figsize=(10, 6))
plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='0')
plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='1')
plt.legend();
```
The above code visualizes the imported data.

```python
class LogisticRegression:
    def __init__(self, lr=0.01, num_iter=100000, fit_intercept=True, verbose=False):
        self.lr = lr
        self.num_iter = num_iter
        self.fit_intercept = fit_intercept
        self.verbose = verbose

    def __add_intercept(self, X):
        # Prepend a column of ones so that theta[0] acts as the intercept
        intercept = np.ones((X.shape[0], 1))
        return np.concatenate((intercept, X), axis=1)

    def __sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def __loss(self, h, y):
        # Mean binary cross-entropy (log loss)
        return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()

    def fit(self, X, y):
        if self.fit_intercept:
            X = self.__add_intercept(X)

        # weights initialization
        self.theta = np.zeros(X.shape[1])

        for i in range(self.num_iter):
            # One gradient descent step on the log loss
            z = np.dot(X, self.theta)
            h = self.__sigmoid(z)
            gradient = np.dot(X.T, (h - y)) / y.size
            self.theta -= self.lr * gradient

            if self.verbose and i % 10000 == 0:
                # Report the loss after the update
                h = self.__sigmoid(np.dot(X, self.theta))
                print(f'loss: {self.__loss(h, y)} \t')

    def predict_prob(self, X):
        if self.fit_intercept:
            X = self.__add_intercept(X)

        return self.__sigmoid(np.dot(X, self.theta))

    def predict(self, X):
        return self.predict_prob(X).round()
```
In the above code, we created a user-defined class "LogisticRegression", which contains all the methods required to fit the model and obtain the decision boundary:
• __init__ - constructor that initializes all the required variables with default or initial values
• __add_intercept - prepends a column of ones to X so that the first weight acts as the model intercept
• __sigmoid - returns the value of the sigmoid function
• __loss - returns the loss (mean log loss)
• fit - learns the weights `theta` by gradient descent
• predict_prob - returns the predicted probability of class 1
• predict - returns the predicted class by rounding the probability
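
The update rule at the heart of `fit` can also be traced without NumPy. The following is a minimal sketch on a made-up one-feature dataset; the data, the learning rate, and the iteration count are hypothetical, and it is not a replacement for the class above.

```python
import math

# Made-up one-feature dataset with a binary (0/1) target
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(theta0: float, theta1: float) -> float:
    """Mean binary cross-entropy over the dataset."""
    total = 0.0
    for x, y in zip(xs, ys):
        h = sigmoid(theta0 + theta1 * x)
        total += -y * math.log(h) - (1.0 - y) * math.log(1.0 - h)
    return total / len(xs)


theta0, theta1 = 0.0, 0.0  # weights start at zero, as in the class above
lr = 0.1
initial_loss = log_loss(theta0, theta1)

for _ in range(2000):
    # Prediction errors h - y for every sample
    errors = [sigmoid(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
    # Gradient of the mean log loss with respect to each weight
    grad0 = sum(errors) / len(xs)
    grad1 = sum(e * x for e, x in zip(errors, xs)) / len(xs)
    theta0 -= lr * grad0
    theta1 -= lr * grad1

final_loss = log_loss(theta0, theta1)
print(f"loss before: {initial_loss:.4f}  loss after: {final_loss:.4f}")
print(f"theta0={theta0:.3f}  theta1={theta1:.3f}")
```

The loss starts at ln 2 (every prediction is 0.5 when the weights are zero) and falls as the weights move toward values that separate the two classes.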
```python
model = LogisticRegression(lr=0.1, num_iter=3000)
# IPython/Jupyter magic: time the training call
%time model.fit(X, y)
```
In the above code, we instantiate the LogisticRegression class and then pass 'X' and 'y' to the fit function to train the model.
```python
preds = model.predict(X[1:2])
print(preds)
```
In the above code, we ask the model which class our sample X[1:2] belongs to. The result is [0.], which is correct.
```python
print(model.theta)
```
Now we print the parameter values of the resulting model.

The parameter values of my model are:
```
[-1.44894305 4.25546329 -6.89489245]
```
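
These three values describe a straight decision boundary in the plane of the two features: the set of points where the linear expression is zero and the predicted probability is exactly 0.5. The sketch below solves for that line using the parameter values printed above; the helper names and the sample point (4.9, 3.0), assumed to be the features of `X[1]`, are illustrative.

```python
# Parameter values copied from the output above: intercept, weight 1, weight 2
theta = [-1.44894305, 4.25546329, -6.89489245]


def linear_score(x1: float, x2: float) -> float:
    """Linear expression theta0 + theta1 * x1 + theta2 * x2."""
    return theta[0] + theta[1] * x1 + theta[2] * x2


def boundary_x2(x1: float) -> float:
    """Value of x2 on the decision boundary (linear score = 0) for a given x1."""
    return -(theta[0] + theta[1] * x1) / theta[2]


x1 = 4.9
print(f"boundary at x1={x1}: x2={boundary_x2(x1):.4f}")

# Assumed features of the sample X[1]: a negative score means class 0
print(f"score of (4.9, 3.0): {linear_score(4.9, 3.0):.4f}")
```

A point whose score is negative lies on the class 0 side of the line, which agrees with the prediction of [0.] for this sample.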
```python
plt.figure(figsize=(10, 6))
plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='0')
plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='1')
plt.legend()
# Build a grid covering the range of both features
x1_min, x1_max = X[:, 0].min(), X[:, 0].max()
x2_min, x2_max = X[:, 1].min(), X[:, 1].max()
xx1, xx2 = np.meshgrid(np.linspace(x1_min, x1_max), np.linspace(x2_min, x2_max))
grid = np.c_[xx1.ravel(), xx2.ravel()]
# Probability of class 1 at every grid point; the 0.5 contour is the boundary
probs = model.predict_prob(grid).reshape(xx1.shape)
plt.contour(xx1, xx2, probs, [0.5], linewidths=1, colors='black');
```
The above code gives us a visualization of the learned decision boundary with respect to the input data.

LR_NumPy.py
```python
# IPython/Jupyter magic: render plots inline in the notebook
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets

# Load the Iris dataset bundled with scikit-learn
iris = datasets.load_iris()

# First two features only; binary target (class 0 vs. the rest)
X = iris.data[:, :2]
y = (iris.target != 0) * 1
plt.figure(figsize=(10, 6))
plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='0')
plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='1')
plt.legend();

class LogisticRegression:
    def __init__(self, lr=0.01, num_iter=100000, fit_intercept=True, verbose=False):
        self.lr = lr
        self.num_iter = num_iter
        self.fit_intercept = fit_intercept
        self.verbose = verbose

    def __add_intercept(self, X):
        # Prepend a column of ones so that theta[0] acts as the intercept
        intercept = np.ones((X.shape[0], 1))
        return np.concatenate((intercept, X), axis=1)

    def __sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def __loss(self, h, y):
        # Mean binary cross-entropy (log loss)
        return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()

    def fit(self, X, y):
        if self.fit_intercept:
            X = self.__add_intercept(X)

        # weights initialization
        self.theta = np.zeros(X.shape[1])

        for i in range(self.num_iter):
            # One gradient descent step on the log loss
            z = np.dot(X, self.theta)
            h = self.__sigmoid(z)
            gradient = np.dot(X.T, (h - y)) / y.size
            self.theta -= self.lr * gradient

            if self.verbose and i % 10000 == 0:
                # Report the loss after the update
                h = self.__sigmoid(np.dot(X, self.theta))
                print(f'loss: {self.__loss(h, y)} \t')

    def predict_prob(self, X):
        if self.fit_intercept:
            X = self.__add_intercept(X)

        return self.__sigmoid(np.dot(X, self.theta))

    def predict(self, X):
        return self.predict_prob(X).round()

model = LogisticRegression(lr=0.1, num_iter=3000)
%time model.fit(X, y)

preds = model.predict(X[1:2])
print(preds)

print(model.theta)
plt.figure(figsize=(10, 6))
plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b', label='0')
plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r', label='1')
plt.legend()
x1_min, x1_max = X[:, 0].min(), X[:, 0].max()
x2_min, x2_max = X[:, 1].min(), X[:, 1].max()
xx1, xx2 = np.meshgrid(np.linspace(x1_min, x1_max), np.linspace(x2_min, x2_max))
grid = np.c_[xx1.ravel(), xx2.ravel()]
probs = model.predict_prob(grid).reshape(xx1.shape)
plt.contour(xx1, xx2, probs, [0.5], linewidths=1, colors='black');
```

### 3. Using TensorFlow

```python
from __future__ import print_function

# This example uses the TensorFlow 1.x API (placeholders and sessions)
import tensorflow as tf
```
In the above code, we import the required libraries.
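```python
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
# The download directory is an assumption; labels are one-hot encoded
# to match the 10-column placeholder used for y
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
```
Using TensorFlow's dataset library, we import the MNIST dataset.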
```python
# Parameters
learning_rate = 0.01
training_epochs = 100
batch_size = 100
display_step = 50
```
In the above code, we assign values to all the global parameters.
```python
# tf Graph Input
x = tf.placeholder(tf.float32, [None, 784])  # MNIST image of shape 28*28 = 784
y = tf.placeholder(tf.float32, [None, 10])   # digits 0-9 => 10 classes

# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
```
Here x and y are placeholders for the actual training data, and W and b are the trainable variables, where:
• W means weight
• b means bias
• x is the independent variable (the input image)
• y is the dependent variable (the label)
```python
# Construct model
pred = tf.nn.softmax(tf.matmul(x, W) + b)  # Softmax

# Minimize error using cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred), reduction_indices=1))
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()
```
In the above code, we are
• defining the model output using the softmax function
• computing the cost as the cross-entropy, averaged over the batch with the reduce-mean operation
• choosing gradient descent as the optimizer, which minimizes the cost
• choosing the global variables initializer to initialize the variables
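
The softmax and cross-entropy steps listed above can be traced for a single example in plain Python. This is a sketch only; the scores and the one-hot label are made up, and the helper names are not part of TensorFlow.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot: Sequence[float], probs: Sequence[float]) -> float:
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)


# Made-up scores for three classes and a one-hot label for the first class
scores = [2.0, 1.0, 0.1]
label = [1.0, 0.0, 0.0]

probs = softmax(scores)
loss = cross_entropy(label, probs)
print("probabilities:", [round(p, 4) for p in probs])
print(f"cross-entropy: {loss:.4f}")
```

With a one-hot label, the cross-entropy reduces to the negative log of the probability assigned to the correct class, so a confident correct prediction gives a loss near zero.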
```python
# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples / batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_xs,
                                                          y: batch_ys})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch + 1) % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(avg_cost))
    # Cost on the last batch, then the learned parameters
    training_cost = sess.run(cost, feed_dict={x: batch_xs,
                                              y: batch_ys})
    weight = sess.run(W)
    bias = sess.run(b)
```
In the above code, we start training, and the cost is printed after every 50 epochs. Because the number of data points is large and processing them all in one go may cause crashes, we train in batches.
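
The idea of training in batches can be sketched independently of TensorFlow. The generator below is a hypothetical helper, not the `next_batch` method used above; unlike the loop above, which runs a whole number of full batches per epoch, this sketch also yields the final, smaller batch.

```python
from typing import Iterator, List, Sequence


def iterate_batches(samples: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most batch_size samples."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        yield list(samples[start:start + batch_size])


# Ten made-up sample indices split into batches of four
samples = list(range(10))
batches = list(iterate_batches(samples, batch_size=4))
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {batch}")
```

Each batch is small enough to process on its own, and together the batches cover every sample exactly once per pass.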
```python
print("W", weight, "\nb", bias)
eq = tf.math.sigmoid(tf.matmul(x, weight) + bias)
```
In the above code, we print the weight and bias of the learned model and then form the logistic regression equation.

LR_tensorflow.py
```python
from __future__ import print_function

# This example uses the TensorFlow 1.x API (placeholders and sessions)
import tensorflow as tf

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
# The download directory is an assumption; labels are one-hot encoded
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Parameters
learning_rate = 0.01
training_epochs = 100
batch_size = 100
display_step = 50

# tf Graph Input
x = tf.placeholder(tf.float32, [None, 784])  # MNIST image of shape 28*28 = 784
y = tf.placeholder(tf.float32, [None, 10])   # digits 0-9 => 10 classes

# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Construct model
pred = tf.nn.softmax(tf.matmul(x, W) + b)  # Softmax

# Minimize error using cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred), reduction_indices=1))
# Gradient descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples / batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_xs,
                                                          y: batch_ys})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch + 1) % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(avg_cost))
    # Cost on the last batch, then the learned parameters
    training_cost = sess.run(cost, feed_dict={x: batch_xs,
                                              y: batch_ys})
    weight = sess.run(W)
    bias = sess.run(b)

print("W", weight, "\nb", bias)
eq = tf.math.sigmoid(tf.matmul(x, weight) + bias)
```
The output that i got is

## Conclusion

In this chapter, we studied simple logistic regression.

In the next chapter, we will learn about Multiple Linear Regression.