Two-Class Logistic Regression

Overview

 
Use the Two-Class Logistic Regression module to create a logistic regression model that can be used to predict two (and only two) outcomes.
 
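Logistic regression is a well-known statistical technique that is used for modeling many kinds of problems. It is a supervised learning method that estimates the probability of an outcome, so the model must be trained on a dataset that already contains the outcomes.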
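
As a rough illustration of how a logistic regression model turns input features into one of two outcomes, the sketch below applies the logistic (sigmoid) function to a weighted sum of features and compares the result with a threshold. The weights, bias, feature values, and the 0.5 threshold are hypothetical and are not taken from the module.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_probability(weights, bias, features):
    """Estimated probability that the example belongs to the positive class."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

def predict_class(weights, bias, features, threshold=0.5):
    """Return 1 or 0 depending on which side of the threshold the probability falls."""
    return 1 if predict_probability(weights, bias, features) >= threshold else 0

# Hypothetical coefficients, for illustration only.
weights = [1.2, -0.7]
bias = -0.3

print(predict_probability(weights, bias, [2.0, 1.0]))
print(predict_class(weights, bias, [2.0, 1.0]))
```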
 

How to Configure a Two-Class Logistic Regression

 
Step 1
Add the Two-Class Logistic Regression module to the experiment.
 
Step 2
Specify how you want the model to be trained, by setting the Create trainer mode option.
 
Single Parameter
If you know how you want to configure the model, you can provide a specific set of values as arguments.
 
Parameter Range
If you are not sure of the best parameters, specify multiple values and use the Tune Model Hyperparameters module to find the optimal configuration. The trainer iterates over multiple combinations of the settings you provided and determines the combination of values that produces the best model.
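
The idea behind a parameter range can be sketched as a simple grid sweep: try every combination of the candidate values and keep the one with the best score. This is only a minimal sketch. The `sweep` helper, the parameter names, and the `toy_evaluate` scoring function are hypothetical stand-ins, and the Tune Model Hyperparameters module may search differently.

```python
import itertools

def sweep(param_ranges, evaluate):
    """Try every combination of candidate values and return the best one."""
    names = list(param_ranges)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_ranges[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_evaluate(params):
    # Stand-in for "train a model and measure it on validation data".
    # It is built to peak at l1_weight = 0.1 and l2_weight = 1.0.
    return -abs(params["l1_weight"] - 0.1) - abs(params["l2_weight"] - 1.0)

param_ranges = {
    "l1_weight": [0.0, 0.1, 1.0],
    "l2_weight": [0.01, 1.0, 10.0],
}

best_params, best_score = sweep(param_ranges, toy_evaluate)
print(best_params)
```

In a real experiment, the scoring function would train the model with the given settings and return a metric such as accuracy on held-out data.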
 
Step 3
For Optimization tolerance, specify a threshold value to use when optimizing the model. If the improvement between iterations falls below the specified threshold, the algorithm is considered to have converged on a solution, and training stops.
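
The stopping rule described in this step can be sketched as follows. The example trains a small logistic regression with plain gradient descent (not L-BFGS, which the module uses) and stops as soon as the improvement in the loss falls below the tolerance. The data, learning rate, and function names are assumptions made for illustration.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def mean_log_loss(weights, rows, labels):
    """Average logistic loss, computed in a numerically stable way."""
    total = 0.0
    for x, y in zip(rows, labels):
        z = sum(w * xi for w, xi in zip(weights, x))
        margin = z if y == 1 else -z
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(rows)

def train(rows, labels, tolerance, learning_rate=0.5, max_iter=10000):
    """Gradient descent that stops when the loss improves by less than `tolerance`."""
    weights = [0.0] * len(rows[0])
    previous_loss = mean_log_loss(weights, rows, labels)
    loss = previous_loss
    for iteration in range(1, max_iter + 1):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                gradient[j] += error * xi / len(rows)
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
        loss = mean_log_loss(weights, rows, labels)
        if previous_loss - loss < tolerance:
            return weights, loss, iteration   # considered converged
        previous_loss = loss
    return weights, loss, max_iter

# Hypothetical data: the first column is a constant 1 that acts as the bias term.
rows = [[1, 0.5], [1, 1.0], [1, 1.5], [1, 2.0], [1, 2.5], [1, 3.0]]
labels = [0, 0, 1, 0, 1, 1]

loose = train(rows, labels, tolerance=1e-3)
tight = train(rows, labels, tolerance=1e-8)
print("iterations with loose tolerance:", loose[2])
print("iterations with tight tolerance:", tight[2])
```

A larger tolerance stops training earlier, and a smaller one lets the optimizer keep refining the weights.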
 
Step 4
For L1 regularization weight and L2 regularization weight, type a value to use for the regularization parameters L1 and L2. A non-zero value is recommended for both.
 
Regularization is a method for preventing overfitting by penalizing models with extreme coefficient values. Regularization works by adding the penalty that is associated with coefficient values to the error of the hypothesis. Thus, an accurate model with extreme coefficient values would be penalized more, but a less accurate model with more conservative values would be penalized less.
 
L1 and L2 regularization have different effects and uses.
 
L1 can be applied to sparse models, which is useful when working with high-dimensional data.
 
In contrast, L2 regularization is preferable for data that is not sparse.
 
This algorithm supports a linear combination of L1 and L2 regularization values: that is, if x = L1 and y = L2, then ax + by = c defines the linear span of the regularization terms.
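
To make the penalty concrete, the sketch below adds an L1 term (sum of absolute coefficient values) and an L2 term (sum of squared coefficient values) to a model's error. The exact form and scaling of the penalty used by the module is not stated here, so this formula, the weights, and the error values are assumptions.

```python
def regularized_error(error, weights, l1_weight, l2_weight):
    """Error plus a weighted combination of the L1 and L2 penalties."""
    l1_penalty = l1_weight * sum(abs(w) for w in weights)
    l2_penalty = l2_weight * sum(w * w for w in weights)
    return error + l1_penalty + l2_penalty

# Hypothetical models: one accurate with extreme coefficients,
# one less accurate with conservative coefficients.
extreme = regularized_error(0.10, [10.0, -8.0], l1_weight=0.01, l2_weight=0.01)
conservative = regularized_error(0.30, [0.5, -0.3], l1_weight=0.01, l2_weight=0.01)

print(extreme)
print(conservative)
```

With these made-up numbers, the model with extreme coefficients ends up with the larger regularized error even though its raw error is lower.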
 
Step 5
· For Memory size for L-BFGS, specify the amount of memory to use for L-BFGS optimization.
· L-BFGS stands for "limited memory Broyden-Fletcher-Goldfarb-Shanno". It is an optimization algorithm that is popular for parameter estimation. This parameter indicates the number of past positions and gradients to store for the computation of the next step.
· This optimization parameter limits the amount of memory that is used to compute the next step and direction. When you specify less memory, training is faster but less accurate.
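
The role of the memory size can be illustrated with a small, simplified L-BFGS routine. In this sketch, `memory_size` is the number of past position and gradient differences kept for computing the next step, and older pairs are discarded automatically. The function names, the backtracking line search, and the test function are assumptions for illustration; the module's own implementation is not shown in this article.

```python
from collections import deque

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lbfgs_minimize(f, grad, x0, memory_size=5, tolerance=1e-12, max_iter=200):
    x, g, fx = list(x0), grad(x0), f(x0)
    history = deque(maxlen=memory_size)  # keeps only the newest (s, y) pairs
    iterations = 0
    while iterations < max_iter:
        iterations += 1
        # Two-loop recursion: approximate inverse Hessian times gradient.
        q, alphas = list(g), []
        for s, y in reversed(history):  # newest to oldest
            alpha = dot(s, q) / dot(y, s)
            alphas.append(alpha)
            q = [qi - alpha * yi for qi, yi in zip(q, y)]
        if history:
            s, y = history[-1]
            scale = dot(s, y) / dot(y, y)
            q = [scale * qi for qi in q]
        for (s, y), alpha in zip(history, reversed(alphas)):  # oldest to newest
            beta = dot(y, q) / dot(y, s)
            q = [qi + (alpha - beta) * si for qi, si in zip(q, s)]
        direction = [-qi for qi in q]
        # Backtracking line search (Armijo condition).
        step, slope = 1.0, dot(g, direction)
        while True:
            x_new = [xi + step * di for xi, di in zip(x, direction)]
            f_new = f(x_new)
            if f_new <= fx + 1e-4 * step * slope or step < 1e-12:
                break
            step *= 0.5
        if fx - f_new < tolerance:  # improvement too small: stop
            if f_new < fx:
                x, fx = x_new, f_new
            break
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        if dot(s, y) > 1e-12:  # store the pair only if curvature is positive
            history.append((s, y))
        x, g, fx = x_new, g_new, f_new
    return x, fx, iterations

# Hypothetical test function with its minimum at (1, -2).
def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]

solution, value, iterations = lbfgs_minimize(f, grad, [0.0, 0.0], memory_size=3)
print(solution, iterations)
```

Changing `memory_size` changes how many past steps inform the search direction, which is the trade-off this parameter controls.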
 
Step 6
For Random number seed, type an integer value. Defining a seed value is important if you want the results to be reproducible over multiple runs of the same experiment.
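
The effect of a fixed seed can be seen in any step that involves randomness. The sketch below uses a seeded shuffle to split rows into two groups; the `split_rows` helper and the data are hypothetical and only illustrate that the same seed gives the same result on every run.

```python
import random

def split_rows(rows, train_fraction, seed):
    """Shuffle with a seeded generator, then split into two lists."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(10))

first_run = split_rows(rows, 0.7, seed=42)
second_run = split_rows(rows, 0.7, seed=42)

# Same seed, same split.
print(first_run == second_run)
```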
 
Step 7
Select the Allow unknown categorical levels option to create an additional "unknown" level in each categorical column. If you do so, any values (levels) in the test dataset that are not available in the training dataset are mapped to this "unknown" level.
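
The mapping described in this step can be sketched in a few lines. The helper names and the category values below are hypothetical; the sketch only shows the idea of reserving an extra level for values that were never seen during training.

```python
UNKNOWN = "unknown"

def fit_levels(training_values):
    """Collect the levels seen in training and append the extra 'unknown' level."""
    return sorted(set(training_values)) + [UNKNOWN]

def map_to_levels(values, levels):
    """Replace any value that is not a known level with 'unknown'."""
    known = set(levels)
    return [v if v in known else UNKNOWN for v in values]

levels = fit_levels(["A", "B", "A", "C"])
mapped = map_to_levels(["A", "D", "C"], levels)

print(levels)
print(mapped)
```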
 
Step 8
Train the model.
 
If you set Create trainer mode to Single Parameter, connect a tagged dataset and the Train Model module.
 
If you set Create trainer mode to Parameter Range, connect a tagged dataset and train the model by using Tune Model Hyperparameters.
 
Step 9
Run the experiment.
 
Experiment with example
 

1. Dataset - Walmart-features.csv 

 
 
2. Model
 
 
3. Train Model
 
 
4. Score Model
 
 
5. Evaluate Model
 
 

Conclusion

 
The Walmart-features dataset is an ideal dataset, and the Two-Class Logistic Regression algorithm classifies the classes with an accuracy of 93%.
 
Accuracy = (TP+TN)/(TP+TN+FP+FN) = 0.94
Type 1 Error = (FP)/(FP+TN) = 0
Type 2 Error = (FN)/(FN+TP) = 0
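
The three formulas above can be computed directly from confusion-matrix counts, as in the sketch below. The counts used here are hypothetical and are not the results of the Walmart-features experiment.

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def type_1_error(fp, tn):
    """False positive rate: share of actual negatives predicted as positive."""
    return fp / (fp + tn)

def type_2_error(fn, tp):
    """False negative rate: share of actual positives predicted as negative."""
    return fn / (fn + tp)

# Hypothetical counts, for illustration only.
tp, tn, fp, fn = 40, 50, 5, 5

print(accuracy(tp, tn, fp, fn))
print(type_1_error(fp, tn))
print(type_2_error(fn, tp))
```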