Batch and Online Machine Learning

One of the criteria used to classify Machine Learning systems is whether or not the system can learn incrementally from a stream of incoming data.

Batch Learning

In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data. This generally takes a lot of time and computing resources, so it is typically done offline: first the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called offline learning.
If you want a batch learning system to know about new data (such as a new type of spam), you have to train a new version of the system from scratch on the full dataset (both the new data and the old data), then stop the old system and replace it with the new one.
Fortunately, the whole process of training, evaluating, and launching a Machine Learning system can be automated fairly easily, so even a batch learning system can adapt to change: simply update the data and train a new version of the system from scratch as often as needed.
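To make the idea concrete, here is a minimal sketch of the retrain-from-scratch cycle, using a hypothetical one-parameter model and toy data: every run fits a fresh model on the combined old and new data.

```python
def train_from_scratch(dataset):
    # Batch learning: fit the slope w of y ≈ w * x by least squares
    # over the FULL dataset, every time.
    sum_x2 = sum(x * x for x, _ in dataset)
    sum_xy = sum(x * y for x, y in dataset)
    return sum_xy / sum_x2

old_data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
new_data = [(4.0, 8.2)]  # e.g., freshly labeled examples

# Retrain on old + new data together, then replace the
# production model with this new version.
model = train_from_scratch(old_data + new_data)
```

The cost of this approach is visible even in the sketch: every update reprocesses the entire dataset, however little of it is new.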
This solution is simple and often works fine, but training on the full dataset can take many hours, so you would typically train a new system only every 24 hours or even just weekly. If your system needs to adapt to rapidly changing data, then you need a more reactive solution.
Also, training on the full set of data requires a lot of computing resources (CPU, memory, disk space, disk I/O, network I/O, etc.). If you have a lot of data and you automate your system to train from scratch every day, it will end up costing you a lot of money. If the amount of data is huge, it may even be impossible to use a batch learning algorithm.
Finally, if the system needs to be able to learn autonomously and it has limited resources (e.g., a smartphone application or a rover on Mars), then carrying around large amounts of training data and consuming a lot of resources to train for hours every day is a showstopper.
So, a better option in all these cases is to use algorithms that are capable of learning incrementally.

Online Learning

In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is cheap and fast, so the system can learn about new data on the fly, as it arrives.
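As a minimal illustration (a hypothetical one-parameter model, not any particular library), each incoming instance can trigger one cheap stochastic-gradient update of y ≈ w·x:

```python
def sgd_step(w, x, y, lr=0.05):
    # One cheap, fast learning step from a single instance
    # (gradient of the squared error for y ≈ w * x).
    error = w * x - y
    return w - lr * error * x

w = 0.0
# Stand-in for a data stream arriving one instance at a time.
stream = [(1.0, 2.0), (2.0, 4.0), (1.5, 3.0), (3.0, 6.0)] * 20
for x, y in stream:
    w = sgd_step(w, x, y)  # after the update, the instance can be discarded
```

Since each step looks at one instance and then lets it go, the system keeps learning without ever holding the whole dataset.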
Online learning is great for systems that receive data as a continuous flow (e.g., stock prices) and need to adapt to change rapidly or autonomously. It is also a good option if you have limited computing resources: once an online learning system has learned about new data instances, it does not need them anymore, so you can discard them (unless you want to be able to roll back to a previous state and "replay" the data). This can save a huge amount of space.
Online learning algorithms can also be used to train systems on huge datasets that cannot fit in one machine's main memory; this is called out-of-core learning. The algorithm loads part of the data, runs a training step on that data, and repeats the process until it has run on all of the data.
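A sketch of that loop, with a hypothetical chunk loader standing in for reading a dataset too large for memory:

```python
def load_chunks(chunk_size=10):
    # Hypothetical loader: yields one chunk at a time, so the full
    # dataset never has to sit in memory at once. Here the "dataset
    # on disk" is simulated in-process.
    full = [(float(x), 2.0 * x) for x in range(1, 101)]
    for i in range(0, len(full), chunk_size):
        yield full[i:i + chunk_size]

w = 0.0
for chunk in load_chunks():
    # Run a training step on this part of the data, then move on;
    # the chunk can be dropped before the next one is loaded.
    for x, y in chunk:
        w -= 0.0001 * (w * x - y) * x
```

Only one chunk is resident at any moment, so memory use is bounded by the chunk size rather than the dataset size.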
One important parameter of online learning systems is how fast they should adapt to changing data: this is called the learning rate. If you set a high learning rate, the system will rapidly adapt to new data, but it will also tend to quickly forget the old data (and you don't want a spam filter to flag only the latest kinds of spam it was shown). Conversely, if you set a low learning rate, the system will have more inertia; that is, it will learn more slowly, but it will also be less sensitive to noise in the new data or to sequences of non-representative data points.
A big challenge with online learning is that if bad data is fed to the system, the system's performance will gradually decline. If it is a live system, your clients will notice. For example, bad data may come from a malfunctioning sensor on a robot, or from someone spamming a search engine to try to rank high in search results. To reduce this risk, you need to monitor the system closely and promptly switch learning off (and possibly revert to a previously working state) if you detect a drop in performance. You may also want to monitor the input data and react to abnormal data (e.g., using an anomaly detection algorithm).
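One way to sketch such a guard (all names and thresholds here are hypothetical): keep a rolling window of recent prediction errors and switch learning off when their average spikes.

```python
from collections import deque

def guarded_update(w, x, y, errors, learning_on, lr=0.1, threshold=0.5):
    errors.append(abs(w * x - y))  # monitor recent performance
    if len(errors) == errors.maxlen and sum(errors) / len(errors) > threshold:
        learning_on = False        # drop in performance: freeze learning
    if learning_on:
        w -= lr * (w * x - y) * x  # normal online update of y ≈ w * x
    return w, learning_on

w, learning_on = 2.0, True         # model previously trained on y = 2x
errors = deque(maxlen=20)          # rolling window of recent errors
good = [(1.0, 2.0)] * 50           # clean stream
bad = [(1.0, 0.0)] * 30            # malfunctioning sensor: target stuck at 0
for x, y in good + bad:
    w, learning_on = guarded_update(w, x, y, errors, learning_on)
```

Without the guard, 30 bad instances would drag w nearly to zero; with it, learning is frozen a few steps into the bad stream, so the model keeps most of its accuracy until someone can investigate (and possibly roll back).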