AI… it’s one of those things that sounds super techy and, honestly, a bit confusing. Welcome to AI for Dummies, where I’ll walk you through the world of Artificial Intelligence without all the jargon and over-complicated stuff.
This is Part 2: The ABCs of Machine Learning
Back in Part 1: Layers of Artificial Intelligence, we talked about AI as a whole and the different levels that make it up. Now it’s time to zoom in on one of those layers, the one doing most of the heavy lifting today: Machine Learning (ML).
Think of it like this: if AI is a car, then Machine Learning is the engine under the hood. It’s the part that provides the power and makes a lot of today’s cool AI applications actually work.
1. What Is Machine Learning?
Machine learning is basically a branch of AI that helps computers get better at tasks by learning from data, kind of like how we humans learn from experience. Instead of us writing down every single rule for the computer to follow, we just feed it a bunch of examples, and the system figures out the rules on its own.
For example, let’s say we want a computer to recognize cats in pictures. Instead of programming it with step-by-step instructions like “look for whiskers, pointy ears, or tails,” we just give it thousands of cat photos. Over time, the algorithm learns the patterns that make a cat a cat, and then it can spot them on its own in new images.
![Rikam Palkar Machine Learning Process]()
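If you like seeing ideas in code, here’s a minimal sketch of “learning from examples instead of rules.” It uses scikit-learn and a couple of made-up numeric features standing in for a real image, so treat it as an illustration rather than an actual cat detector:

```python
# Toy sketch: instead of hand-coding "if it has whiskers and pointy ears, it's a cat",
# we hand the model labeled examples and let it work out the rules itself.
# The two numbers per example (ear pointiness, whisker count) are invented features;
# a real system would learn from actual pixels.
from sklearn.tree import DecisionTreeClassifier

X_train = [[0.9, 12], [0.8, 10], [0.2, 0], [0.1, 2]]   # the examples
y_train = ["cat", "cat", "not cat", "not cat"]          # the "right answers"

model = DecisionTreeClassifier()
model.fit(X_train, y_train)            # the "learning" step

print(model.predict([[0.85, 11]]))     # a new, unseen example -> ['cat']
```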
2. Training Data
Garbage In, Garbage Out
Say you wanna bake a cake but all you’ve got is spoiled eggs and expired milk: it doesn’t matter how good your recipe is, the cake’s going to taste awful. Machine Learning works the same way. If the data you feed it is messy, incomplete, or just plain wrong, the results will be off too.
![Garbage in Garbage out ML]()
That’s why people often say, “garbage in, garbage out.” No matter how advanced your algorithm is, if the input data is bad, the predictions will be bad too.
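In practice, a lot of the work is cleanup before training ever starts. Here’s a tiny, made-up example using pandas to throw out rows that are missing or obviously wrong:

```python
# Hypothetical cleanup step: remove the "garbage" before it goes in.
import pandas as pd

raw = pd.DataFrame({
    "age":    [25, None, 190, 31],          # a missing age and an impossible one
    "income": [40000, 52000, 61000, None],  # a missing income
})

clean = raw.dropna()                          # drop rows with missing values
clean = clean[clean["age"].between(0, 120)]   # drop rows with impossible ages
print(clean)                                  # only the trustworthy rows remain
```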
Labeled vs. Unlabeled Data
When training a Machine Learning model, you’ll run into different types of data. But before we jump into the details, it’s important to know that data usually comes in two main forms: labeled and unlabeled.
![Rikam Palkar Label and Unlabeled data]()
Labeled Data: This is data that already comes with the “right answer” attached. Like flashcards, on one side you see the question (like a picture of a fruit), and on the other side, the answer is written (“apple,” “banana,” or “orange”). That way, the model knows exactly what each example is.
Unlabeled Data: Here, there are no answers provided. The model has to look for patterns on its own. For example, if I toss my entire wedding album into the training data without adding any tags, the algorithm won’t know what’s what. But it can still pick up on patterns like colors, shapes, or textures and group similar photos together, even without knowing who’s in them or what the event was (there’s a small code sketch of this just below). That way, my engagement album could end up neatly separated from my wedding album. Wish I had done that.
There’s also something in between called semi-supervised learning. Here, only part of the data is labeled. The algorithm uses the labeled examples to learn the basics, then applies that knowledge to make sense of the unlabeled data.
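Here’s the labeled vs. unlabeled difference as a rough sketch in code, using scikit-learn (the “photos” are just two invented numbers each, say average color and brightness):

```python
# Labeled data: every example comes with its answer, like a flashcard.
labeled_X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
labeled_y = ["apple", "apple", "banana"]

# Unlabeled data: just the examples, no answers attached.
unlabeled_X = [[0.2, 0.9], [0.25, 0.85], [0.9, 0.1], [0.95, 0.15]]

# Without labels, the best we can do is group similar "photos" together.
from sklearn.cluster import KMeans
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(unlabeled_X)
print(groups)   # e.g. [0 0 1 1] -- two albums, but the model can't name either one
```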
Structured vs. Unstructured Data
Not all data comes in the same shape and form; some of it’s nice and organized, while some of it’s a total mess - like the Crypto market, LOL. Broadly speaking, structured data fits neatly into rows and columns, while unstructured data is free-form. Here are the kinds you’ll run into most often (with a quick code sketch after the list):
![Rikam Palkar Structured Learning]()
Tabular Data (structured): A banking database with customer details, account number, balance, and transaction history.
Time-Series Data (structured): Fitness tracker readings, like your daily step count or heart rate over time.
Text Data (unstructured): Tweets, product reviews, blog posts.
Image Data (unstructured): Photos, X-rays, or video frames.
Audio Data (unstructured): Music, podcasts, or voice recordings.
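To make the split concrete, here’s a quick illustrative sketch (pandas assumed, data invented): structured data drops neatly into rows and columns, while unstructured data is just free-form content.

```python
import pandas as pd

# Structured: rows and columns, like the banking example above.
accounts = pd.DataFrame({
    "account_no": ["A-101", "A-102"],
    "balance":    [2500.00, 17300.50],
})
print(accounts)

# Unstructured: no fixed columns -- just raw text (or pixels, or audio samples).
review = "Loved the phone, battery life is great, but the camera could be better."
print(len(review.split()), "words of free-form text")
```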
3. How Machines Learn
Once your data is prepped, the next step is running it through algorithms. Most ML approaches fall into three main buckets, plus a hybrid of the first two:
![Rikam Palkar ML Supervised Learning]()
Supervised Learning: You train the model with labeled data, basically, examples where both the input and the correct answer are already known. The goal is for the model to generalize, so it can take new, unseen data and predict the right output.
Unsupervised Learning: No labels are given. No answers attached. Its job is to dig through the data and uncover hidden patterns, structures, or relationships.
Reinforcement Learning: The model learns through trial and error, getting “rewards” for good moves and “penalties” for bad ones. Like training a pet, it tries things, learns from the “good job” or “nope,” and gradually improves its decision-making (there’s a tiny trial-and-error sketch right after this list).
Semi-Supervised Learning: This is a mix of the first two. Only part of the training data is labeled, and the algorithm uses those labeled examples to help make sense of the larger pool of unlabeled data.
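Here’s the reinforcement learning idea boiled down to a toy “which slot machine pays out more?” sketch in plain Python. Everything in it (the two moves, the hidden reward odds) is made up purely to show the try, get feedback, adjust loop:

```python
import random

reward_prob = {"A": 0.3, "B": 0.7}   # hidden from the agent: B is the better move
value = {"A": 0.0, "B": 0.0}         # the agent's running estimate of each move
counts = {"A": 0, "B": 0}

for step in range(1000):
    # Mostly exploit the move that looks best so far, occasionally explore.
    if random.random() < 0.1:
        move = random.choice(["A", "B"])
    else:
        move = max(value, key=value.get)

    reward = 1 if random.random() < reward_prob[move] else 0   # "good job" or "nope"
    counts[move] += 1
    value[move] += (reward - value[move]) / counts[move]       # nudge the estimate

print(value)   # after enough trial and error, B's estimate should come out on top
```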
4. Inferencing: Putting the Model to Work
Once your model is trained, it’s ready to use what it’s learned. This step is called inferencing, basically just a fancy word for making predictions.
![Rikam Palkar Inferencing]()
There are two common ways to do it:
Batch Inferencing: Here, the model processes a big chunk of data all at once. For example, analyzing thousands of medical scans overnight to flag potential issues. Batch is great when throughput and cost matter more than speed, since you’re not in a rush for instant answers.
Real-Time Inferencing: In this case, the model makes decisions on the fly as new data comes in. Think of a fraud detection system spotting a suspicious credit card transaction instantly, or a self-driving car deciding when to hit the brakes. Here, speed is critical.
Both methods are valuable; it just depends on whether you care more about depth (batch) or speed (real-time).
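Here’s the difference in miniature, reusing the toy classifier idea from earlier (scikit-learn assumed, and the fraud data is entirely invented):

```python
from sklearn.linear_model import LogisticRegression

# Train a toy fraud model: transaction amount -> fraud flag (made-up data).
model = LogisticRegression()
model.fit([[200], [950], [120], [5000]], [0, 1, 0, 1])

# Batch inferencing: score a big pile of records in one go (e.g. overnight).
nightly_transactions = [[80], [4200], [300], [15]]
print(model.predict(nightly_transactions))       # predictions for the whole batch

# Real-time inferencing: one decision, right now, as a new transaction arrives.
def on_new_transaction(amount):
    return model.predict([[amount]])[0]          # instant fraud / not-fraud answer

print(on_new_transaction(4999))
```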
Time to put my cloud knowledge to the test! Here are the comparable services offered by AWS, Azure, and Google Cloud as of this article’s publishing.
1. Data Preparation & Management
| Service | AWS | Azure | Google Cloud |
| --- | --- | --- | --- |
| Data Storage | Amazon S3 | Azure Blob Storage | Google Cloud Storage |
| Data Labeling | Amazon SageMaker Ground Truth | Azure Machine Learning Data Labeling | Google Cloud Data Labeling Service |
| Data Processing | AWS Glue | Azure Data Factory | Google Cloud Dataflow |
2. Model Training
| Service | AWS | Azure | Google Cloud |
| --- | --- | --- | --- |
| Managed ML Platform | Amazon SageMaker | Azure Machine Learning | Vertex AI |
| Training Infrastructure | EC2 Instances, SageMaker Training | Azure ML Compute | Google Cloud AI Platform Training |
| Pre-built Models | SageMaker JumpStart | Azure AI Gallery | Vertex AI Workbench |
3. Model Evaluation & Tuning
| Service | AWS | Azure | Google Cloud |
| --- | --- | --- | --- |
| Hyperparameter Tuning | SageMaker Automatic Model Tuning | Azure HyperDrive | Vertex AI Hyperparameter Tuning |
| Model Explainability | SageMaker Clarify | Azure ML Interpretability | Vertex AI Explainable AI |
4. Model Deployment & Inferencing
| Service | AWS | Azure | Google Cloud |
| --- | --- | --- | --- |
| Real-Time Inference | SageMaker Endpoints | Azure ML Endpoints | Vertex AI Endpoints |
| Batch Inference | SageMaker Batch Transform | Azure ML Batch Inference | Vertex AI Batch Prediction |
| Edge Deployment | SageMaker Neo | Azure IoT Edge | Vertex AI Edge |
5. Model Monitoring & Management
| Service | AWS | Azure | Google Cloud |
| --- | --- | --- | --- |
| Model Monitoring | SageMaker Model Monitor | Azure ML Model Monitoring | Vertex AI Model Monitoring |
| Drift Detection | SageMaker Clarify | Azure ML Data Drift | Vertex AI Data Drift |
Summary
That’s Machine Learning in a nutshell: teaching computers with examples instead of nagging them with rules.
In the next article, we’ll peel back another layer and talk about Deep Learning, the part of AI that gives us things like facial recognition, voice assistants, and those “how did Netflix know I’d watch this?” moments.