Machine Learning  

The ABCs of Machine Learning

AI… it’s one of those things that sounds super techy and, honestly, a bit confusing. Welcome to AI for Dummies, where I’ll walk you through the world of Artificial Intelligence without all the jargon and over-complicated stuff.

This is Part 2: The ABCs of Machine Learning

Back in Part 1: Layers of Artificial Intelligence, we talked about AI as a whole and the different levels that make it up. Now it’s time to dig into the top-most layer of them all: Machine Learning (ML).

Think of it like this: if AI is a car, then Machine Learning is the engine under the hood. It’s the part that gives power and makes a lot of today’s cool AI applications actually work.

1. What Is Machine Learning?

Machine learning is basically a branch of AI that helps computers get better at tasks by learning from data, kind of like how we humans learn from experience. Instead of us writing down every single rule for the computer to follow, we just feed it a bunch of examples, and the system figures out the rules on its own.

For example, let’s say we want a computer to recognize cats in pictures. Instead of programming it with step-by-step instructions like “look for whiskers, pointy ears, or tails,” we just give it thousands of cat photos. Over time, the algorithm learns the patterns that make a cat a cat, and then it can spot them on its own in new images.
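To make the “examples instead of rules” idea concrete, here’s a minimal sketch using scikit-learn. The fruit measurements are made-up toy numbers; the point is just that we hand over examples plus answers, and the model works out the rules itself.

```python
# Learning from examples instead of hand-written rules -- toy sketch.
from sklearn.tree import DecisionTreeClassifier

# Each example: [weight_in_grams, surface_smoothness_0_to_1] (made-up values).
examples = [[150, 0.9], [170, 0.8], [120, 0.2], [110, 0.3]]
labels   = ["apple", "apple", "orange", "orange"]   # the "right answers"

model = DecisionTreeClassifier()
model.fit(examples, labels)            # the model figures out the rules on its own

print(model.predict([[160, 0.85]]))    # -> ['apple'] for a fruit it has never seen
```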

[Image: Machine Learning process]

2. Training Data

Garbage In, Garbage Out

Say you want to bake a cake with spoiled eggs or expired milk: it doesn’t matter how good your recipe is, the cake’s going to taste awful. Machine Learning works the same way. If the data you feed it is messy, incomplete, or just plain wrong, the results will also be off.

[Image: Garbage in, garbage out]

That’s why people often say, “garbage in, garbage out.” No matter how advanced your algorithm is, if the input data is bad, the predictions will be bad too.
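As a tiny illustration of cleaning up the “garbage” before training, here’s a sketch with pandas on a made-up table that has a missing value and an impossible age.

```python
import pandas as pd

# Hypothetical raw data with typical "garbage": a missing value and an impossible age.
raw = pd.DataFrame({
    "age":    [34, None, 27, -5],
    "income": [52000, 48000, None, 61000],
})

clean = raw.dropna()                          # drop rows with missing values
clean = clean[clean["age"].between(0, 120)]   # drop obviously invalid ages
print(clean)                                  # only the sensible rows survive
```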

Labeled vs. Unlabeled Data

When training a Machine Learning model, you’ll run into different types of data. But before we jump into the details, it’s important to know that data usually comes in two main forms: labeled and unlabeled.

[Image: Labeled vs. unlabeled data]
  • Labeled Data: This is data that already comes with the “right answer” attached. It’s like flashcards: on one side you see the question (say, a picture of a fruit), and on the other side the answer is written (“apple,” “banana,” or “orange”). That way, the model knows exactly what each example is.

  • Unlabeled Data: Here, there are no answers provided. The model has to look for patterns on its own. For example, if I toss my entire wedding album into the training data without adding any tags, the algorithm won’t know what’s what. But it can still pick up on patterns like colors, shapes, or textures and group similar photos together, even without knowing who’s in them or what the event was. That way I could have my engagement album separated from my wedding album (there’s a tiny code sketch of this idea just below). Wish I had done that.

There’s also something in between called semi-supervised learning. Here, only part of the data is labeled. The algorithm uses the labeled examples to learn the basics, then applies that knowledge to make sense of the unlabeled data.
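To make the unlabeled case concrete, here’s a minimal sketch of grouping photos without any labels, using scikit-learn’s KMeans. The two-number feature vectors are completely made up; a real pipeline would extract features from the actual images first.

```python
# Grouping "photos" without labels (unsupervised clustering) -- toy sketch.
from sklearn.cluster import KMeans

# Made-up feature vectors; in reality these would be extracted from the images.
photo_features = [
    [0.90, 0.10], [0.85, 0.15],   # e.g. bright outdoor shots
    [0.20, 0.80], [0.25, 0.75],   # e.g. dim indoor shots
]

groups = KMeans(n_clusters=2, n_init=10).fit_predict(photo_features)
print(groups)  # e.g. [0 0 1 1] -- two "albums" found without any labels
```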

Structured vs. Unstructured Data

Not all data comes in the same shape and form; some of it’s nice and organized, while some of it’s a total mess - like the Crypto market, LOL.

[Image: Structured and unstructured data]
  • Structured Data: This is the neat stuff, usually organized into rows and columns like a spreadsheet. It’s the kind of data that classic ML algorithms love to work with.

  1. Tabular Data: A banking database with customer details, account number, balance, and transaction history.

  2. Time-Series Data: Fitness tracker readings, like your daily step count or heart rate over time.

  • Unstructured Data: This is the messy kind that doesn’t fit neatly into tables. It usually needs more advanced ML techniques to understand (there’s a quick code contrast after this list).

  1. Text Data: Tweets, product reviews, blog posts.

  2. Image Data: Photos, X-rays, or video frames.

  3. Audio Data: Music, podcasts, or voice recordings.
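To make the contrast concrete, here’s a tiny sketch comparing a structured table with a piece of unstructured text. The values are made up, and pandas is just one common way to hold tabular data.

```python
import pandas as pd

# Structured: rows and columns with a fixed schema, like a banking table (made-up values).
transactions = pd.DataFrame({
    "account":  ["A-101", "A-102"],
    "balance":  [2500.00, 730.50],
    "last_txn": ["2024-01-03", "2024-01-05"],
})

# Unstructured: free-form text with no schema at all.
review = "Loved the battery life, but the screen scratches way too easily."

print(transactions.dtypes)   # every column has a clear type
print(len(review.split()))   # text needs extra processing before a model can use it
```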

3. How Machines Learn

Once your data is prepped, the next step is running it through algorithms. Most ML approaches fall into three main buckets, plus a hybrid of the first two (there’s a small code sketch after the list):

[Image: How machines learn]
  • Supervised Learning: You train the model with labeled data, basically, examples where both the input and the correct answer are already known. The goal is for the model to learn from those examples so it can take new, unseen data and predict the right output.

  • Unsupervised Learning: No labels are given. No answers attached. Its job is to dig through the data and uncover hidden patterns, structures, or relationships.

  • Reinforcement Learning: The model learns through trial and error, getting “rewards” for good moves and “penalties” for bad ones. Like training a pet, it tries things, learns from the “good job” or “nope,” and gradually improves its decision-making.

  • Semi-Supervised Learning: This is a mix of the supervised and unsupervised worlds. Only part of the training data is labeled, and the algorithm uses those labeled examples to help make sense of the larger pool of unlabeled data.
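Here’s a rough sketch contrasting the first two buckets with scikit-learn. The numbers are toy values, and reinforcement learning is left out because it needs an environment to interact with, which doesn’t fit in a few lines.

```python
# Supervised vs. unsupervised learning on the same toy inputs.
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = [[1.0], [1.2], [3.8], [4.1]]   # inputs (made-up values)
y = [0, 0, 1, 1]                   # labels, only used in the supervised case

supervised = LogisticRegression().fit(X, y)            # learns input -> answer
unsupervised = KMeans(n_clusters=2, n_init=10).fit(X)  # finds groups on its own

print(supervised.predict([[1.1]]))   # -> [0], a real prediction
print(unsupervised.labels_)          # e.g. [0 0 1 1], cluster ids with no "answers" attached
```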

4. Inferencing: Putting the Model to Work

Once your model is trained, it’s ready to use what it’s learned. This step is called inferencing, basically just a fancy word for making predictions.

[Image: Inferencing]

There are two common ways to do it:

Batch Inferencing: Here, the model processes a big chunk of data all at once. For example, analyzing thousands of medical scans overnight to flag potential issues. Batch is great when accuracy matters more than speed, since you’re not in a rush for instant answers.

Real-Time Inferencing: In this case, the model makes decisions on the fly as new data comes in. Think of a fraud detection system spotting a suspicious credit card transaction instantly, or a self-driving car deciding when to hit the brakes. Here, speed is critical.

Both methods are valuable; it just depends on whether you care more about depth (batch) or speed (real-time).
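As a rough sketch, here’s what the two styles can look like in code, reusing a tiny scikit-learn model trained on made-up numbers.

```python
# Batch vs. real-time inferencing with a toy model.
from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit([[0.1], [0.2], [0.8], [0.9]], [0, 0, 1, 1])

# Batch inferencing: score a whole pile of records at once (think overnight job).
overnight_batch = [[0.15], [0.40], [0.85]]
print(model.predict(overnight_batch))        # -> predictions for every record

# Real-time inferencing: score one record the moment it arrives.
def score_transaction(features):
    return model.predict([features])[0]

print(score_transaction([0.92]))             # e.g. flag a single suspicious transaction
```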

Time to put my cloud knowledge to the test! Here are the ML-related services offered by AWS, Azure, and Google Cloud as of this article’s publishing.

1. Data Preparation & Management

| Service | AWS | Azure | Google Cloud |
| --- | --- | --- | --- |
| Data Storage | Amazon S3 | Azure Blob Storage | Google Cloud Storage |
| Data Labeling | Amazon SageMaker Ground Truth | Azure Machine Learning Data Labeling | Google Cloud Data Labeling Service |
| Data Processing | AWS Glue | Azure Data Factory | Google Cloud Dataflow |

2. Model Training

| Service | AWS | Azure | Google Cloud |
| --- | --- | --- | --- |
| Managed ML Platform | Amazon SageMaker | Azure Machine Learning | Vertex AI |
| Training Infrastructure | EC2 Instances, SageMaker Training | Azure ML Compute | Google Cloud AI Platform Training |
| Pre-built Models | SageMaker JumpStart | Azure AI Gallery | Vertex AI Workbench |

3. Model Evaluation & Tuning

| Service | AWS | Azure | Google Cloud |
| --- | --- | --- | --- |
| Hyperparameter Tuning | SageMaker Automatic Model Tuning | Azure HyperDrive | Vertex AI Hyperparameter Tuning |
| Model Explainability | SageMaker Clarify | Azure ML Interpretability | Vertex AI Explainable AI |

4. Model Deployment & Inferencing

| Service | AWS | Azure | Google Cloud |
| --- | --- | --- | --- |
| Real-Time Inference | SageMaker Endpoints | Azure ML Endpoints | Vertex AI Endpoints |
| Batch Inference | SageMaker Batch Transform | Azure ML Batch Inference | Vertex AI Batch Prediction |
| Edge Deployment | SageMaker Neo | Azure IoT Edge | Vertex AI Edge |
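To give a feel for how a deployed model actually gets called, here’s a minimal sketch of real-time inference against an Amazon SageMaker endpoint using the AWS SDK for Python (boto3). The endpoint name and the input fields are hypothetical, and it assumes a JSON-accepting model has already been deployed.

```python
import json
import boto3

# Call an already-deployed SageMaker endpoint (endpoint name and payload are hypothetical).
runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="fraud-detector-endpoint",          # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps({"amount": 129.99, "country": "US"}),
)

# The response body is a stream containing the model's prediction.
prediction = json.loads(response["Body"].read())
print(prediction)
```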

5. Model Monitoring & Management

| Service | AWS | Azure | Google Cloud |
| --- | --- | --- | --- |
| Model Monitoring | SageMaker Model Monitor | Azure ML Model Monitoring | Vertex AI Model Monitoring |
| Drift Detection | SageMaker Clarify | Azure ML Data Drift | Vertex AI Data Drift |

Summary

That’s Machine Learning in a nutshell: teaching computers with examples instead of nagging them with rules.

In the next article, we’ll peel back another layer and talk about Deep Learning, the part of AI that gives us things like facial recognition, voice assistants, and those “how did Netflix know I’d watch this?” moments.