What Is the KNN Algorithm?
The k-Nearest Neighbors (KNN) algorithm is a supervised learning method used for both classification and regression tasks. It works on a simple principle:
"A data point is classified based on how its neighbors are classified."
In other words, KNN predicts the label of a new data point by looking at the majority class of its nearest neighbors.
Example: If most of the neighbors are "cats," the new data point is likely to be a "cat."
How Does KNN Work?
The KNN algorithm follows these main steps:
Choose the number of neighbors (k).
Calculate the distance between the new data point and all training data points.
Select the k-nearest neighbors based on the smallest distances.
Classify (for classification) → assign the majority label among the neighbors.
Predict (for regression) → take the average of the neighbors' values.
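To make these steps concrete, here is a minimal from-scratch sketch in NumPy, assuming Euclidean distance and a plain majority vote (the function name knn_predict and the toy data are purely illustrative):
import numpy as np
from collections import Counter
def knn_predict(X_train, y_train, x_new, k=3):
    # Distance from the new point to every training point (Euclidean)
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote among the k neighbor labels
    return Counter(y_train[nearest]).most_common(1)[0][0]
# Toy data: two 2-D classes
X_train = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5])))  # prints 0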
Distance Metrics in KNN
The performance of KNN depends on how we measure the "closeness" of data points. Common distance metrics are:
Euclidean Distance: Straight-line distance between two points.
Manhattan Distance: Sum of absolute differences (like grid movement).
Minkowski Distance: Generalization of Euclidean and Manhattan.
Hamming Distance: Used for categorical data.
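As a quick sketch of what these look like in code (using SciPy's distance module; the sample vectors are made up):
import numpy as np
from scipy.spatial import distance
a, b = np.array([1, 2, 3]), np.array([4, 6, 3])
print(distance.euclidean(a, b))       # straight-line distance: 5.0
print(distance.cityblock(a, b))       # Manhattan distance: |3| + |4| + |0| = 7
print(distance.minkowski(a, b, p=3))  # p=1 gives Manhattan, p=2 gives Euclidean
print(distance.hamming([1, 0, 1], [1, 1, 1]))  # fraction of mismatches: 1/3
In scikit-learn, KNeighborsClassifier exposes this choice through its metric and p parameters; the default is Minkowski with p=2, which is exactly Euclidean distance.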
Example in Python (Classification)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create KNN model
knn = KNeighborsClassifier(n_neighbors=5)
# Train model
knn.fit(X_train, y_train)
# Predictions
y_pred = knn.predict(X_test)
# Accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))
The output shows the model's accuracy on the Iris test set.
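For regression, the same API applies with scikit-learn's KNeighborsRegressor, which averages the neighbors' target values instead of voting. A minimal sketch on made-up 1-D data:
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.0, 2.1, 2.9, 4.2, 5.1])
reg = KNeighborsRegressor(n_neighbors=2)
reg.fit(X, y)
# Prediction = average of the two nearest targets: (2.9 + 4.2) / 2
print(reg.predict([[3.4]]))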
Real-World Applications of KNN
Recommendation Systems → suggesting similar items.
Medical Diagnosis → classifying patients based on symptoms.
Image Recognition → face recognition and object detection.
Spam Detection → classifying emails as spam or not.
Advantages of KNN
Simple and easy to understand.
Works well with small datasets.
No assumptions about data distribution.
Can be used for both classification and regression.
Limitations of KNN
Slow for large datasets (computes distance for every new query).
Sensitive to noisy data and irrelevant features.
Choosing the right value of k is tricky.
Performance depends heavily on feature scaling (see the sketch below).
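The scaling issue in particular has a standard remedy: standardize the features before computing distances. One common pattern (reusing the Iris split from the example above) wraps the scaler and the classifier in a scikit-learn Pipeline:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
# Standardize features so no single feature dominates the distance
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print("Accuracy with scaling:", model.score(X_test, y_test))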
Choosing the Right Value of k
If k is too small → the model becomes sensitive to noise (overfitting).
If k is too large → the model may oversimplify (underfitting).
A common practice is to use cross-validation to find the optimal k.
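A simple sketch of that search, again on the Iris split from above, evaluates several candidate values of k with 5-fold cross-validation:
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
# Try a few odd values of k and compare mean cross-validated accuracy
for k in [1, 3, 5, 7, 9, 11]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X_train, y_train, cv=5)
    print(f"k={k}: mean CV accuracy = {scores.mean():.3f}")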
Conclusion
The k-Nearest Neighbors (KNN) algorithm is a powerful, simple, and intuitive method for both classification and regression tasks. Although it struggles with large datasets and high-dimensional data, it remains a great starting point for beginners in machine learning with Python.
If you're new to ML, KNN is one of the best algorithms to implement and experiment with!