Introduction
After setting up Python and VS Code correctly, the best way to start AI/ML is to build a small working project. In this article, I will build a beginner-friendly Machine Learning project to predict salary from years of experience.
Problem Statement
I want to predict salary based on experience using a simple ML algorithm.
What We Will Build
We will build a program that:
Loads dataset from CSV
Splits data into training/testing
Trains a Linear Regression model
Predicts salary for user input experience
Prerequisites
pip install pandas scikit-learn
Step-by-Step Setup
Step 1: Project Structure
Use this structure:
02_linear_regression/salary_prediction/
├── data/
│ └── data.csv
└── src/
└── salary_predict.py
Step 2: Create Dataset File
Create data/data.csv
Experience,Salary
1,35000
2,40000
3,50000
4,60000
5,65000
6,70000
7,80000
8,85000
9,90000
10,100000
Code
Create src/salary_predict.py
import os
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Get correct CSV path (works from anywhere)
base_dir = os.path.dirname(__file__)
csv_path = os.path.join(base_dir, "..", "data", "data.csv")
# Load dataset
df = pd.read_csv(csv_path)
# Input (X) and Output (y)
X = df[["Experience"]]
y = df["Salary"]
# Split data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=420
)
# Train model
model = LinearRegression()
model.fit(X_train, y_train)
# User input
experience = float(input("Enter years of experience: "))
# Predict
input_data = pd.DataFrame({"Experience": [experience]})
predicted_salary = model.predict(input_data)
print(f"Predicted salary for {experience} years experience = {predicted_salary[0]:.2f}")
Output
Run:
python src/salary_predict.py
Example:
Enter years of experience: 11
Predicted salary for 11.0 years experience = 106978.30
GitHub Repo Link
https://github.com/pramodsingh-ai/ai-ml-python-practicals
Summary
In this article, we built a complete beginner ML project using Linear Regression. This helped us understand:
Dataset loading
Train/Test split
Model training
Prediction