House Price Prediction By Using Machine Learning

Introduction

 
Logistic Regression is a Supervised Learning method in Machine Learning. It is a statistical method for analyzing a dataset in which one or more independent variables determine an outcome. The basic difference between Linear Regression and Logistic Regression is that Linear Regression's outcome is continuous, whereas Logistic Regression's outcome is limited to a fixed set of discrete values. Here, the outcome is the dependent variable.
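
To make that difference concrete, here is a minimal sketch in plain Python. The weight w, the bias b, and the 0.5 threshold are made-up values for illustration and do not come from any trained model. The linear output can be any real number, while the logistic output is passed through the sigmoid function to get a probability and is then mapped to one of two classes.

```python
import math

def linear_output(x, w=2.0, b=-1.0):
    """Linear regression style output: any real number."""
    return w * x + b

def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_class(x, w=2.0, b=-1.0, threshold=0.5):
    """Logistic regression style output: a class label, 0 or 1."""
    probability = sigmoid(linear_output(x, w, b))
    return 1 if probability >= threshold else 0

# Compare the continuous output with the discrete class for a few inputs.
for x in (-3.0, 0.0, 0.5, 4.0):
    print(x, linear_output(x), logistic_class(x))
```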
 
I will not go into detail about Logistic Regression. Instead, I will explain how to predict the price of a house from some of its features by using Logistic Regression.
 
Features of a House
 
House prices will be predicted by using the following features of a house.
  • Year Built
  • Total basement area in sq. ft.
  • Lot Area
  • Floor Area
  • Overall condition
  • Lot Frontage
  • Garage details
  • Detail about fireplace
  • .......
We will have two types of data:
  • Training Data - This data contains the information about each house, including the Year Sold and the Sale Price.
  • Test Data - This data contains all the information about a house, without the sale price. Based on the given information, the Logistic Regression algorithm will predict the selling price of the house.
Implementation
 
It will be implemented in Python. The following imports are required.
    # logistic regression model imports
    from sklearn.metrics import accuracy_score
    from sklearn.linear_model import LogisticRegression
Read training data.
    # reading the training file
    with open("train_data.csv", "r") as tr:
        records = tr.readlines()
Make training set vectors.
    # Making training set X and y vectors
    X = []
    y = []
    for line in records[1:]:  # skip the header row
        fields = line.strip().split(",")
        X.append([int(value) for value in fields[:-1]])  # every column except the last is a feature
        y.append(int(fields[36]))  # column index 36 holds the sale price
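
The same parsing step can be sketched with Python's csv module, which handles the splitting of each row. This is only an illustration under stated assumptions: the helper name split_features_and_target and the three-column sample are hypothetical, and the sketch assumes that every value is an integer and that the sale price is the last column.

```python
import csv
import io

def split_features_and_target(csv_text):
    """Return (X, y): every column but the last as features, the last as target."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    X, y = [], []
    for row in rows[1:]:      # rows[0] is the header
        if not row:           # skip blank lines
            continue
        values = [int(v) for v in row]
        X.append(values[:-1])
        y.append(values[-1])
    return X, y

# Hypothetical two-house sample; the real file has many more columns.
sample = "YearBuilt,LotArea,SalePrice\n2003,8450,208500\n1976,9600,181500\n"
X_demo, y_demo = split_features_and_target(sample)
print(X_demo)
print(y_demo)
```

For a real file, the text could be read with open("train_data.csv").read() and passed to the same helper.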
Train the Logistic Regression model on the training set.
    # training our logistic regression model
    lr = LogisticRegression()
    lr.fit(X, y)
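
Because LogisticRegression is a classifier, it treats every distinct sale price in y as a separate class, so a prediction is always one of the prices seen during training. The following minimal sketch shows this on made-up toy data. The single feature (floor area in thousands of sq. ft.) and the two prices are assumptions for illustration only.

```python
from sklearn.linear_model import LogisticRegression

# Made-up toy data: one feature and two distinct prices.
X_toy = [[0.8], [0.85], [1.5], [1.6]]
y_toy = [100000, 100000, 200000, 200000]

model = LogisticRegression(max_iter=1000)
model.fit(X_toy, y_toy)

print(model.classes_)           # the distinct prices seen during training
print(model.predict([[1.2]]))   # always one of those prices
```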
Read testing data for which the prediction will be performed.
    # reading the testing set file
    with open("test_data.csv", "r") as te:
        records1 = te.readlines()
Create testing vector.
    # Making testing set X vector
    XX = []
    for line in records1[1:]:  # skip the header row
        XX.append([int(value) for value in line.strip().split(",")])
Now, predict by using Logistic Regression.
    yy = lr.predict(XX)
Write the prediction result to a new CSV file.
    # writing predicted house prices to a new file
    print("Writing to File")
    with open("predictionresult.csv", "w") as result:
        result.write("House No,Predicted Price\n")
        for i in range(len(yy)):
            result.write(str(i + 1) + "," + str(yy[i]) + "\n")
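
The writing step can also be sketched with csv.writer, which takes care of the separators and line endings. The helper name write_predictions is hypothetical, and the demo writes to an in-memory buffer with two made-up prices so that it can run without any data files.

```python
import csv
import io

def write_predictions(out_file, predictions):
    """Write one 'House No,Predicted Price' row per prediction."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["House No", "Predicted Price"])
    for house_no, price in enumerate(predictions, start=1):
        writer.writerow([house_no, price])

# Demo with an in-memory buffer; for a real file, pass the object returned by
# open("predictionresult.csv", "w", newline="") inside a with statement.
buffer = io.StringIO()
write_predictions(buffer, [144000, 157900])
print(buffer.getvalue())
```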
Check the accuracy.
    # Checking model accuracy by applying the model to the training set
    yyy = lr.predict(X)
    accuracy = accuracy_score(y, yyy) * 100
    print("model accuracy")
    print(accuracy)
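
With its default settings, accuracy_score returns the fraction of predictions that exactly match the actual values. The following plain-Python sketch computes the same percentage by hand. The helper name accuracy_percent and the four prices are made up for illustration.

```python
def accuracy_percent(actual, predicted):
    """Percentage of positions where the predicted value equals the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot score an empty list")
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual) * 100

actual_prices = [144000, 157900, 175000, 215000]      # made-up values
predicted_prices = [144000, 157900, 178000, 215000]   # one mismatch
print(accuracy_percent(actual_prices, predicted_prices))
```

Note that a prediction counts as correct only when it equals the actual price exactly, so a price that is off by a small amount is scored the same as one that is far off.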
The predicted house prices look like this:
 
House No Predicted Price
1 144000
2 157900
3 175000
4 215000
5 275000
6 178000
7 194500
8 178900
9 260000
10 167900
11 236500
12 83000
13 106000
14 148500
15 160200
16 320000
17 235128
18 230000 
 
The model accuracy, measured on the training set, is reported like this:
 
model accuracy 68.3561643836
 
Code Execution Details
 
I have attached a zip file containing the Python code along with the training and test CSV data. Python 3.0 or above should be installed. The script writes the prediction result to the predictionresult.csv file. Please make sure that all the libraries imported at the top of the script are installed.
 

Conclusion

 
Logistic Regression is a widely used Machine Learning method. It is used in various fields, such as medicine, banking, and social science. It predicts a value based on the training dataset, so the quality of its predictions depends on how well the training dataset represents the houses being predicted.

