House Price Prediction By Using Machine Learning


Logistic Regression is a supervised machine learning method. It is a statistical technique for analysing a dataset in which one or more independent variables determine an outcome. The basic difference between Linear Regression and Logistic Regression is that Linear Regression's outcome is continuous, whereas Logistic Regression's outcome is limited to a fixed set of discrete values (classes). Here, the outcome is the dependent variable.
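
To make the contrast concrete, here is a minimal sketch in plain Python. It is not the model used in this article: it only shows that a linear score can be any real number, while binary logistic regression passes that score through the sigmoid function and then picks a class. The weight, bias, and threshold values are made up for illustration.

```python
import math

def sigmoid(z):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def linear_output(x, w, b):
    """Linear regression style output: an unbounded continuous value."""
    return w * x + b

def logistic_output(x, w, b, threshold=0.5):
    """Binary logistic regression style output: a class label, 0 or 1."""
    return 1 if sigmoid(linear_output(x, w, b)) >= threshold else 0

# Hypothetical weight and bias, for illustration only
w, b = 2.0, -1.0
for x in (-3.0, 0.0, 3.0):
    print(x, linear_output(x, w, b), logistic_output(x, w, b))
```

The continuous column changes with every input, while the class column can only ever be 0 or 1.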

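I will not go into the details of Logistic Regression. Instead, I will explain how to predict a house price from some features of the house by using Logistic Regression. Because the outcome is discrete, the model treats each distinct sale price in the training data as a separate class.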

Features of a House

The house price will be predicted by using the features of a house listed below.
  • Year built
  • Total basement area in sq. ft.
  • Lot area
  • Floor area
  • Overall condition
  • Lot frontage
  • Garage details
  • Fireplace details
  • .......
We will use two sets of data:
  • Training Data - This data contains the features of each house along with the information related to the Year Sold and Sale Price of the house.
  • Test Data - This data contains all the information about a house except the sale price. Based on the given information, the Logistic Regression algorithm will predict the selling price of the house.

It will be implemented in Python. The modules below need to be imported.
    # logistic regression model and accuracy metric imports
    from sklearn.metrics import accuracy_score
    from sklearn.linear_model import LogisticRegression
Read the training data.
    # reading the training file; the with-block closes it automatically
    with open("train_data.csv", "r") as tr:
        records = tr.readlines()
Make the training set vectors.
    # making the training set X (features) and y (sale price) vectors
    X = []
    y = []
    for line in records[1:]:  # skip the header row
        fields = line.strip().split(",")
        # every column except the last one is a feature
        X.append([int(value) for value in fields[:-1]])
        # the sale price is read from column index 36
        y.append(int(fields[36]))
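
The parsing step above can also be sketched with Python's csv module, which handles the splitting of each row. This is only an illustrative alternative: the helper name load_training_rows, its target_index parameter, and the tiny in-memory sample are assumptions and not part of the attached code. It assumes a header row, integer-only cells, and the sale price in the last column.

```python
import csv
import io

def load_training_rows(csv_file, target_index=-1):
    """Split each data row into integer features and an integer target.

    Assumes a header row, integer-only cells, and the target in the
    column given by target_index (the last column by default).
    """
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    X, y = [], []
    for row in reader:
        if not row:  # ignore blank lines
            continue
        values = [int(cell) for cell in row]
        y.append(values.pop(target_index))  # remove the target from the row
        X.append(values)                    # what remains are the features
    return X, y

# Made-up sample standing in for the real training file
sample = io.StringIO("YearBuilt,LotArea,SalePrice\n2001,9000,200000\n1980,7500,150000\n")
X, y = load_training_rows(sample)
print(X)
print(y)
```

With the real file, the same helper could be called as load_training_rows(open("train_data.csv", newline="")), ideally inside a with-block.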
Train the Logistic Regression model on the training set.
    # creating and training our logistic regression model
    lr = LogisticRegression()
    lr.fit(X, y)  # the model must be fitted before it can predict
Read the testing data for which the prediction will be performed.
    # reading the testing set file; the with-block closes it automatically
    with open("test_data.csv", "r") as te:
        records1 = te.readlines()
Create the testing vector.
    # making the testing set X vector; every column is a feature
    XX = []
    for line in records1[1:]:  # skip the header row
        fields = line.strip().split(",")
        XX.append([int(value) for value in fields])
Now, predict by using Logistic Regression.
    yy = lr.predict(XX)
Write the prediction result to a new CSV file.
    # writing the predicted house prices to a new file
    print("Writing to File")
    with open("predictionresult.csv", "w") as result:
        result.write("House No,Predicted Price" + "\n")
        for i in range(len(yy)):
            # house numbers start at 1
            result.write(str(i + 1) + "," + str(yy[i]) + "\n")
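
As an alternative to building each line by hand, the writing step could be sketched with csv.writer from the standard library. The helper name write_predictions and the two sample prices are assumptions made for illustration; the sketch writes to an in-memory buffer so it can be run without touching any file.

```python
import csv
import io

def write_predictions(out_file, predictions):
    """Write a header plus one 'house number, price' row per prediction."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["House No", "Predicted Price"])
    # house numbers start at 1, matching the loop in the article
    for house_no, price in enumerate(predictions, start=1):
        writer.writerow([house_no, price])

buffer = io.StringIO()
write_predictions(buffer, [150000, 200000])  # made-up predictions
print(buffer.getvalue())
```

To write a real file, pass a handle opened with open("predictionresult.csv", "w", newline="") instead of the buffer.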
Check the accuracy.
    # checking the model accuracy by applying the model to the training set
    yyy = lr.predict(X)
    accuracy = accuracy_score(y, yyy) * 100
    print("model accuracy")
    print(accuracy)
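
For labels like these, accuracy_score returns the fraction of predictions that match the true value exactly. The plain-Python sketch below shows that calculation; the function name exact_match_accuracy and the sample prices are made up for illustration. Note that a predicted price that is close to the real one, but not identical, still counts as a miss.

```python
def exact_match_accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of empty input")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

# Made-up prices: two of the four predictions match exactly
actual = [150000, 200000, 175000, 310000]
predicted = [150000, 199000, 175000, 250000]
print(exact_match_accuracy(actual, predicted) * 100)  # prints 50.0
```

Since the check above is run on the training set itself, the reported figure describes how well the model reproduces data it has already seen.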
The prediction file starts with the header below, followed by one row per house.

House No,Predicted Price

It will print the model accuracy on the training set like below.

model accuracy 68.3561643836

Code Execution Details

I have attached the zipped Python code along with the training and test CSV data. Python 3.0 or above should be installed. The code writes the prediction result into the predictionresult.csv file. Please make sure that you have installed all the libraries imported at the top of the code.


Logistic Regression is a widely used part of Machine Learning. It is applied in various fields, like medicine, banking, social science, etc. It predicts an outcome based on what it learns from the training dataset, so the quality of its predictions depends on the quality of that data.