Linear regression example on housing data

In this tutorial, we walk through linear regression on housing data and then compare it with two ensemble regressors.

Step 1: Import the libraries and metrics.

Step 2: Import the housing data.

Step 3: Pre-process the data by removing the unnecessary variables: price, id, and date.

Step 4: Assign the price variable to Y.

Step 5: Split the data into training and test sets using the train_test_split method.

Step 6: In this step, I provide the data to the LinearRegression() algorithm, then fit and predict the values. I got a score of 0.7044808067489784 on the training data (the R² on the test set is 0.69). I am not satisfied with this score.

Step 7: Now I move to RandomForestRegressor, configured with 500 trees and a maximum depth of 10. I feed the same data to this algorithm and get a training score of 0.9361980772317255 (0.87 on the test set). Pretty good.

Step 8: I am somewhat satisfied with the score, but I try to improve the model further. Finally, I try GradientBoostingRegressor with 500 trees and a maximum depth of 10. I feed the data to this algorithm and get a training score of 0.9990719047561639. The test-set R² is 0.89, so part of that near-perfect training score reflects overfitting rather than better predictions.

Step 9: Plot the results.
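The scores quoted in the steps above come from scikit-learn's score method, which for regressors returns the coefficient of determination (R²), not a classification accuracy. As a minimal sketch of what that number measures, the following computes R² by hand in plain Python. The helper name r_squared and the four prices are hypothetical and chosen only for illustration; they are not taken from the housing data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Sum of squared residuals between actual and predicted values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares around the mean of the actual values.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical prices and two sets of predictions (illustrative only).
actual = [200000.0, 350000.0, 500000.0, 650000.0]
close_predictions = [210000.0, 340000.0, 520000.0, 640000.0]
mean_predictions = [425000.0] * 4  # always predict the mean price

score_close = r_squared(actual, close_predictions)
score_mean = r_squared(actual, mean_predictions)

print("close predictions:", round(score_close, 4))
print("mean predictions:", round(score_mean, 4))
```

A model that always predicts the mean price scores 0, and a perfect model scores 1, so a value such as 0.70 means the model explains roughly 70% of the variance in price on the data it was scored on.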

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.metrics import mean_squared_error, r2_score

data = pd.read_csv('kc_house_data.csv')

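# Features: every column except the target (price) and the id and date columns.
X = data.drop(['price', 'id', 'date'], axis=1)
# Target: the sale price.
Y = data['price']

# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split lives in model_selection.
from sklearn.model_selection import train_test_split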

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=0)

from sklearn.linear_model import LinearRegression

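# Fit ordinary least squares on the training set and predict on the test set.
reg = LinearRegression()
reg.fit(X_train, Y_train)
Y_pred = reg.predict(X_test)

print("Linear Regression Mean squared error: %.2f" % mean_squared_error(Y_test, Y_pred))
print('Linear Regression r2 score: %.2f' % r2_score(Y_test, Y_pred))
# reg.score() returns R^2 (not a classification accuracy), here on the training data.
print('Accuracy score', reg.score(X_train, Y_train))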

from sklearn import ensemble

# Random forest with 500 trees, each limited to a depth of 10.
reg = ensemble.RandomForestRegressor(max_depth=10, random_state=0, n_estimators=500)

reg.fit(X_train, Y_train)
Y_pred = reg.predict(X_test)

print("RandomForestRegressor Mean squared error: %.2f" % mean_squared_error(Y_test, Y_pred))
print('RandomForestRegressor r2 score: %.2f' % r2_score(Y_test, Y_pred))
# R^2 on the training data, not on the test set.
print('RandomForestRegressor Accuracy score', reg.score(X_train, Y_train))

# Gradient boosting with 500 trees of depth 10.
# The loss 'ls' was renamed to 'squared_error' in scikit-learn 1.0.
reg = ensemble.GradientBoostingRegressor(n_estimators=500, max_depth=10,
                                         min_samples_split=2,
                                         learning_rate=0.1, loss='squared_error')

reg.fit(X_train, Y_train)
Y_pred = reg.predict(X_test)

print("GradientBoostingRegressor Mean squared error: %.2f" % mean_squared_error(Y_test, Y_pred))
print('GradientBoostingRegressor r2 score: %.2f' % r2_score(Y_test, Y_pred))
# R^2 on the training data, not on the test set.
print('GradientBoostingRegressor Accuracy score', reg.score(X_train, Y_train))

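# Compare the first 20 actual prices with the GradientBoostingRegressor predictions.
plt.scatter(Y_test[:20], Y_pred[:20], color='black')
# Reference line where the predicted price equals the actual price.
lims = [Y_test[:20].min(), Y_test[:20].max()]
plt.plot(lims, lims, color='blue', linewidth=3)
plt.xlabel('Actual price')
plt.ylabel('Predicted price')

plt.show()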

Output:

Linear Regression Mean squared error: 42863880415.46
Linear Regression r2 score: 0.69
Accuracy score 0.7044808067489784
RandomForestRegressor Mean squared error: 17501164817.95
RandomForestRegressor r2 score: 0.87
RandomForestRegressor Accuracy score 0.9361980772317255
GradientBoostingRegressor Mean squared error: 15377672448.99
GradientBoostingRegressor r2 score: 0.89
GradientBoostingRegressor Accuracy score 0.9990719047561639
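
The mean squared errors above are hard to interpret because they are in squared price units. Taking the square root gives the root mean squared error (RMSE), which is in the same units as the price itself. The sketch below does that arithmetic on the three values copied from the output; it assumes the price column is in dollars, which the output itself does not state.

```python
import math

# Mean squared errors copied from the output above (squared price units).
mse_by_model = {
    "LinearRegression": 42863880415.46,
    "RandomForestRegressor": 17501164817.95,
    "GradientBoostingRegressor": 15377672448.99,
}

# RMSE is the square root of MSE, expressed in the same units as the price.
rmse_by_model = {name: math.sqrt(mse) for name, mse in mse_by_model.items()}

for name, rmse in rmse_by_model.items():
    print(f"{name}: RMSE of roughly {rmse:,.0f}")
```

Read this way, the linear model is off by about 207,000 on a typical test-set prediction, while the two ensemble models are off by roughly 132,000 and 124,000.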

We can't be sure in advance which algorithm will produce the best score for our data set, so we have to use trial and error.
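
One way to organize this trial and error is to loop over candidate models and record R² on both the training data and the held-out test data, so that a high training score is not mistaken for good generalization. The following is a minimal sketch under stated assumptions: it uses synthetic data from make_regression as a stand-in for kc_house_data.csv, and smaller models than the ones in this tutorial so that it runs quickly. The names candidates and results are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (not the real kc_house_data.csv).
X, y = make_regression(n_samples=500, n_features=8, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Smaller models than in the tutorial so the sketch runs quickly.
candidates = {
    "LinearRegression": LinearRegression(),
    "RandomForestRegressor": RandomForestRegressor(
        n_estimators=50, max_depth=10, random_state=0
    ),
    "GradientBoostingRegressor": GradientBoostingRegressor(
        n_estimators=50, max_depth=3, random_state=0
    ),
}

results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    # score() returns R^2; keep the training and held-out values side by side.
    train_r2 = model.score(X_train, y_train)
    test_r2 = model.score(X_test, y_test)
    results[name] = (train_r2, test_r2)
    print(f"{name}: train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

A large gap between the training and test values for a model is a sign of overfitting, and the test value is the fairer basis for choosing between models.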

I tried different algorithms and fine-tuned their parameters to get the best results.
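
Parameter tuning of this kind can also be automated. As a hedged sketch rather than the procedure used for the results above, scikit-learn's GridSearchCV tries every combination in a parameter grid and scores each with cross-validation. The grid values and the synthetic data here are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the housing data (not the real kc_house_data.csv).
X, y = make_regression(n_samples=300, n_features=8, noise=20.0, random_state=0)

# Illustrative grid: two tree counts and two depth limits (four combinations).
param_grid = {"n_estimators": [20, 50], "max_depth": [3, 10]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="r2",  # rank combinations by cross-validated R^2
    cv=3,          # 3-fold cross-validation
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated R^2: %.3f" % search.best_score_)
```

Because each combination is scored on folds the model was not trained on, the reported best score is less optimistic than a score computed on the training data.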

Best of luck.
