House Price Prediction

Patricia Arquette | Original: 2024-11-03 12:28:29 | 256 views

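In real estate, determining a property's price involves many factors, from location and size to amenities and market trends. Simple linear regression, a foundational machine learning technique, offers a practical way to predict house prices from key features such as the number of rooms or the square footage.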

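In this article, I walk through applying simple linear regression to a housing dataset, from data preprocessing and feature selection to building a model that can provide useful price insights. Whether you are new to data science or looking to deepen your understanding, this project gives you a hands-on look at how data-driven predictions can shape smarter real estate decisions.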
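Before touching the real data, it may help to see what simple linear regression computes. The sketch below fits a line to a handful of made-up room counts and prices using the closed-form least-squares formulas. The numbers and the `predict` helper are hypothetical and exist only for illustration.

```python
import numpy as np

# Hypothetical toy data: room counts and prices (in thousands); not from the dataset
rooms = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
price = np.array([150.0, 200.0, 250.0, 300.0, 350.0])

# Closed-form least-squares estimates for: price = intercept + slope * rooms
slope = np.cov(rooms, price, ddof=0)[0, 1] / np.var(rooms)
intercept = price.mean() - slope * rooms.mean()


def predict(n_rooms):
    """Predict a price (in thousands) from a room count using the fitted line."""
    return intercept + slope * n_rooms


print(f"slope={slope:.1f}, intercept={intercept:.1f}, predict(7)={predict(7):.1f}")
```

The slope is the covariance of the feature and the target divided by the variance of the feature, and the intercept makes the line pass through the point of means.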

First, import the libraries:

import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

# Read the CSV from the directory where you stored the data

data = pd.read_csv('/kaggle/input/california-housing-prices/housing.csv')

data

# Check whether any column has null values
data.info()

# Drop rows with null values so every column has the same non-null count
data.dropna(inplace = True)

data.info()
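To make the null check concrete, here is a minimal sketch on a made-up mini-frame. The column names mirror the housing data but the values are hypothetical. It counts missing values per column, then compares dropping rows (the approach used above) with median imputation, which is an alternative and not something this article does.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame standing in for the housing data (values are made up)
df = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
    "total_bedrooms": [129.0, np.nan, 190.0, np.nan],
})

# Count missing values per column
null_counts = df.isna().sum()
print(null_counts)

# Option 1 (used in this article): drop any row that has a missing value
dropped = df.dropna()

# Option 2 (an alternative, not used here): fill gaps with the column median
filled = df.fillna(df.median(numeric_only=True))

print(len(df), len(dropped), len(filled))
```

Dropping is simplest when few rows are affected. Imputation keeps every row at the cost of inventing values.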

# Separate the features (X) from the target (y) before splitting into train and test sets

from sklearn.model_selection import train_test_split

X = data.drop(['median_house_value'], axis = 1)
y = data['median_house_value']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)

# Rejoin the training features and target so their correlations can be examined together
train_data = X_train.join(y_train)

train_data
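The split above is random, so each run yields a different training set. A minimal sketch of a repeatable split follows, using a hypothetical ten-row frame. The seed `random_state=42` is an arbitrary choice made here, not something taken from the article.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 rows, one feature and one target
toy = pd.DataFrame({
    "median_income": np.arange(10, dtype=float),
    "median_house_value": np.arange(10, dtype=float) * 1000,
})
X_toy = toy.drop(columns=["median_house_value"])
y_toy = toy["median_house_value"]

# Fixing random_state makes the shuffle repeatable across runs
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)

# join() aligns on the index, so every row gets its own target value back
train_toy = X_tr.join(y_tr)
print(train_toy.shape, X_te.shape)
```

Because `join` matches rows by index rather than position, the shuffled features and targets stay correctly paired.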

# Plot a histogram of each numeric column in the training data
train_data.hist(figsize=(15, 8))

# One-hot encode the non-numeric column so it can be included in the correlation analysis

train_data_encoded = pd.get_dummies(train_data, drop_first=True)
correlation_matrix = train_data_encoded.corr()
print(correlation_matrix)

train_data_encoded.corr()

plt.figure(figsize=(15,8))
sns.heatmap(train_data_encoded.corr(), annot=True, cmap = "inferno")
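A full heatmap can be hard to read when the only question is which features track the target. One way to narrow it down is to take the target's column of the correlation matrix and sort it by absolute value. The sketch below does this on synthetic data. The column names echo the housing data, but the values and the relationship between them are made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in: the target depends on income, and latitude is pure noise
income = rng.normal(4.0, 1.5, n)
toy = pd.DataFrame({
    "median_income": income,
    "latitude": rng.uniform(32, 42, n),
    "median_house_value": 50000 * income + rng.normal(0, 20000, n),
})

# Correlation of every feature with the target, strongest first
target_corr = (
    toy.corr()["median_house_value"]
    .drop("median_house_value")
    .sort_values(key=abs, ascending=False)
)
print(target_corr)
```

Sorting by absolute value keeps strong negative correlations near the top instead of burying them at the bottom.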

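Output of train_data.ocean_proximity.value_counts() (the call appears in the encoding step below):

ocean_proximity
INLAND        5183
NEAR OCEAN    2108
NEAR BAY      1783
ISLAND           5
Name: count, dtype: int64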

# Log-transform the right-skewed count columns; adding 1 avoids taking log(0)
train_data['total_rooms'] = np.log(train_data['total_rooms'] + 1)
train_data['total_bedrooms'] = np.log(train_data['total_bedrooms'] + 1)
train_data['population'] = np.log(train_data['population'] + 1)
train_data['households'] = np.log(train_data['households'] + 1)

train_data.hist(figsize=(15, 8))
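The effect of the log transform can be checked numerically as well as by eye. The sketch below draws a hypothetical right-skewed sample, a made-up stand-in for a column such as total_rooms, and compares its skewness before and after. It uses `np.log1p(x)`, which computes `log(x + 1)` and is equivalent to the expression used above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical right-skewed counts standing in for a column such as total_rooms
counts = pd.Series(rng.lognormal(mean=7.0, sigma=1.0, size=1000))

# log1p(x) == log(x + 1), the same transform applied to the training data
logged = np.log1p(counts)

skew_before = counts.skew()
skew_after = logged.skew()
print(f"skew before: {skew_before:.2f}, after: {skew_after:.2f}")
```

A skewness near zero after the transform indicates a roughly symmetric distribution, which tends to suit a linear model better than a long right tail.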

0.5092972905670141

# Count the ocean_proximity categories before one-hot encoding them
train_data.ocean_proximity.value_counts()

# Create a binary (0 or 1) indicator column for each category
pd.get_dummies(train_data.ocean_proximity)

0.4447616558596853

# Join the indicator columns to the data, then drop the original ocean_proximity column
train_data = train_data.join(pd.get_dummies(train_data.ocean_proximity)).drop(['ocean_proximity'], axis=1)

train_data
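To see exactly what the encoding step produces, here is a minimal sketch on a four-row frame with made-up values. Passing `dtype=int` is a choice made in this sketch so the indicators print as 0 and 1. Recent pandas versions return booleans by default.

```python
import pandas as pd

# Hypothetical four-row frame; the category labels match the dataset, the values do not
toy = pd.DataFrame({
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "ISLAND"],
    "households": [126, 1138, 177, 219],
})

# One indicator column per category, encoded as 0/1 integers
dummies = pd.get_dummies(toy["ocean_proximity"], dtype=int)

# Attach the indicators and drop the original text column
encoded = toy.join(dummies).drop(columns=["ocean_proximity"])
print(encoded)
```

Each row has exactly one indicator set to 1, so the text column is fully represented by numbers the model can use.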

# Recheck the correlations now that every column is numeric
plt.figure(figsize=(18, 8))
sns.heatmap(train_data.corr(), annot=True, cmap ='twilight')
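With the features prepared, a linear model can be fitted and scored. The sketch below assumes scikit-learn's `LinearRegression` and uses synthetic stand-in data, so the score it prints says nothing about the real dataset. The `prepare` helper is hypothetical. It shows one way to apply the same log transform to the test features that was applied to the training features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in data; the relationship below is invented for illustration
income = rng.uniform(1, 10, n)
rooms = rng.lognormal(7, 0.5, n)
value = 40000 * income + 10000 * np.log1p(rooms) + rng.normal(0, 15000, n)
toy = pd.DataFrame({
    "median_income": income,
    "total_rooms": rooms,
    "median_house_value": value,
})

X_all = toy.drop(columns=["median_house_value"])
y_all = toy["median_house_value"]
X_fit, X_eval, y_fit, y_eval = train_test_split(
    X_all, y_all, test_size=0.2, random_state=0
)


def prepare(features):
    """Apply the same log transform to any split (hypothetical helper)."""
    out = features.copy()
    out["total_rooms"] = np.log1p(out["total_rooms"])
    return out


model = LinearRegression()
model.fit(prepare(X_fit), y_fit)

# score() returns R^2 on the held-out rows
r2 = model.score(prepare(X_eval), y_eval)
print(f"R^2 on held-out data: {r2:.3f}")
```

Routing both splits through one function avoids the common mistake of transforming the training data but scoring on untransformed test data.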

0.5384474921332503

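Training a model is honestly not the easiest process, but to keep improving on the results above you can add more entries under param_grid, such as min_feature, so that your best estimator's score keeps improving.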
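A grid search of the kind described above can be sketched as follows. The article does not show which estimator was tuned, so `RandomForestRegressor`, the synthetic data, and the grid values are all assumptions made for this example. Note also that scikit-learn's random forest has no parameter named min_feature. The sketch uses the real parameters `max_features` and `min_samples_split` instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 rows, 3 features (invented for illustration)
X_toy = rng.uniform(0, 10, size=(200, 3))
y_toy = 3.0 * X_toy[:, 0] + np.sin(X_toy[:, 1]) + rng.normal(0, 0.5, 200)

# Each extra key or value widens the search, and lengthens the run time
param_grid = {
    "n_estimators": [20, 50],
    "max_features": [1, 2, 3],
    "min_samples_split": [2, 4],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="r2",
)
search.fit(X_toy, y_toy)

best_forest = search.best_estimator_
best_score = search.best_score_
print(search.best_params_, round(best_score, 3))
```

`best_score_` is the mean cross-validated score of the winning combination. Scoring `best_estimator_` on a separate test set gives a more honest estimate of how it generalizes.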

If you have made it this far, please leave a like and share your comments below. Your feedback matters a lot. Thank you! ❤️
