
Cross-Validation and Its Python Implementation

零到壹度 (original), 2018-04-16 11:45:33, 10443 views

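This article introduces cross-validation and shows how to implement it in Python. It is shared here as a reference for anyone who needs it.

##Two approaches to model selection: regularization (the typical method) and cross-validation.

This article covers cross-validation and its Python implementation.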

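Cross-validation

When plenty of sample data is available, a simple way to do model selection is to randomly split the dataset into three parts: a training set, a validation set, and a test set.

Training set: used to train the models

Validation set: used to select a model

Test set: used for the final evaluation of the selected model

Among the learned models of differing complexity, choose the one with the smallest prediction error on the validation set. Because the validation set contains enough data, using it for model selection is effective. In many practical applications, however, data is scarce, and in that case cross-validation can be used instead.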
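The training / validation / test workflow described above can be sketched as follows. This is a minimal illustration, not the article's own code: it uses only the standard library, the synthetic data (y = x² plus noise), the 60/20/20 split, and the helpers `knn_predict` and `mse` are all assumptions made for the example, and a k-nearest-neighbour regressor stands in for "models of differing complexity" (a smaller k gives a more flexible model).

```python
import random

def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, points, k):
    """Mean squared error of the k-NN model (fitted on train) over points."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in points) / len(points)

rng = random.Random(0)

# Synthetic data (an assumption for illustration): y = x^2 plus Gaussian noise.
data = []
for _ in range(300):
    x = rng.uniform(-1, 1)
    data.append((x, x * x + rng.gauss(0, 0.05)))

# Random three-way split: 60% training, 20% validation, 20% test.
rng.shuffle(data)
train, valid, test = data[:180], data[180:240], data[240:]

# Model selection: keep the candidate with the smallest validation error.
candidate_ks = [1, 5, 20, 100]
valid_errors = {k: mse(train, valid, k) for k in candidate_ks}
best_k = min(valid_errors, key=valid_errors.get)

# Final evaluation: the test set is touched only once, after selection.
test_error = mse(train, test, best_k)
print("validation MSE per k:", valid_errors)
print("selected k:", best_k, "test MSE:", test_error)
```

The test set plays no role in choosing `best_k`, so `test_error` remains an honest estimate of how the selected model performs on unseen data.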

Basic idea: reuse the data. Split the given data into a training set and a test set, and on that basis repeatedly train, test, and select a model.
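One common way to "reuse the data" is k-fold splitting, where every sample serves as test data exactly once. The text above does not name a specific scheme, so treat the following as a sketch of one possibility; `k_fold_indices` is a hypothetical helper written for this example, not a scikit-learn function.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Every fold except the i-th one forms the training set.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(10, 5))
for train_idx, test_idx in splits:
    print("train:", sorted(train_idx), "test:", sorted(test_idx))
```

In practice a model would be trained and scored once per pair, and the scores averaged to compare candidate models.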

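Simple cross-validation:

Randomly split the data into two parts, a training set and a test set. Typically 70% of the data is used for training and 30% for testing.

Code (splitting into training and test sets):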

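from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
# data: all samples; labels: all target values
# X_train / X_test: features of the training / test set; Y_train / Y_test: the corresponding target values
# test_size=0.25 gives a 75% training / 25% test split
X_train, X_test, Y_train, Y_test = train_test_split(data, labels, test_size=0.25, random_state=0)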

About the random_state parameter, the source documentation says:

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by `np.random`.

In short: if you set it to a specific value, for example random_state=10, the data is split the same way every time, no matter how many times you run the code. If it is set to None (random_state=None), the split is different on every run.

Code (splitting into training, validation, and test sets):

from sklearn.model_selection import train_test_split

# First split off the test set: 70% training + validation, 30% test
train_and_valid, test = train_test_split(data, test_size=0.3, random_state=0)
# Then split train_and_valid (not the full data) in half: training set and validation set
train, valid = train_test_split(train_and_valid, test_size=0.5, random_state=0)
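To make the effect of a fixed seed and of the two-stage split concrete, here is a small sketch that mimics the behaviour with Python's `random` module instead of calling scikit-learn. The helper `split` is hypothetical and only approximates `train_test_split`; the 100-item dataset is likewise an assumption for illustration.

```python
import random

def split(items, test_size, seed=None):
    """Shuffle a copy of items and split off the last test_size fraction."""
    shuffled = list(items)
    # A fixed seed reproduces the same shuffle; seed=None gives a fresh one each run.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))

# Stage 1: hold out 30% as the test set.
train_and_valid, test = split(data, test_size=0.3, seed=0)
# Stage 2: split the remaining 70% in half, not the full data.
train, valid = split(train_and_valid, test_size=0.5, seed=0)
print(len(train), len(valid), len(test))  # 35 35 30

# Repeating the first split with the same seed gives the identical result.
again, _ = split(data, test_size=0.3, seed=0)
print(again == train_and_valid)  # True
```

Because the second stage operates on `train_and_valid`, the three sets are disjoint and together cover every sample exactly once.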
