XGBoost is a popular machine learning algorithm that frequently tops the leaderboards of Kaggle and other data science competitions. What sets XGBoost apart is its ability to combine many weak models (in this case, decision trees) into a single strong model. It does this through a technique called gradient boosting, which helps make the algorithm robust and highly effective for a wide range of prediction tasks.
XGBoost uses gradient boosting, which means it builds trees sequentially, with each new tree trying to correct the mistakes of the previous ones. Here is a simplified view of the process:

- The first tree makes an initial prediction from the training data.
- The residuals (the errors the ensemble still makes) are computed.
- The next tree is trained to predict those residuals, and its output is added to the ensemble, scaled by a learning rate.
- This repeats for a fixed number of rounds; the final prediction is the sum of all the trees' contributions.
For example, if we are predicting house prices: the first tree might predict $200,000 for a house that actually sold for $250,000, the second tree then learns to predict part of that remaining $50,000 error, and each later tree keeps chipping away at whatever error is left.
This process, combined with some clever math and optimization, is what makes XGBoost both accurate and fast.
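To make the residual-fitting loop concrete, here is a toy sketch in JavaScript. It is not how XGBoost is implemented (the real library builds full decision trees with regularization and second-order gradients); it only illustrates the additive "each tree corrects the last" idea, using one-split stumps on a single feature.

```js
// Toy gradient boosting for regression: each round fits a one-split "stump"
// to the residuals of the current ensemble and adds a damped correction.
function fitStump(xs, residuals) {
  const mean = arr => arr.reduce((s, v) => s + v, 0) / arr.length;
  let best = null;
  const sorted = [...xs].sort((a, b) => a - b);
  for (let i = 1; i < sorted.length; i++) {
    const threshold = (sorted[i - 1] + sorted[i]) / 2;
    const left = [], right = [];
    xs.forEach((x, j) => (x < threshold ? left : right).push(residuals[j]));
    if (!left.length || !right.length) continue;
    const leftMean = mean(left), rightMean = mean(right);
    const error = xs.reduce((sum, x, j) => {
      const pred = x < threshold ? leftMean : rightMean;
      return sum + (residuals[j] - pred) ** 2;
    }, 0);
    if (!best || error < best.error) best = { threshold, leftMean, rightMean, error };
  }
  if (!best) return () => 0; // all x values identical: nothing to split on
  return x => (x < best.threshold ? best.leftMean : best.rightMean);
}

function boost(xs, ys, rounds = 50, eta = 0.3) {
  const base = ys.reduce((s, v) => s + v, 0) / ys.length; // start from the mean
  const trees = [];
  let preds = xs.map(() => base);
  for (let r = 0; r < rounds; r++) {
    const residuals = ys.map((y, i) => y - preds[i]); // what we still get wrong
    const tree = fitStump(xs, residuals);             // the next tree learns those errors
    trees.push(tree);
    preds = preds.map((p, i) => p + eta * tree(xs[i]));
  }
  return x => trees.reduce((p, tree) => p + eta * tree(x), base);
}

// Predict price (in thousands) from square footage alone:
const predictPrice = boost([800, 1000, 1200, 1500], [180, 210, 250, 320]);
console.log(predictPrice(1100)); // approaches 250, the price of the similar 1200 sqft example
```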
Although XGBoost was originally implemented as a C++ library, bindings exist for languages such as Python and R, which makes it accessible to the broad community of developers who typically specialize in data and machine learning.
I recently had a project with a hard requirement on Node.js, so I saw an opportunity to close that gap by writing bindings for Node.js. I hope this helps open more doors to ML for JavaScript developers.
In this article, we'll take a closer look at how to use XGBoost in a Node.js application.
Before you begin, make sure you have Node.js and npm installed.

Install the XGBoost Node.js bindings with npm:
npm install xgboost_node
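If you want to verify that the binding installed correctly before writing any real code, a small smoke test is enough. Run it as an ES module (for example, save it as check.mjs), since the examples below use `import`:

```js
// check.mjs - verifies that the xgboost_node binding loads
import xgboost from 'xgboost_node';
console.log('xgboost_node loaded:', typeof xgboost);
```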
Before diving into the code, let's look at what our features represent in the house price prediction example:
```js
// Each feature array represents:
// [square_feet, property_age, total_rooms, has_parking, neighborhood_type, is_furnished]
// Example: [1200, 8, 10, 0, 1, 1]
```
Here's what each feature means:

- square_feet: the size of the home in square feet
- property_age: the age of the property in years
- total_rooms: the total number of rooms
- has_parking: 1 if the home has parking, 0 if not
- neighborhood_type: the neighborhood category encoded as a number (e.g. 1 for residential)
- is_furnished: 1 if the home is furnished, 0 if not
And the corresponding labels array contains house prices in thousands (e.g., 250 means $250,000).
If you have raw data in a different format, here's how to transform it for XGBoost:
```js
// Let's say you have data in this format:
const rawHouses = [
  {
    address: "123 Main St",
    sqft: 1200,
    yearBuilt: 2015,
    rooms: 10,
    parking: "Yes",
    neighborhood: "Residential",
    furnished: true,
    price: 250000
  },
  // ... more houses
];

// Transform it to XGBoost format:
const features = rawHouses.map(house => [
  house.sqft,
  new Date().getFullYear() - house.yearBuilt,   // Convert year built to age
  house.rooms,
  house.parking === "Yes" ? 1 : 0,              // Convert Yes/No to 1/0
  house.neighborhood === "Residential" ? 1 : 2, // Convert category to number
  house.furnished ? 1 : 0                       // Convert boolean to 1/0
]);

const labels = rawHouses.map(house => house.price / 1000); // Convert price to thousands
```
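Before training, it is worth a quick sanity check that every row has the same length and contains only numbers; garbage in the feature matrix is a common source of confusing errors. The `assertValidFeatures` helper below is hypothetical (not part of xgboost_node), just a sketch of the kind of check you might add:

```js
// Hypothetical sanity check for the transformed data (not part of xgboost_node)
function assertValidFeatures(features, labels) {
  if (features.length !== labels.length) {
    throw new Error('features and labels must have the same length');
  }
  const width = features[0].length;
  for (const row of features) {
    if (row.length !== width) throw new Error('inconsistent feature vector length');
    if (row.some(v => typeof v !== 'number' || Number.isNaN(v))) {
      throw new Error('feature values must be numeric');
    }
  }
}

assertValidFeatures(features, labels);
```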
Here's a complete example that shows how to train a model and make predictions:
```js
import xgboost from 'xgboost_node';

async function test() {
  const features = [
    [1200, 8, 10, 0, 1, 1],
    [800, 14, 15, 1, 2, 0],
    [1200, 8, 10, 0, 1, 1],
    [1200, 8, 10, 0, 1, 1],
    [1200, 8, 10, 0, 1, 1],
    [800, 14, 15, 1, 2, 0],
    [1200, 8, 10, 0, 1, 1],
    [1200, 8, 10, 0, 1, 1],
  ];
  const labels = [250, 180, 250, 180, 250, 180, 250, 180];

  const params = {
    max_depth: 3,
    eta: 0.3,
    objective: 'reg:squarederror',
    eval_metric: 'rmse',
    nthread: 4,
    num_round: 100,
    min_child_weight: 1,
    subsample: 0.8,
    colsample_bytree: 0.8,
  };

  try {
    await xgboost.train(features, labels, params);
    const predictions = await xgboost.predict([
      [1000, 0, 1, 0, 1, 1],
      [800, 0, 1, 0, 1, 1],
    ]);
    console.log('Predicted value:', predictions[0]);
  } catch (error) {
    console.error('Error:', error);
  }
}

test();
```
The example above shows how to:

- prepare the features as arrays of numbers and the labels as target prices (in thousands)
- configure the training parameters
- train a model with xgboost.train
- make predictions on new feature vectors with xgboost.predict
- handle training and prediction errors with try/catch
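One small follow-up: because the labels are in thousands, a predicted value of 250 means $250,000. If you want to display the price in dollars, you could add a couple of lines right after the predict call inside the try block of the example above:

```js
// Labels were given in thousands, so scale the prediction back up for display
const priceInDollars = predictions[0] * 1000;
console.log(`Estimated price: $${priceInDollars.toLocaleString()}`);
```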
XGBoost provides straightforward methods for saving and loading models:
```js
// Save model after training
await xgboost.saveModel('model.xgb');

// Load model for predictions
await xgboost.loadModel('model.xgb');
```
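In practice you would usually train once, save the model, and load it later in a separate process that only serves predictions. Here is a minimal sketch of that pattern, reusing only the loadModel and predict calls shown above:

```js
import xgboost from 'xgboost_node';

async function predictFromSavedModel() {
  // Load the model trained and saved earlier; no retraining needed
  await xgboost.loadModel('model.xgb');
  const predictions = await xgboost.predict([[1000, 5, 8, 1, 1, 0]]);
  console.log('Predicted price (thousands):', predictions[0]);
}

predictFromSavedModel().catch(console.error);
```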
You may have noticed the model takes quite a few parameters. I'd advise reading the XGBoost documentation to understand how to choose and tune them for your data. Here's what some of these parameters are trying to achieve:
```js
const params = {
  max_depth: 3,                  // Controls how deep each tree can grow
  eta: 0.3,                      // Learning rate - how much we adjust for each tree
  objective: 'reg:squarederror', // For regression problems
  eval_metric: 'rmse',           // How we measure prediction errors
  nthread: 4,                    // Number of parallel processing threads
  num_round: 100,                // Number of trees to build
  min_child_weight: 1,           // Minimum amount of data in a leaf
  subsample: 0.8,                // Fraction of data to use in each tree
  colsample_bytree: 0.8,         // Fraction of features to consider for each tree
};
```
These parameters significantly impact your model's performance and behavior. For example:

- a larger max_depth lets each tree capture more complex patterns, but makes overfitting more likely
- a smaller eta makes learning more gradual and usually more robust, but needs more num_round trees to compensate
- subsample and colsample_bytree values below 1.0 introduce randomness that helps the model generalize
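As an illustration of one such trade-off (not a recipe for every dataset), lowering the learning rate usually calls for more boosting rounds to reach the same accuracy. This sketch just builds on the params object shown above:

```js
// Illustrative only: a slower-learning configuration derived from the params above
const conservativeParams = {
  ...params,
  eta: 0.05,      // smaller correction per tree
  num_round: 500, // more trees to compensate for the smaller steps
};
```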
This guide provides a starting point for using XGBoost in Node.js. For production use, I recommend validating your input data, evaluating the model on data it was not trained on, and tuning the parameters for your own dataset.
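As a starting point for the evaluation step, here is a minimal hold-out sketch. It uses only the train and predict calls shown earlier; the 80/20 split and RMSE computation are generic code, not part of xgboost_node, and a real project would want proper shuffling and cross-validation.

```js
import xgboost from 'xgboost_node';

// Naive random 80/20 split (illustrative; use a proper splitter in production)
function trainTestSplit(features, labels, testRatio = 0.2) {
  const indices = features.map((_, i) => i).sort(() => Math.random() - 0.5);
  const testIdx = new Set(indices.slice(0, Math.floor(features.length * testRatio)));
  const pick = (arr, keep) => arr.filter((_, i) => keep(i));
  return {
    trainX: pick(features, i => !testIdx.has(i)),
    trainY: pick(labels, i => !testIdx.has(i)),
    testX: pick(features, i => testIdx.has(i)),
    testY: pick(labels, i => testIdx.has(i)),
  };
}

async function evaluate(features, labels, params) {
  const { trainX, trainY, testX, testY } = trainTestSplit(features, labels);
  await xgboost.train(trainX, trainY, params);
  const preds = await xgboost.predict(testX);
  const rmse = Math.sqrt(
    testY.reduce((sum, y, i) => sum + (y - preds[i]) ** 2, 0) / testY.length
  );
  console.log('Hold-out RMSE (thousands):', rmse);
}

// Usage: evaluate(features, labels, params);
```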
Jonathan Farrow
@farrow_jonny