
LightGBM in practice with random search hyperparameter tuning: 96.67% accuracy

PHPz | Original | 2024-06-08 22:45:30


Hello everyone, I am Peter~

LightGBM is a classic machine learning algorithm whose background, principles, and characteristics are well worth studying. It offers high efficiency, good scalability, and high accuracy. This article briefly introduces the principles and characteristics of LightGBM and then walks through a case study that combines LightGBM with random search for hyperparameter optimization.

LightGBM algorithm

In the field of machine learning, Gradient Boosting Machines (GBMs) are a class of powerful ensemble learning algorithms that build a strong model by successively adding weak learners (usually decision trees), each fitted to reduce the residual or loss of the current ensemble and thereby minimize the prediction error.

In the era of big data, the size of data sets has grown dramatically, and traditional GBMs are difficult to scale effectively due to their high computing and storage costs.

  • For example, the level-wise (depth-wise) tree growth strategy produces balanced trees but often weakens the model's discriminative power, while the leaf-wise growth strategy can improve accuracy but is prone to overfitting.
  • In addition, most GBM implementations need to traverse the entire data set to calculate gradients in each iteration, which is inefficient when the amount of data is huge. Therefore, an algorithm that can efficiently process large-scale data while maintaining model accuracy is needed.

To solve these problems, Microsoft released LightGBM (Light Gradient Boosting Machine) in 2017: a faster gradient boosting framework with lower memory consumption and higher performance.

Official learning address: https://lightgbm.readthedocs.io/en/stable/

Principle of LightGBM

1. Decision tree algorithm based on histogram:

  • Principle: LightGBM uses histogram-based optimization to discretize continuous feature values into a fixed number of bins (the buckets of the histogram), greatly reducing the number of candidate split points that must be evaluated when a node is split.
  • Advantages: This method increases calculation speed while reducing memory usage.
  • Implementation details: For each feature, the algorithm maintains a histogram recording the statistics of that feature in each bucket. When splitting a node, these histograms can be used directly without traversing all the raw data (a toy sketch follows below).
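To make the binning idea concrete, here is a toy sketch of histogram construction in plain NumPy. It is purely illustrative and not LightGBM's internal implementation; the bin count, the quantile-based bin edges, and the variable names are assumptions made for this example.

import numpy as np

# Toy illustration of histogram binning (not LightGBM's actual internals).
rng = np.random.default_rng(0)
feature = rng.normal(size=1000)    # one continuous feature
gradients = rng.normal(size=1000)  # per-sample gradients supplied by the booster

n_bins = 16
# Bin edges from quantiles, so each bucket holds roughly the same number of samples
edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1))
bin_ids = np.clip(np.searchsorted(edges, feature, side="right") - 1, 0, n_bins - 1)

# Per-bin gradient sums and counts: candidate splits can then be scored per bin
# instead of per raw feature value.
grad_hist = np.bincount(bin_ids, weights=gradients, minlength=n_bins)
count_hist = np.bincount(bin_ids, minlength=n_bins)
print(grad_hist.round(2))
print(count_hist)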

2. Leaf-wise tree growth strategy with depth restriction:

  • Principle: Unlike traditional level-wise splitting, the leaf-wise growth strategy always selects, among all current leaf nodes, the one with the largest split gain and splits it.
  • Advantages: This strategy lets the tree concentrate on the parts of the data that are hardest to fit, which usually leads to better accuracy.
  • Disadvantages: It can easily lead to overfitting, especially when there is noise in the data.
  • Improvement measures: LightGBM prevents overfitting by setting a maximum depth limit.
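In practice, these points map onto LightGBM's num_leaves and max_depth parameters. A minimal sketch using the scikit-learn wrapper and the iris data (the parameter values are arbitrary choices for illustration):

import lightgbm as lgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Leaf-wise growth is LightGBM's default; num_leaves caps the number of leaves
# per tree, and max_depth adds the depth restriction that guards against overfitting.
clf = lgb.LGBMClassifier(num_leaves=31, max_depth=6, min_child_samples=20)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))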

3. Gradient-based One-Side Sampling (GOSS):

  • Principle: GOSS keeps all samples with large gradients (which contribute most to the information gain) and randomly samples only a fraction of the samples with small gradients, reducing the amount of computation while ensuring that not too much information is lost.
  • Advantages: This method can speed up training without significant loss of accuracy.
  • Application scenarios: Especially suitable for situations with serious data skew.
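As a rough sketch, GOSS can be switched on through LightGBM's sampling parameters. Note the hedge: the exact spelling depends on the installed version (data_sample_strategy='goss' in LightGBM 4.x, boosting_type='goss' in older releases), and the top_rate / other_rate values below are illustrative, not tuned choices.

import lightgbm as lgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# top_rate keeps that fraction of large-gradient samples;
# other_rate randomly subsamples the small-gradient ones.
clf = lgb.LGBMClassifier(
    data_sample_strategy="goss",  # assumption: LightGBM >= 4.0 parameter name
    top_rate=0.2,
    other_rate=0.1,
)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))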

4. Exclusive Feature Bundling (EFB):

  • Principle: EFB is a technique that reduces the number of features and improves computational efficiency. It bundles mutually exclusive features (i.e. features that are rarely non-zero at the same time) into a single feature to reduce the feature dimensionality.
  • Advantages: Improved memory usage efficiency and training speed.
  • Implementation details: Because bundled features are (almost) never non-zero simultaneously, several original features can share one bundled column, so the number of features the algorithm actually scans is reduced. A toy illustration follows below.
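A toy illustration of the bundling idea, purely for intuition (this is not LightGBM's actual EFB code): two one-hot columns that are never non-zero at the same time can be folded into a single column without losing information.

import numpy as np

# Two mutually exclusive one-hot features, e.g. from one-hot encoding a single categorical column.
f1 = np.array([1, 0, 0, 1, 0])
f2 = np.array([0, 1, 0, 0, 1])
assert not np.any((f1 != 0) & (f2 != 0))  # mutual exclusivity holds

# Bundle them by shifting f2 into a disjoint value range; a tree can still
# recover which original feature was active from the single bundled column.
bundle = f1 * 1 + f2 * 2
print(bundle)  # [1 2 0 1 2]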

5. Support parallel and distributed learning:

  • Principle: LightGBM supports multi-threaded learning and can use multiple CPUs for parallel training.
  • Advantages: Significantly improves the training speed on multi-core processors.
  • Scalability: It also supports distributed learning and can use multiple machines to jointly train models.
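In the scikit-learn interface the thread count is exposed as n_jobs (num_threads in the native parameter dictionary). A brief sketch:

import lightgbm as lgb
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# n_jobs=-1 lets LightGBM use all available CPU cores for histogram
# construction and split finding; the native-API equivalent is num_threads.
clf = lgb.LGBMClassifier(n_jobs=-1)
clf.fit(X, y)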

6. Cache optimization:

  • Principle: Data access is organized to be cache-friendly, so the CPU cache is used more effectively and data exchange is faster.
  • Advantages: Especially on large data sets, cache optimization can significantly improve performance.

7. Supports multiple loss functions:

  • Features: In addition to the commonly used regression and classification loss functions, LightGBM also supports custom loss functions to meet different business needs; a minimal sketch follows below.
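For instance, the scikit-learn wrapper accepts a callable objective that returns the gradient and Hessian of the loss with respect to the raw predictions. A minimal sketch with a hand-written squared-error objective for regression (the function and its name are assumptions for illustration, assuming a reasonably recent LightGBM version):

import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_regression

def squared_error_objective(y_true, y_pred):
    # Gradient and Hessian of 0.5 * (y_pred - y_true)**2 with respect to y_pred
    grad = y_pred - y_true
    hess = np.ones_like(y_true, dtype=float)
    return grad, hess

X, y = make_regression(n_samples=500, n_features=10, random_state=42)

# Passing a callable as the objective makes LightGBM boost on the custom loss.
reg = lgb.LGBMRegressor(objective=squared_error_objective, n_estimators=100)
reg.fit(X, y)
print(reg.predict(X[:3]))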

8. Regularization and pruning:

  • Principle: L1 and L2 regularization terms are provided to control model complexity and avoid overfitting.
  • Implementation: The backward pruning strategy is implemented to further prevent overfitting.
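In the scikit-learn interface these correspond to the reg_alpha (L1) and reg_lambda (L2) arguments; the values below are illustrative, not recommendations.

import lightgbm as lgb
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# reg_alpha / reg_lambda add L1 / L2 penalties on leaf weights;
# min_split_gain discards splits whose gain falls below the threshold.
clf = lgb.LGBMClassifier(reg_alpha=0.1, reg_lambda=1.0, min_split_gain=0.01)
print(cross_val_score(clf, X, y, cv=5).mean())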

9. Model interpretability:

  • Features: Because it is a model based on decision trees, LightGBM has good model interpretability and can understand the decision-making logic of the model through feature importance and other methods.
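For example, after fitting, per-feature importances can be read straight off the trained model. A short sketch on the iris data (the same dataset used later in this article):

import lightgbm as lgb
from sklearn.datasets import load_iris

data = load_iris()
clf = lgb.LGBMClassifier().fit(data.data, data.target)

# Importance here counts how often each feature is used in splits.
for name, score in zip(data.feature_names, clf.feature_importances_):
    print(f"{name}: {score}")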

Features of LightGBM

Efficiency

  • Speed advantage: Through histogram optimization and the leaf-wise growth strategy, LightGBM greatly improves training speed while maintaining accuracy.
  • Memory usage: LightGBM requires less memory than other GBM implementations, which allows it to handle larger data sets.

Accuracy

  • Best-first growth strategy: The leaf-wise growth strategy adopted by LightGBM fits the data more closely and usually achieves better accuracy than level-wise splitting.
  • Methods to avoid overfitting: By setting a maximum depth limit and backward pruning, LightGBM can avoid overfitting while improving model accuracy.

Scalability

  • Parallel and distributed learning: LightGBM is designed to support multi-threading and distributed computing, which allows it to fully utilize the computing power of modern hardware.
  • Multi-platform support: LightGBM can run on multiple operating systems such as Windows, macOS, and Linux, and supports multiple programming languages ​​such as Python, R, and Java.

Ease of use

  • Parameter tuning: LightGBM provides a wealth of parameter options to facilitate users to adjust according to specific problems.
  • Pre-trained models: Users can continue training from an existing (pre-trained) model, which speeds up the modeling process.
  • Model interpretation tools: LightGBM provides feature importance evaluation tools to help users understand the decision-making process of the model.

Import library

In [1]:

import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
import warnings

warnings.filterwarnings("ignore")

Load data

Load the public iris data set:

In [2]:

# Load the dataset
data = load_iris()
X, y = data.data, data.target
y = [int(i) for i in y]  # convert the labels to integers

In [3]:

X[:3]

Out[3]:

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2]])

In [4]:

y[:10]

Out[4]:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Split the data

In [5]:

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Also create the LightGBM Dataset:

In [6]:

lgb_train = lgb.Dataset(X_train, label=y_train)
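Note that the tuning workflow below uses the scikit-learn wrapper (lgb.LGBMClassifier), so this Dataset is not strictly needed there; with LightGBM's native interface it would be consumed roughly like this (a minimal sketch reusing the lgb_train object created above; the parameter values are placeholders):

# Minimal sketch of the native training API using the Dataset created above.
params = {'objective': 'multiclass', 'num_class': 3, 'verbose': -1}
booster = lgb.train(params, lgb_train, num_boost_round=100)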

Parameter settings

In [7]:

# Set the parameter search space
param_dist = {
    'boosting_type': ['gbdt', 'dart'],      # boosting type: gradient boosting decision tree (gbdt) or Dropouts meet Multiple Additive Regression Trees (dart)
    'objective': ['binary', 'multiclass'],  # objective: binary or multiclass classification
    'num_leaves': range(20, 150),           # number of leaf nodes
    'learning_rate': [0.01, 0.05, 0.1],     # learning rate
    'feature_fraction': [0.6, 0.8, 1.0],    # fraction of features sampled per tree
    'bagging_fraction': [0.6, 0.8, 1.0],    # fraction of data sampled (bagging)
    'bagging_freq': range(0, 80),           # how often bagging is performed
    'verbose': [-1]                         # verbosity of training output; -1 suppresses it
}

Random search hyperparameter tuning

In [8]:

# Initialize the model
model = lgb.LGBMClassifier()

# Hyperparameter tuning with random search
random_search = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_dist,  # parameter search space
    n_iter=100,
    cv=5,                            # 5-fold cross-validation
    verbose=2,
    random_state=42,
    n_jobs=-1
)

# Train the model
random_search.fit(X_train, y_train)

Fitting 5 folds for each of 100 candidates, totalling 500 fits

Output the best parameter combination:

In [9]:

# Output the best parameters
print("Best parameters found: ", random_search.best_params_)

Best parameters found: {'verbose': -1, 'objective': 'multiclass', 'num_leaves': 87, 'learning_rate': 0.05, 'feature_fraction': 0.6, 'boosting_type': 'gbdt', 'bagging_freq': 22, 'bagging_fraction': 0.6}

Modeling with the best parameters

In [10]:

# Train the model with the best parameters
best_model = random_search.best_estimator_
best_model.fit(X_train, y_train)

# Predict
y_pred = best_model.predict(X_test)
y_pred = [round(i) for i in y_pred]  # make sure the predictions are integer class labels

# Evaluate the model
print('Accuracy: %.4f' % accuracy_score(y_test, y_pred))

Accuracy: 0.9667
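As an optional follow-up (not part of the original output), a per-class breakdown can be printed with scikit-learn's classification_report:

from sklearn.metrics import classification_report

# Per-class precision / recall / F1 for the tuned model's predictions.
print(classification_report(y_test, y_pred, target_names=data.target_names))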

