
Mango: a new method of Bayesian optimization based on Python environment


Translator: Zhu Xianzhong

Reviewer: Sun Shujuan

Introduction

The optimization of model hyperparameters (or model settings) may be the most important step in training a machine learning algorithm, because it finds the parameters that minimize the model's loss function. This step is also essential for building a generalizable model that is not prone to overfitting.

The most well-known techniques for optimizing model hyperparameters are exhaustive grid search and random grid search. In the first approach, the search space is defined as a grid spanning the domain of each model hyperparameter, and the optimal hyperparameters are obtained by training the model at every point of the grid. Although grid search is very easy to implement, it becomes computationally expensive, especially when the number of variables to optimize is large. Random grid search, on the other hand, is a faster method that can still deliver good results: the optimal hyperparameters are obtained by training the model only on a random sample of points from the grid space.


The figure above compares the two types of grid search. The nine points represent the parameter choices, and the curves on the left and on top show the model accuracy as a function of each search dimension. The data are taken from a paper by Salgado Pilario et al. published in IEEE Transactions on Industrial Electronics (68, 6171–6180, 2021).

Both grid search algorithms have long been widely used by data scientists to find optimal model hyperparameters. However, these methods often land on model hyperparameters whose loss is far from the global minimum.
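To make the contrast concrete, here is a minimal sketch of both approaches with scikit-learn (not from the original article; the model, data, and grid are purely illustrative):

# Minimal sketch: exhaustive grid search vs. random search with scikit-learn.
# The model, data, and grid below are illustrative only.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_regression(n_samples=500, n_features=8, random_state=0)
grid = {'max_depth': [3, 5, 7, 9], 'n_estimators': [100, 200, 300]}

# Exhaustive search: trains the model at every point of the grid (12 combinations)
exhaustive = GridSearchCV(RandomForestRegressor(random_state=0), grid, cv=3)
exhaustive.fit(X, y)

# Random search: trains the model on only a random sample of the grid (5 combinations)
randomized = RandomizedSearchCV(RandomForestRegressor(random_state=0), grid,
                                n_iter=5, cv=3, random_state=0)
randomized.fit(X, y)

print(exhaustive.best_params_)
print(randomized.best_params_)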

That changed in 2013, when James Bergstra and his collaborators published a paper in which they explored Bayesian optimization techniques to find the optimal hyperparameters of image-classification neural networks, and compared the results with those of a random grid search. Their conclusion was that the Bayesian method outperforms random grid search; see the figure below.


The figure shows the validation errors on the LFW dataset (left) and the PubFig83 dataset (right). TPE, the Tree-structured Parzen Estimator, is an algorithm used in Bayesian optimization. The figure is taken from the paper by Bergstra et al. published in the Journal of Machine Learning Research (28, 115–123, 2013).

But why is Bayesian optimization better than either grid search algorithm? Because it is a guided method: it searches the model hyperparameters intelligently instead of finding them by trial and error.

In this article, we will dissect the Bayesian optimization method in detail, and then explore an implementation of this algorithm through a relatively new Python package called Mango.

Bayesian Optimization

Before explaining what Mango can do, we first need to understand how Bayesian optimization works. Of course, if you already understand the algorithm well, you can skip this section.

To summarize, there are 4 parts to Bayesian optimization:

  • Objective function: This is the real function you want to minimize or maximize. For example, it can be the root mean square error (RMSE) in a regression problem or the logarithmic loss in a classification problem. In the optimization of machine learning models, the objective function depends on the model hyperparameters. The objective function is also called a black-box function because its shape is unknown.
  • Search domain or search space: This corresponds to the possible values each model hyperparameter can take. As a user, you need to specify the search space of your model. For example, the search domain for a random forest regression model might be:
param_space = {
    'max_depth': range(3, 10),
    'min_samples_split': range(20, 2000),
    'min_samples_leaf': range(2, 20),
    'max_features': ["sqrt", "log2", "auto"],
    'n_estimators': range(100, 500)
}

Bayesian optimization uses a defined search space to sample the points evaluated in the objective function.

  • Surrogate model: Evaluating the objective function is very expensive, so in practice we only know its true value at a few points. However, we need to know its value elsewhere. This is where the surrogate model comes in: it is a tool for modeling the objective function. A common choice of surrogate model is a Gaussian process (GP), because of its ability to provide uncertainty estimates.

At the beginning of Bayesian optimization, the surrogate model starts with a prior function that distributes uniform uncertainty along the search space:


The figure shows the prior of the surrogate model. The shaded area represents the uncertainty, the black line its mean, and the purple line the one-dimensional objective function. The image is excerpted from a 2020 blog post exploring Bayesian optimization by Apoorv Agnihotri and Nipun Batra.

Every time a sample point in the search space is evaluated in the objective function, the uncertainty of the surrogate model at that point becomes zero. After many iterations, the surrogate model will resemble the objective function:


Surrogate model for a simple one-dimensional objective function
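To make this concrete, here is a minimal sketch (not part of the original article; the toy objective and all names are illustrative) of fitting a Gaussian-process surrogate to a handful of evaluations of a one-dimensional function. The GP's standard deviation plays the role of the shaded uncertainty band in the figures above:

# Minimal sketch: a Gaussian-process surrogate of a toy 1-D objective.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective_1d(x):
    return np.sin(3 * x) + 0.5 * x  # toy "black-box" function

# A few (expensive) evaluations of the objective
x_observed = np.array([[-2.0], [-0.5], [1.0], [2.5]])
y_observed = objective_1d(x_observed).ravel()

# Fit the surrogate: it provides a mean prediction plus an uncertainty estimate
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(x_observed, y_observed)

x_grid = np.linspace(-3, 3, 200).reshape(-1, 1)
mean, std = gp.predict(x_grid, return_std=True)  # std shrinks towards zero at evaluated points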

However, the goal of Bayesian optimization is not to model the objective function, but to find the optimal model hyperparameters in as few iterations as possible. To do this, an acquisition function is used. This is the fourth part of Bayesian optimization: the acquisition function scores candidate points using the surrogate's predictions and uncertainty, balancing the exploration of uncertain regions against the exploitation of promising ones.

The steps of the Bayesian optimization procedure are:

  • Select the surrogate model used to model the objective function and define its prior.
  • For i = 1, 2, ..., number of iterations:
    - Given the points evaluated in the objective so far, use Bayes' rule to obtain the posterior of the surrogate model.
    - Maximize the acquisition function (a function of the posterior) to choose the next point to evaluate.
    - Evaluate the objective at that point and add the new observation to the set of evaluations.
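As an illustration (not taken from the article), a minimal, self-contained sketch of this loop using expected improvement as the acquisition function might look like the following; it repeats the toy objective from the surrogate sketch above so that it runs on its own:

# Minimal sketch of the Bayesian-optimization loop with expected improvement (EI).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective_1d(x):
    return np.sin(3 * x) + 0.5 * x  # toy black-box function to be minimized

def expected_improvement(x_candidates, gp, y_best, xi=0.01):
    # EI for minimization: expected amount by which a candidate beats the best value so far
    mean, std = gp.predict(x_candidates, return_std=True)
    std = np.maximum(std, 1e-9)
    improvement = y_best - mean - xi
    z = improvement / std
    return improvement * norm.cdf(z) + std * norm.pdf(z)

# Initial (random) evaluations of the objective
x_observed = np.array([[-2.0], [-0.5], [1.0], [2.5]])
y_observed = objective_1d(x_observed).ravel()
x_grid = np.linspace(-3, 3, 400).reshape(-1, 1)  # discretized search domain

for _ in range(10):  # number of iterations
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(x_observed, y_observed)  # update the surrogate (posterior)
    ei = expected_improvement(x_grid, gp, y_observed.min())
    x_next = x_grid[[np.argmax(ei)]]  # point with the highest acquisition value
    y_next = objective_1d(x_next).ravel()
    x_observed = np.vstack([x_observed, x_next])  # add the new observation
    y_observed = np.concatenate([y_observed, y_next])

print('best x:', x_observed[np.argmin(y_observed)], 'best value:', y_observed.min())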

The following figure shows the Bayesian optimization of a simple one-dimensional function:


The figure above shows Bayesian optimization of a one-dimensional function. The picture is excerpted from the ARM Research blog post "Scalable Hyperparameter Tuning for AutoML".

In fact, there are many Python packages that use Bayesian optimization under the hood to obtain the best hyperparameters for machine learning models, for example Hyperopt, Optuna, BayesianOptimization, Scikit-optimize (skopt), GPyOpt, pyGPGO, and Mango, to name just a few.

Now, let us turn to Mango.

Mango: Why is it so special?

In recent years, the amount of data across industries has increased significantly. This poses a challenge for data scientists, because it requires their machine learning pipelines to be scalable. Distributed computing can solve this problem.

Distributed computing refers to a group of computers that perform common tasks while communicating with each other; this is different from parallel computing. In parallel computing, a task is divided into subtasks, which are assigned to different processors on the same computer system.


Schematic diagram of parallel computing and distributed computing architecture.

Although there are quite a few Python libraries that use Bayesian optimization to optimize model hyperparameters, none of them support scheduling on any distributed computing framework. One of the motivations of Mango's developers was to create an optimization algorithm that could work in a distributed computing environment while maintaining the power of Bayesian optimization.

What is the secret of Mango's architecture that makes it work well in distributed computing environments? Mango has a modular design in which the optimizer and the scheduler are decoupled. This design makes it easy to scale machine learning pipelines that work with large amounts of data. However, this architecture poses a challenge for the optimization method, because traditional Bayesian optimization is sequential: the acquisition function provides only a single next point to evaluate.

Mango parallelizes Bayesian optimization using two methods: one is called batch Gaussian process bandits, and the other is k-means clustering. In this blog, we will not explain batch Gaussian processes.

Regarding the clustering method: in 2018, a group of researchers at IBM proposed using k-means clustering to scale out the Bayesian optimization process (for technical details, see the paper https://arxiv.org/pdf/1806.01159.pdf). The method consists of sampling, from the search domain, clusters of points that produce high values of the acquisition function (see the figure below). Initially these clusters are far apart in the parameter search space; as the optimal region of the surrogate function is found, the distance between them decreases. K-means clustering scales the optimization horizontally because each cluster is used to run a separate Bayesian optimization process. This parallelization leads to finding the optimal model hyperparameters faster.


Mango uses a clustering method to scale Bayesian optimization. The colored regions on the acquisition function are clusters built from sampled points of the search space with high acquisition-function values. At the beginning the clusters are separated from each other, but their distance shrinks as the surrogate function becomes similar to the objective. (Image excerpted from the ARM Research blog post "Scalable Hyperparameter Tuning for AutoML".)
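A rough sketch of this idea is shown below. It is illustrative only and is not Mango's actual implementation; it assumes a fitted surrogate gp, an expected_improvement function, and observed values y_observed like those in the earlier sketches:

# Rough sketch: cluster high-acquisition samples so that each cluster can drive
# a separate (parallel) Bayesian-optimization evaluation.
import numpy as np
from sklearn.cluster import KMeans

x_samples = np.random.uniform(-3, 3, size=(1000, 1))  # sample the search domain
acq = expected_improvement(x_samples, gp, y_observed.min())  # score with the acquisition

high = x_samples[acq >= np.quantile(acq, 0.9)]  # keep the high-acquisition points
clusters = KMeans(n_clusters=4, n_init=10).fit(high)

# Each cluster centre marks a promising region; in a distributed setting,
# a separate worker could evaluate the objective for each cluster.
print(clusters.cluster_centers_)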

Besides being able to work with distributed computing frameworks, Mango is also compatible with the Scikit-learn API. This means you can define the hyperparameter search space as a Python dictionary whose keys are the model's parameter names, and each item can be defined with any of the 70+ distributions implemented in scipy.stats. All these unique features make Mango a good choice for data scientists who want to exploit data-driven solutions at scale.
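For example, a search space that mixes integer ranges with scipy.stats distributions might look like the sketch below (the parameter names are illustrative and not tied to a particular model):

# Sketch of a Mango search space: keys are model parameter names, values are
# either discrete ranges or scipy.stats distributions.
from scipy.stats import uniform, loguniform

param_space = {
    'n_estimators': range(100, 500),          # integers drawn from a range
    'learning_rate': loguniform(1e-4, 1e-1),  # log-uniform between 1e-4 and 1e-1
    'subsample': uniform(0.5, 0.5),           # uniform on [0.5, 1.0] (loc, scale)
}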

A Simple Example

Next, let's walk through an example that shows how Mango works on an optimization problem. First, you need to create a Python environment and then install Mango with the following command:

pip install arm-mango

In this example, we use the California housing dataset, which can be loaded directly from Scikit-learn (for more information see https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html):

import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_squared_error
import numpy as np
import time
from mango import Tuner

housing = fetch_california_housing()

# Create a dataframe from the input data
# Note: each target value is the median house value in units of $100,000
features = pd.DataFrame(housing.data, columns=housing.feature_names)
target = pd.Series(housing.target, name=housing.target_names[0])

The dataset contains 20,640 samples in total. Each sample has 8 features, such as the age of the house and the average number of bedrooms. In addition, the California housing dataset includes the house price of each sample, in units of $100,000. The distribution of house prices is shown in the figure below:


The left panel shows the spatial distribution of house prices in the California dataset. On the right is the histogram of the same variable.

Note that the distribution of house prices is somewhat skewed. This means the target needs some preprocessing; for example, we can bring its distribution closer to a normal shape with a log or Box-Cox transformation. Because it reduces the variance of the target, this preprocessing can improve the model's predictive performance. We will perform this step during hyperparameter optimization and modeling. Now, let's split the dataset into training, validation, and test sets:

# Split the dataset into training, validation, and test sets
x_train, x_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)
x_train, x_validation, y_train, y_validation = train_test_split(x_train, y_train, test_size=0.2, random_state=42)

At this point we are ready to use Mango to optimize a machine learning model. First, we define the search space from which Mango draws values. In this example, we use an algorithm called Extremely Randomized Trees (Extra-Trees), an ensemble method very similar to random forests, except that the way the best splits are chosen is random. This algorithm typically reduces variance at the cost of a slight increase in bias.

The search space for the extremely randomized trees can be defined as follows:

# Step 1: define the search space of the algorithm
# (use range instead of a uniform distribution to make sure integers are generated)
param_space = {
    'max_depth': range(3, 10),
    'min_samples_split': range(int(0.01 * features.shape[0]), int(0.1 * features.shape[0])),
    'min_samples_leaf': range(int(0.001 * features.shape[0]), int(0.01 * features.shape[0])),
    'max_features': ["sqrt", "log2", "auto"]
}

After defining the parameter space, we specify the objective function. Here we use the training and validation sets created above; however, if you want to run a k-fold cross-validation strategy, you need to implement it yourself inside the objective function.

# Step 2: define the objective function
# If you want cross-validation, define it inside the objective.
# In this case we use an approach similar to a single validation fold.
def objective(list_parameters):
    global x_train, y_train, x_validation, y_validation

    results = []
    for hyper_params in list_parameters:
        model = ExtraTreesRegressor(**hyper_params)
        model.fit(x_train, np.log1p(y_train))
        prediction = model.predict(x_validation)
        prediction = np.exp(prediction) - 1  # back to the real values, not the log scale
        error = np.sqrt(mean_squared_error(y_validation, prediction))
        results.append(error)
    return results

A few points to note about the code above:

  • The objective function aims to find the best model parameters that minimize the root mean square error (RMSE).
  • In Scikit-learn, the implementation of extremely randomized trees for regression problems is called ExtraTreesRegressor.
  • Note that the house prices in the training set are log-transformed, so the predictions on the validation set are transformed back to their original scale.
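If you prefer the k-fold strategy mentioned above, a possible sketch of a cross-validated objective is shown below (not from the article; note that this variant computes the RMSE on the log scale of the target, unlike the single-split objective above):

# Sketch: an alternative objective that runs 3-fold cross-validation internally.
from sklearn.model_selection import cross_val_score

def objective_cv(list_parameters):
    results = []
    for hyper_params in list_parameters:
        model = ExtraTreesRegressor(**hyper_params)
        # scoring returns negative RMSE; flip the sign because Mango minimizes
        scores = cross_val_score(model, x_train, np.log1p(y_train),
                                 scoring='neg_root_mean_squared_error', cv=3)
        results.append(-scores.mean())
    return results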

The last step required to optimize the model hyperparameters is to instantiate the Tuner class, which is responsible for running Mango:

# Step 3: run the optimization through the Tuner
start_time = time.time()
tuner = Tuner(param_space, objective, dict(num_iteration=40, initial_random=10))  # initialize the Tuner
optimisation_results = tuner.minimize()
print(f'The optimisation in series takes {(time.time()-start_time)/60.} minutes.')

# Inspect the results
print('best parameters:', optimisation_results['best_params'])
print('best accuracy (RMSE):', optimisation_results['best_objective'])

# Run the model with the best hyperparameters on the test set
best_model = ExtraTreesRegressor(n_jobs=-1, **optimisation_results['best_params'])
best_model.fit(x_train, np.log1p(y_train))
y_pred = np.exp(best_model.predict(x_test)) - 1  # back to the real values
print('rmse on test:', np.sqrt(mean_squared_error(y_test, y_pred)))

The code above ran for 4.2 minutes on a MacBook Pro (2.3 GHz quad-core Intel Core i7 processor).

The best hyperparameters and the best RMSE are, respectively:

best parameters: {'max_depth': 9, 'max_features': 'auto', 'min_samples_leaf': 85, 'min_samples_split': 729}
best accuracy (RMSE): 0.7418871882901833

When the model is trained on the training set with these best parameters, the RMSE on the test set is:

rmse on test: 0.7395178741584788

Disclaimer: you may get different results when you run this code.

Let's briefly review the Tuner class used in the code above. This class has many configuration parameters, but in this example we only tried two of them:

  • num_iteration: the total number of iterations Mango uses to find the optimal value.
  • initial_random: the number of random samples to test. Note: Mango returns all the random samples together, which is very useful, especially when the optimization needs to run in parallel (see the sketch after this list).
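For parallel evaluation, Mango also provides scheduler decorators. The sketch below is based on my reading of Mango's README; treat the exact decorator API as an assumption and verify it against the current documentation. The objective and search space are toy examples:

# Hedged sketch: parallel evaluation with Mango's scheduler decorator
# (API assumed from the project README; verify before relying on it).
from mango import scheduler, Tuner

param_space = dict(x=range(-10, 10))

@scheduler.parallel(n_jobs=2)  # run two objective evaluations at a time
def objective(x):
    return (x - 3) ** 2  # toy objective, minimized at x = 3

tuner = Tuner(param_space, objective, dict(num_iteration=20, initial_random=4))
results = tuner.minimize()
print(results['best_params'], results['best_objective'])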

Note: the example in this blog uses only a small dataset. In many real applications, however, you may be dealing with large data files that require a parallel implementation of Mango. If you go to my GitHub repository, you can find the complete code shown here as well as the implementation for large data files.

In short, Mango is versatile. You can use it with a wide range of machine learning and deep learning models that require parallel implementations or distributed computing environments to optimize their hyperparameters. Therefore, I encourage you to visit Mango's GitHub repository, where you can find many example projects showing the use of Mango in different computing environments.

Summary

In this blog, we introduced Mango: a Python library for large-scale Bayesian optimization. This package will enable you to:

  • Scale the optimization of your model hyperparameters, even running on distributed computing frameworks.
  • Easily integrate scikit-learn models with Mango to build powerful machine learning pipelines.
  • Use any probability distribution implemented in scipy.stats to declare your search space.

All these features make Mango a unique Python library for extending your data science toolkit.

Translator Introduction

Zhu Xianzhong, 51CTO community editor, 51CTO expert blogger and lecturer, computer teacher at a university in Weifang, and a veteran freelance programmer.

Original title:

Mango: A new way to do Bayesian optimization in Python, by Carmen Adriana Martinez Barbosa

