Specifically designed for decision trees: National University of Singapore and Tsinghua University jointly propose a fast and secure new federated learning system
Federated learning is a highly active area of machine learning in which multiple parties jointly train a model without transferring their data. As the field has developed, many federated learning systems have emerged, such as FATE, FedML, PaddleFL, and TensorFlow Federated. However, most of these systems do not support federated training of tree models. Compared with neural networks, tree models train quickly, are highly interpretable, and are well suited to tabular data. They are widely applied in finance, healthcare, the Internet, and other fields, in scenarios such as advertising recommendation and stock prediction.
A representative tree model is the Gradient Boosting Decision Tree (GBDT). Since the predictive power of a single tree is limited, GBDT trains multiple trees sequentially via boosting: each new tree fits the residual between the current prediction and the label, so the ensemble achieves strong predictive performance. Representative GBDT systems include XGBoost, LightGBM, CatBoost, and ThunderGBM; XGBoost in particular has been used many times by KDD Cup championship teams. However, none of these systems support GBDT training in federated learning scenarios. Recently, researchers from the National University of Singapore and Tsinghua University proposed FedTree, a new federated learning system focused on training tree models.
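To make the boosting idea concrete, here is a minimal sketch of GBDT training with squared-error loss, where each new tree fits the residual between the labels and the current prediction. This is an illustration using scikit-learn trees, not FedTree's actual implementation.

```python
# Minimal GBDT sketch: each tree fits the residual of the current prediction.
# Illustrative only -- not FedTree's actual implementation.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def train_gbdt(X, y, n_trees=10, lr=0.1, max_depth=3):
    trees, pred = [], np.zeros(len(y))
    for _ in range(n_trees):
        residual = y - pred              # negative gradient of squared error
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        pred += lr * tree.predict(X)     # shrinkage via the learning rate
        trees.append(tree)
    return trees

def predict_gbdt(trees, X, lr=0.1):
    return sum(lr * t.predict(X) for t in trees)
```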
FedTree system introduction
The FedTree architecture is shown in Figure 1. It consists of five modules: interface, environment, framework, privacy protection, and model.
Figure 1: FedTree system architecture diagram
Interface: FedTree provides two interfaces: a command-line interface and a Python interface. Users only need to supply parameters (number of parties, federated scenario, etc.) to run FedTree training with a single command. FedTree's Python interface is compatible with scikit-learn: calling fit() and predict() performs training and prediction.
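A hypothetical usage sketch of the scikit-learn-style Python interface follows. The class name FLClassifier and its parameters are assumptions made for illustration based on the compatibility described above; consult the FedTree documentation for the exact API.

```python
# Hypothetical FedTree Python interface usage; class and parameter names
# are assumptions for illustration, not the confirmed API.
from sklearn.datasets import load_breast_cancer
from fedtree import FLClassifier  # assumed import path

X, y = load_breast_cancer(return_X_y=True)
clf = FLClassifier(n_trees=40, n_parties=4, mode="horizontal")  # assumed params
clf.fit(X, y)          # scikit-learn-style training
preds = clf.predict(X)
```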
Environment: FedTree supports both simulated deployment of federated learning on a single machine and distributed deployment across multiple machines. In the single-machine setting, FedTree partitions the data into multiple sub-datasets, each of which is trained as a separate party. In the multi-machine setting, each machine acts as a party, and machines communicate via gRPC. In addition to CPUs, FedTree can also use GPUs to accelerate training.
Framework: FedTree supports GBDT training in both horizontal and vertical federated learning scenarios, illustrated in the sketch below. In the horizontal scenario, parties hold different training samples over the same feature space; in the vertical scenario, parties hold different feature spaces for the same training samples. To preserve model performance, in both scenarios all parties participate in the training of each node. FedTree also supports ensemble learning, in which parties train trees in parallel and then aggregate them, reducing the communication overhead between parties.
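The difference between the two scenarios amounts to how the data matrix is partitioned across parties. A minimal sketch:

```python
# Horizontal FL: parties hold different samples over the same features.
# Vertical FL: parties hold different features for the same samples.
import numpy as np

X = np.random.rand(1000, 20)  # 1000 samples, 20 features

# Horizontal: split by rows (samples) -- 4 parties, 250 samples each.
horizontal_parties = np.array_split(X, 4, axis=0)

# Vertical: split by columns (features) -- 4 parties, 5 features each.
vertical_parties = np.array_split(X, 4, axis=1)
```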
Privacy: Since the gradients exchanged during training may leak information about the training data, FedTree provides different privacy-preserving methods to protect gradient information, including homomorphic encryption (HE) and secure aggregation (SA). FedTree also offers differential privacy to protect the final trained model.
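To show the idea behind secure aggregation, here is a minimal sketch using pairwise additive masks: each party perturbs its gradient so that an aggregator sees only the sum, never an individual gradient. This is a simplified illustration, not FedTree's actual protocol.

```python
# Secure aggregation sketch: pairwise masks cancel in the sum,
# so only the aggregate gradient is revealed. Simplified illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 3
grads = [rng.normal(size=4) for _ in range(n)]  # local gradients of 3 parties

# One shared random mask per pair of parties (i < j).
masks = {(i, j): rng.normal(size=4) for i in range(n) for j in range(i + 1, n)}

masked = []
for i in range(n):
    g = grads[i].copy()
    for j in range(n):
        if i < j:
            g += masks[(i, j)]   # party i adds the pairwise mask
        elif j < i:
            g -= masks[(j, i)]   # party j subtracts the same mask
    masked.append(g)

# The aggregator recovers the exact sum; every mask appears once with +
# and once with -, so all masks cancel.
assert np.allclose(sum(masked), sum(grads))
```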
Model: Building on the training of a single tree, FedTree supports training GBDTs via boosting and random forests via bagging. By setting different loss functions, models trained by FedTree support a variety of tasks, including classification and regression.
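The choice of loss function determines the per-sample gradients (and hessians) that each new tree fits, which is how one trainer supports both regression and classification. A sketch of the two standard cases, using well-known formulas rather than FedTree's internals:

```python
# Standard GBDT gradient/hessian formulas for two common losses.
import numpy as np

def squared_error_grad_hess(y, pred):
    # Regression: L = 0.5 * (pred - y)^2
    return pred - y, np.ones_like(y)

def logistic_grad_hess(y, pred):
    # Binary classification, y in {0, 1}; pred is the raw score.
    p = 1.0 / (1.0 + np.exp(-pred))   # sigmoid
    return p - y, p * (1.0 - p)
```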
Experiment
Table 1 summarizes the AUC of different systems on a9a, breast, and credit, and the RMSE on abalone. The model quality of FedTree is almost identical to that of GBDTs trained on all the data (XGBoost, ThunderGBM) and to SecureBoost (SBT) in FATE. Moreover, the privacy-protection strategies SA and HE do not affect model performance.
Table 1: Comparison of model effects of different systems
Table 2 summarizes the training time per tree (in seconds) for the different systems. FedTree is much faster than FATE, achieving a speedup of more than 100x in the horizontal federated learning scenario.
Table 2: Comparison of training time for each tree in different systems
For more research details, please refer to the original FedTree paper.