Tsinghua's Zhu Wenwu team open-sources AutoGL-light, the world's first lightweight library for automated machine learning on graphs
Since releasing AutoGL in 2020, Professor Zhu Wenwu's team at Tsinghua University has made new progress on the interpretability and generalizability of automated graph machine learning, focusing on graph Transformers, graph out-of-distribution (OOD) generalization, and graph self-supervised learning. The team has also published an evaluation benchmark for graph neural architecture search and released the first lightweight library, AutoGL-light, on GitLink, China's new-generation open source innovation service platform.
A graph is a general abstraction for describing relationships among data. Graphs appear across many research fields and underpin important applications, from Internet applications such as social network analysis, recommender systems, and traffic prediction to scientific applications (AI for Science) such as drug discovery and materials design. Graph machine learning has attracted wide attention in recent years. Because graph data vary widely in structure, properties, and tasks, manually designed graph machine learning models generalize poorly to new scenarios and changing environments. Automated machine learning on graphs (AutoML on Graphs), at the forefront of the field, aims to automatically design the best graph machine learning model for a given dataset and task, and is valuable for both research and applications.
To address automated machine learning on graphs, Professor Zhu Wenwu's team at Tsinghua University began planning in 2017 and in 2020 released AutoGL, the world's first platform and toolkit for automated machine learning on graphs.
Project address: https://github.com/THUMNLab/AutoGL
AutoGL has received more than a thousand stars on GitHub, attracted tens of thousands of visits from more than 20 countries and regions, and is also published on GitLink. It provides a complete workflow for automated graph machine learning and covers the mainstream methods. Through its solver, AutoGL Solver, AutoGL splits automated machine learning on graphs into five core parts: automatic graph feature engineering, graph neural architecture search (NAS), graph hyperparameter optimization (HPO), graph model training, and automatic ensembling of graph models. AutoGL already supports node classification, heterogeneous-graph node classification, link prediction, and graph classification.
To address the limited interpretability and generalizability of current automated graph machine learning, the AutoGL team has made a series of research advances.
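The five-part split can be pictured as a chain of interchangeable stages. The sketch below is only an illustration of that idea in plain Python: every function name, the dictionary-based state, and the dummy scores are assumptions made for this example, and none of it is the AutoGL Solver API.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types for this sketch only; not part of AutoGL.
State = Dict[str, Any]
Stage = Callable[[State], State]


def feature_engineering(state: State) -> State:
    """Toy feature step: append each node's degree to its feature vector."""
    degree = {node: 0 for node in state["features"]}
    for u, v in state["edges"]:
        degree[u] += 1
        degree[v] += 1
    features = {n: feats + [degree[n]] for n, feats in state["features"].items()}
    return {**state, "features": features}


def architecture_search(state: State) -> State:
    """Stand-in for NAS: returns fixed candidates instead of searching."""
    return {**state, "architectures": [("gcn", "gcn"), ("gat", "gcn")]}


def hyperparameter_optimization(state: State) -> State:
    """Stand-in for HPO: attaches one fixed configuration per architecture."""
    configs = [{"arch": a, "lr": 0.01, "hidden": 64} for a in state["architectures"]]
    return {**state, "configs": configs}


def model_training(state: State) -> State:
    """Stand-in for training: assigns a deterministic dummy score per config."""
    num_features = len(next(iter(state["features"].values())))
    models = [{"config": c, "score": 0.1 * num_features + 0.01 * i}
              for i, c in enumerate(state["configs"])]
    return {**state, "models": models}


def ensemble(state: State) -> State:
    """Stand-in for ensembling: averages the dummy scores of all models."""
    scores = [m["score"] for m in state["models"]]
    return {**state, "ensemble_score": sum(scores) / len(scores)}


def run_pipeline(state: State, stages: List[Tuple[str, Stage]]) -> State:
    """Apply each stage in order and record which stages ran."""
    for name, stage in stages:
        state = stage(state)
        state = {**state, "history": state.get("history", []) + [name]}
    return state


PIPELINE: List[Tuple[str, Stage]] = [
    ("feature_engineering", feature_engineering),
    ("nas", architecture_search),
    ("hpo", hyperparameter_optimization),
    ("training", model_training),
    ("ensemble", ensemble),
]

# A three-node path graph with one input feature per node.
toy_graph: State = {
    "features": {0: [1.0], 1: [0.5], 2: [0.2]},
    "edges": [(0, 1), (1, 2)],
}
result = run_pipeline(toy_graph, PIPELINE)
print(result["history"])
```

Because each stage only reads and returns the shared state, any one of them can be swapped for a different implementation without touching the others.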
1. Graph out-of-distribution (OOD) generalization architecture search
Existing graph neural architecture search cannot handle shifts in the distribution of graph data. To address this, the team proposed a graph neural architecture search method based on disentangled self-supervised learning. By customizing a suitable graph neural network architecture for each graph sample, the method makes graph neural architecture search considerably more adaptable to distribution shifts. This work was published at ICML 2022, a top international conference on machine learning.
Paper address: https://proceedings.mlr.press/v162/qin22b/qin22b.pdf
2. Large-scale graph architecture search
Existing graph neural architecture search cannot handle large-scale graphs. To address this, the team proposed a supernet training method based on a joint architecture-subgraph sampling mechanism. Using importance sampling and a peer learning algorithm, it overcomes the consistency bottleneck in the sampling process, greatly improves the efficiency of graph neural architecture search, and for the first time processes real-world graph data at the 100-million scale on a single machine. This work was published at ICML 2022, a top international conference on machine learning.
Paper address: https://proceedings.mlr.press/v162/guan22d.html
3. Graph neural architecture search evaluation benchmark
Graph neural architecture search has lacked a unified evaluation standard, and evaluating it consumes large amounts of computing resources. To address this, the AutoGL team proposed NAS-Bench-Graph, the first tabular benchmark for graph neural architecture search. The benchmark allows different graph neural architecture search methods to be compared efficiently, fairly, and reproducibly, filling the gap left by the absence of any benchmark for architecture search on graph data. NAS-Bench-Graph defines a search space of 26,206 distinct graph neural network architectures, evaluates them on nine commonly used node classification datasets of different sizes and types, and provides the results of the fully trained models, which greatly reduces computing cost while ensuring reproducibility and fair comparison. This work was published at NeurIPS 2022, a top international conference on machine learning.
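The appeal of a tabular benchmark is that a search method looks up an architecture's recorded result instead of training it. The sketch below illustrates that lookup pattern with a tiny synthetic table. The operator names, the two-layer encoding, and all accuracy values are placeholders invented for this example; they are not NAS-Bench-Graph data, and the functions are not its API.

```python
import itertools
import random
from typing import Dict, Tuple

Architecture = Tuple[str, ...]

# Hypothetical operator set; the real benchmark's search space differs.
OPS = ("gcn", "gat", "sage")


def build_toy_table() -> Dict[Architecture, Dict[str, float]]:
    """Build a 9-entry table of SYNTHETIC metrics (not benchmark results)."""
    table = {}
    for i, arch in enumerate(itertools.product(OPS, repeat=2)):
        table[arch] = {
            "val_acc": round(0.60 + 0.03 * i, 2),  # placeholder value
            "params": 1000 * (i + 1),              # placeholder value
        }
    return table


def query(table: Dict[Architecture, Dict[str, float]], arch: Architecture) -> float:
    """Return the recorded validation accuracy; no training happens here."""
    return table[arch]["val_acc"]


def random_search_over_table(table, budget: int, seed: int = 0):
    """Evaluate `budget` distinct random architectures by table lookup."""
    rng = random.Random(seed)
    candidates = rng.sample(sorted(table), k=budget)
    best = max(candidates, key=lambda arch: query(table, arch))
    return best, query(table, best)


table = build_toy_table()
best_arch, best_acc = random_search_over_table(table, budget=4, seed=0)
print(best_arch, best_acc)
```

Since every method queries the same fixed table, two search strategies given the same budget can be compared without differences in training setup or hardware affecting the outcome.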
Project address: https://github.com/THUMNLab/NAS-Bench-Graph
4. Automatic Graph Transformer
Manually designed graph Transformer architectures struggle to achieve the best prediction performance. To address this, the team proposed an automated graph Transformer architecture search framework. With a unified graph Transformer search space and a structure-aware performance estimation strategy, it tackles the problem that designing a graph Transformer by hand is time-consuming and rarely yields the optimal architecture. This work was published at ICLR 2023, a top international conference on machine learning.
Paper address: https://openreview.net/pdf?id=GcM7qfl5zY
5. Robust graph neural architecture search
Current graph neural architecture search cannot cope with adversarial attacks. To address this, the team proposed a robust graph neural architecture search method. By adding robust graph operators to the search space and introducing robustness metrics into the search process, it strengthens the resistance of graph neural architecture search to adversarial attacks. This work was published at CVPR 2023, a top international conference on pattern recognition.
Paper address: https://openaccess.thecvf.com/content/CVPR2023/papers/Xie_Adversarially_Robust_Neural_Architecture_Search_for_Graph_Neural_Networks_CVPR_2023_paper.pdf
6. Self-supervised graph neural architecture search
Existing graph neural architecture search relies heavily on labels as the signal for training and searching architectures, which limits the use of automated graph machine learning when labels are scarce. To address this, the AutoGL team proposed a self-supervised graph neural architecture search method. It identifies a latent relationship between the graph factors that drive the formation of graph data and the optimal neural architecture, and uses a novel disentangled self-supervised graph neural architecture search model to search effectively for optimal architectures on unlabeled graph data. This work has been accepted to NeurIPS 2023, a top conference on machine learning.
7. Multi-task graph neural architecture search
Existing graph neural architecture search cannot account for the different architectural requirements of different tasks. To address this, the AutoGL team proposed the first multi-task graph neural architecture search method. It designs optimal architectures for different graph tasks simultaneously and uses curriculum learning to capture the collaborative relationships among tasks, effectively customizing an optimal architecture for each graph task. This work has been accepted to NeurIPS 2023, a top conference on machine learning.
Building on this research, the AutoGL team released AutoGL-light, the world's first lightweight open source library for automated graph machine learning, on GitLink, the open source platform designated by CCF. Its overall architecture is shown in Figure 1. AutoGL-light has the following main characteristics:
Figure 1. AutoGL-light framework diagram
Project address: https://gitlink.org.cn/THUMNLab/AutoGL-light
1. Module decoupling
Through more thorough module decoupling, AutoGL-light supports different automated graph machine learning pipelines more conveniently, allowing modules to be added freely at any step of the machine learning workflow to meet users' customization needs.
2. Self-customization capability
AutoGL-light supports user-customized graph hyperparameter optimization (HPO) and graph neural architecture search (NAS). In the HPO module, it provides a variety of hyperparameter optimization algorithms and search spaces, and lets users create their own search spaces by inheriting from base classes. In the NAS module, it implements both classic and state-of-the-art search algorithms, and users can freely combine and customize search spaces, search strategies, and evaluation strategies as needed.
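The pattern of defining a custom search space by inheriting from a base class might look roughly like the sketch below. The class names, the `sample` method, the random search routine, and the synthetic objective are all assumptions made for illustration; they are not taken from the AutoGL-light codebase, whose actual base classes and signatures should be checked in the project repository.

```python
import random
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Tuple

Config = Dict[str, Any]


class SearchSpace(ABC):
    """Hypothetical base class: a search space knows how to sample a config."""

    @abstractmethod
    def sample(self, rng: random.Random) -> Config:
        ...


class GNNSearchSpace(SearchSpace):
    """User-defined space: log-uniform learning rate, discrete width, dropout."""

    HIDDEN_CHOICES = (16, 32, 64, 128)

    def sample(self, rng: random.Random) -> Config:
        return {
            "lr": 10 ** rng.uniform(-4, -1),
            "hidden": rng.choice(self.HIDDEN_CHOICES),
            "dropout": rng.uniform(0.0, 0.8),
        }


def random_search_hpo(space: SearchSpace,
                      objective: Callable[[Config], float],
                      trials: int,
                      seed: int = 0) -> Tuple[Config, float]:
    """Plain random search: keep the sampled config with the highest score."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(trials):
        config = space.sample(rng)
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_objective(config: Config) -> float:
    """Synthetic stand-in for validation accuracy; best at hidden=64, dropout=0.5."""
    return -abs(config["hidden"] - 64) / 64 - abs(config["dropout"] - 0.5)


best_config, best_score = random_search_hpo(GNNSearchSpace(), toy_objective, trials=50)
print(best_config, best_score)
```

The optimizer depends only on the `sample` interface, so a new search space can be plugged in without changing the search routine. In practice `toy_objective` would be replaced by a function that trains a graph model and returns its validation metric.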
3. Wide range of application fields
AutoGL-light is not limited to traditional graph machine learning tasks; it extends to a wider range of application areas. It already supports AI for Science applications such as molecular graphs and single-cell omics data. In the future, AutoGL-light aims to provide state-of-the-art automated graph machine learning solutions for graph data in different fields.
4. GitLink Programming Summer Camp
Taking the release of AutoGL-light as an opportunity, the AutoGL team took an active part in the GitLink Programming Summer Camp (GLCC), a summer programming event for university students nationwide organized by the CCF Open Source Development Committee (CCF ODC) under the guidance of the China Computer Federation (CCF). The team's two projects, "GraphNAS Algorithm Reproduction" and "Scientific Application Cases for Automated Graph Learning", attracted applications from undergraduate and graduate students at more than ten universities in China.
During the summer camp, the AutoGL team communicated actively with participating students, and progress exceeded expectations. The GraphNAS algorithm reproduction project implemented three of the methods described above in AutoGL-light: graph out-of-distribution generalization architecture search (ICML'22), large-scale graph architecture search (ICML'22), and the automatic graph Transformer (ICLR'23). This demonstrated the flexibility and customizability of AutoGL-light.
The scientific applications project implemented graph-based bioinformatics algorithms on AutoGL-light, including scGNN, a representative algorithm for single-cell RNA sequencing analysis; MolCLR, a representative algorithm for molecular representation learning; and AutoGNNUQ, a representative algorithm for molecular structure prediction. These implementations advance the application of automated graph machine learning in AI for Science. Through the GitLink Programming Summer Camp, AutoGL-light gained new algorithms and application cases, while participating students practiced open source software development and related skills. The camp thus helped cultivate talent in automated graph machine learning and contributed to China's open source ecosystem.
The AutoGL team comes from the Network and Media Laboratory led by Professor Zhu Wenwu in the Department of Computer Science at Tsinghua University. Its core members, more than ten in all, include Assistant Professor Wang Xin, postdoctoral fellow Zhang Ziwei, doctoral students Li Haoyang, Qin Yijian, and Zhang Zeyang, and master's student Guan Chaoyu. The project has received strong support from the National Natural Science Foundation of China and the Ministry of Science and Technology.