Home >Technology peripherals >AI >Integrated multi-graph neural network

Integrated multi-graph neural network

王林
王林forward
2023-05-10 21:10:041388browse

Integrated multi-graph neural network

1. GNN from a unified perspective

1. Existing GNN propagation paradigm

How does GNN propagate in the airspace? As shown in the figure below, take node A as an example:

Integrated multi-graph neural network

##First it will identify its neighbors The information of node N (A) is aggregated into a hN(A)(1 ), and then combine it with A to represent the upper layer h# #N(A)(1) is combined, and then through a transformation function (i.e. Trans(·) in the formula), the next level representation of A is obtained hN(A)(2). This is the most basic GCN propagation paradigm.

Integrated multi-graph neural network

In addition, there is a decoupled propagation process:

Integrated multi-graph neural network

Integrated multi-graph neural network

#These two What's the difference?

In the decoupled propagation paradigm, a feature extractor, that is, the transformation function, is first used to extract the initial features, and then the extracted features are put into the aggregation function for aggregation. You can see this This method separates feature extraction and aggregation, that is, decoupling is achieved. The advantage of this is:

    #You can freely design the previous transformation function and use any model.
  • Many layers can be added during aggregation to obtain more distant connection information, but we will not face the risk of over-parameterization, because in the aggregation function There are no parameters to optimize.
The above are the two main paradigms, and the embedding output of the node can use the last layer of the network or the residual of the middle layer. .

Through the above review we can see that there are two basic information sources in GNN:

    Topological structure of the network: Generally, the matching information attributes of the graph structure can be captured.
  • # Characteristics of nodes: Generally, they include low-frequency and high-frequency signals of nodes.
  • #2. A unified optimization framework

Based on the propagation mechanism of GNN, we can find that, Existing GNNs have two common goals:

  • Encode useful information from the characteristics of nodes.
  • #Use the smoothing capabilities of topology.

So can we use mathematical language to describe these two goals? Someone proposed the following GNN optimization unified framework expressed by the formula:

Integrated multi-graph neural network

## The first item in the optimization goal:

Integrated multi-graph neural network

is the feature fitting item, and its goal is to make the learned nodes Indicates that Z is as close as possible to the original feature H, and F1, F2 is a graph convolution that can be designed freely nuclear. When the convolution kernel is the identity matrix I, it is equivalent to an all-pass filter; when the convolution kernel is a 0 matrix, it is a low-pass filter, and when the convolution kernel is the Laplacian matrix L, it is a high-pass filter.

The second term of the optimization goal is formally the trace of a matrix, and its function is the regular term on the graph. What is the difference between the trace and the regular term? What about relationships? In fact, the second item is expanded into the following form:

Integrated multi-graph neural network

##The meaning is that on the captured image The degree of difference in features between any two adjacent nodes represents the smoothness of a graph. Minimizing this goal is equivalent to making me and my neighbor more similar.

3. Use a unified optimization framework to understand existing GNN

Most GNNs are optimizing this goal. We can discuss it under different situations:

When the parameters are:

Integrated multi-graph neural network

When the optimization goal becomes:

Integrated multi-graph neural network

## Find the partial derivative and get:

Integrated multi-graph neural network

Let’s take the above The result can be further expanded to get:

Integrated multi-graph neural network

## Its meaning means the Kth layer All node representations are equal to the propagation process of the K-1th layer node representation on the adjacency matrix. After derivation until the end, it will be found that it is equal to the initial feature X propagating K times on the adjacency matrix after completing the feature transformation W*. In fact, this is the model of GCN or SGC with the nonlinear layer removed.

When the parameter F

1

=F2=I, ζ=1, ξ=1/α-1, α∈(0,q], and when selecting an all-pass filter, the optimization goal becomes:

Integrated multi-graph neural network

At this time, we also seek the partial derivative of Z, and make the partial derivative equal to 0 to obtain the closed-form solution of the optimization objective:

Integrated multi-graph neural network

By slightly converting the result, you can get the following formula:

Integrated multi-graph neural network

We can find that the above formula represents the process of node features propagating on personalized PageRank. That is the PPNP model.

Integrated multi-graph neural network

The same is such a model. If you use gradient descent to find it, and set The step size is b, and the iteration term is the partial derivative of the objective function at time k-1 with respect to Z.

Integrated multi-graph neural network

when

Integrated multi-graph neural network

When you can get:

Integrated multi-graph neural network

##This is the APPNP model. The background of the emergence of the APPNP model is that the inverse operation of the matrix in the PPNP model is too complicated, and APPNP uses iterative approximation to solve it. It can also be understood that APPNP can converge to PPNP because both come from the same framework.

Integrated multi-graph neural network

4. New GNN framework

As long as we design a new fitting term O##fit, and Design a corresponding graph regular term Oreg, and add a new solution process to get a new GNN model.

① Example 1: From all-pass filtering to low-pass filtering

As mentioned before, all-pass filtering Convolution kernel under the device F1##=F2=I , when the convolution kernel is the Laplacian matrix L It is a high pass filter. If the GNN obtained by weighting these two situations can encode low-pass information:

Integrated multi-graph neural network

##when

can get the exact solution:

Integrated multi-graph neural network

## Similarly, we can solve it iteratively:

Integrated multi-graph neural network

#5, Elastic GNN

The regular term mentioned in the previous unified framework is equivalent to the L2 regular, which is equivalent to the difference information between any two points on the calculation graph. Some researchers feel that L2 regularization is too global and will cause the smoothness of the entire graph to tend to be the same, which is not entirely consistent with reality. Therefore, it was proposed to add an L1 regular term. The L1 regular term would penalize relatively large changes in the graph.

Integrated multi-graph neural network

##The L1 regular term part is:

Integrated multi-graph neural network

In short, the unified framework above tells us:

    We can use a more macro perspective to understand GNN
  • We can start from this unified framework to design new GNN
However, this unified framework can only be adapted to homogeneous graph structures. Next, let’s take a look at the more common multi-relationship graph structure.

2. Relational GNN model

1, RGCN

## The so-called multi-relationship graph refers to a graph with an edge type greater than 1, as shown in the figure below.

Integrated multi-graph neural network

This kind of multi-relationship diagram is very widespread in the real world, such as in chemical molecules Multiple types of molecular bonds, such as the different relationships between people in a social relationship diagram. For such graphs we can use Relational Graph Neural Networks to model. The main idea is to individually aggregate graphs with N relationships to obtain N aggregation results, and then aggregate the N results.

Integrated multi-graph neural network

expressed as a formula:

Integrated multi-graph neural network

You can see that aggregation is performed in two steps. First, select a relationship r from all relationships R, and then find the relationship that contains this relationship. All nodes

Nr are aggregated, among which Wr is the weight used to weight various relationships. Therefore, it can be seen that as the number of relationships in the graph increases, the weight matrix Wr will also increase, which will lead to the problem of over-parameterization ( Over-parameterization). In addition, splitting the topological relationship diagram according to relationships can also lead to over-smoothing.

2, CompGCN

#In order to solve the problem of over-parameterization, CompGCN uses a vectorized relation encoder to replace N relations Matrix:

Integrated multi-graph neural network

The encoder contains relationships in three directions: forward, reverse, and self-loop:

Integrated multi-graph neural network

Every iteration will also update the embedding of the relation.

Integrated multi-graph neural network

But this heuristic design and such a parametric encoder are also Will cause excessive parameterization. Then, based on the above considerations, we get the starting point of our work: Can we design a more reliable GNN from the perspective of optimization goals, and at the same time solve the problems of existing GNNs.

3. Integrated Multi-Relationship Graph Neural Network

Our EMR GNN was published this year. Next, we will mainly focus on Three aspects are discussed on how we design a GCN suitable for multi-relationship graphs:

  • How to design a suitable integrated optimization algorithm
  • Message delivery mechanism
  • How the GNN model is designed

1. Integrated optimization algorithm

This optimization algorithm needs to meet two requirements:

  • Be able to capture multiple relationships on the graph at the same time
  • Be able to model the importance of different relationships on the graph

The integrated multi-relationship graph regular term we propose on the multi-relationship graph is as follows:

Integrated multi-graph neural network

This regular term is also to capture the smoothing ability of the graph signal, but this adjacency matrix is ​​captured under the relationship r and is subject to normalization Constrained parameters μr is to model the importance of a certain relationship. The second term is the second normal form regularization of the coefficient vector, which is to make the coefficient vector more uniform.

#In order to solve the problem of over-smoothing, we added a fitting term to ensure that the original feature information is not lost. The sum of the fitting term and the regular term is:

Integrated multi-graph neural network

and what was mentioned in the previous chapter Compared with the unified framework, the objective function we design here includes two variables: node correction Z and relationship matrix parameter μ. So it is also a challenge to derive a message propagation mechanism based on such an optimization goal.

2. Derivation of message passing mechanism

Here we adopt an iterative optimization strategy:

  • First fix the node representation Z, and then optimize the parameter μ
  • Then optimize the node based on the result μ of the previous iteration Represent Z

When the fixed node represents Z, the entire optimization objective degenerates into an objective function only related to μ, but this is a band Constrained objective function:

Integrated multi-graph neural network

##This is actually a simplex constraint (constraint in a standard A convex function of μ on simplex). For this type of problem, we can use the Mirror Entropic Descent Algorithm to solve it. We will first find a constant, and then update the weight coefficient under each relationship. The entire update process is similar to the exponential gradient descent algorithm.

Integrated multi-graph neural network

Fixed the relationship coefficient μ to update Z, then the optimization goal degenerates into the following This form:

Integrated multi-graph neural network

In this way we find the partial derivative of the objective function with respect to Z, and let If the partial derivative is equal to 0, we can get:

Integrated multi-graph neural network

##Then the closed-form solution of Z is:

Integrated multi-graph neural network

Similarly, we can use iteration to get an approximate solution. This process can be expressed as follows :

Integrated multi-graph neural network

From the derived message passing mechanism we can prove that the design can avoid excessive Smooth and avoid over-parameterization. Below we can look at the proof process.

The original multi-relationship PageRank matrix is ​​defined as follows:

Integrated multi-graph neural network

The personalized multi-relationship PageRank matrix adds a probability of returning its own node on this basis:

Integrated multi-graph neural network

By solving the above circular equation, you can get the multi-relationship personalized PageRank matrix:

Integrated multi-graph neural network

Let us:

You can get:

Integrated multi-graph neural network

##this That is the closed-form solution obtained by our proposed solution. That is to say, our propagation mechanism can be equivalent to propagating feature H on the personalized PageRank matrix of the node. Because in this propagation mechanism, a node can return to its own node with a certain probability, which means that its own information will not be lost during the information transmission process, thereby preventing the problem of over-smoothing.

In addition, our model also alleviates the phenomenon of over-parameterization, because as can be seen from the formula, for each relationship our model only has A learnable coefficient μr , Compared with the number of parameters of the previous encoder or weight matrix w#r, the magnitude of our parameters is almost negligible. The following figure is the model architecture we designed:

Integrated multi-graph neural network

where RCL is the parameter learning step, and the Pro step is the feature propagation step. These two steps together form our messaging layer. So how do we integrate our messaging layer into the DNN without introducing more additional parameters? We also follow the decoupling design idea: first use an MLP to extract input features, and then pass through multiple layers of message passing layers we designed. Superimposing multiple layers will not lead to over-smoothing. The final transfer result is processed by MLP to complete node classification and can be used for downstream tasks. Use the formula to express the above process as follows:

Integrated multi-graph neural network

##f(X;W) means that The input features are extracted through MLP, and the following EnMP(K) means that the extraction results are passed through the K layer message, gθ represents a classified MLP.

In back propagation we only need to update the parameters in the two MLPs, while the parameters in our EnMP are learned during the forward propagation process Yes, there is no need to update any parameters of EnMP during the backward propagation process.

We can compare the parameters of different mechanisms. We can see that the parameters of our EMR-GNN mainly come from the two MLPs before and after, and the relation coefficient. When the number of layers is greater than 3, we can know that the number of parameters of EMR-GNN is less than that of GCN, and even less than other heterogeneous graphs.

Integrated multi-graph neural network

With such a small number of parameters, our EMR-GNN operates at different nodes as follows: The best level can still be achieved under classification tasks.

Integrated multi-graph neural network

In addition, we also compared the changes in classification accuracy of different network structures after the number of layers increased. As shown in the figure below, when the number of layers increased to 64, our model can still maintain a high level Accuracy, while the original RGCN will encounter insufficient memory when the number of layers increases to more than 16 layers, and it is impossible to superimpose more layers. This is due to too many parameters. The performance of the GAT model is degraded due to over-smoothing.

Integrated multi-graph neural network

In addition, we also found that our EMR-GNN performed better under smaller data sizes. It can achieve the classification accuracy of the whole sample, while RGCN drops a lot.

We also analyzed whether the relationship coefficient μr learned by EMR-GNN is really meaningful, so what is meaningful? We hope that the relationship coefficient μr relationship coefficient Give more weight to important relationships and less weight to unimportant relationships. The results of our analysis are shown in the figure below:

Integrated multi-graph neural network

##The green histogram represents a The effect of classification under a certain relationship. If the classification accuracy is higher under a certain relationship, we can think that the relationship is important. The blue columns represent the relationship coefficients learned by our EMR-GNN. From the blue-green comparison, we can see that our relationship coefficient can reflect the importance of the relationship.

Finally we also made a visual display as shown below:

Integrated multi-graph neural network

You can see that the node embedding trained by EMR-GNN can carry the structured information of the node, which can make the nodes of the same type more cohesive and different types as much as possible. Separate, the segmentation boundary is clearer than other networks.

##4. Summary

1. We use a unified perspective to understand GNN

① From this perspective, we can easily see what problems there are with existing GNN

②This unified perspective can give us a way to redesign GNN Basics

#2. We try to design a new multi-relationship GNN from the perspective of the objective function

## ① We First, an integrated optimization framework was designed

## ② Based on such an optimization framework, we derived a message passing mechanism

③ This message passing mechanism with a small number of parameters is simply combined with MLP to get EMR-GNN

3. EMR-GNN has What are the benefits?

## ① It relies on a reliable optimization goal, so the results obtained are credible, and its underlying principles can be explained mathematically

② It can solve the over-smoothing problem of the existing Relation GNN

## ③ Solve the over-parameterization problem

④ Easy to train, better results can be obtained with smaller parameter quantities

5. Q&A session

Q1: Is there any difference between the learning of relationship coefficients and the attention mechanism?

A1: The relationship coefficient learning here is an update process derived through the optimization framework, and attention is a process that needs to be learned based on backpropagation, so in terms of optimization, both There is a fundamental difference.

#Q2: How applicable is the model on large-scale data sets?

A2: We analyzed the complexity of the model in the appendix. In terms of complexity, we are on the same level as RGCN, but the number of parameters will be smaller than RGCN. , so our model is even more suitable for large-scale data sets.

#Q3: Can this framework incorporate edge information?

#A3: Can be integrated into the fitting term or regular term.

#Q4: Where should I learn these mathematical basics?

#A4: Part of it is based on previous work, and the other part of the mathematical theory related to optimization is also based on some classic optimization papers.

#Q5: What is the difference between a relationship diagram and a heterogeneous diagram?

#A5: The relationship graph is a heterogeneous graph, but we usually think of heterogeneous graphs as those whose node types or edge types are greater than 1. In the relationship diagram, we are particularly concerned about the relationship category greater than 1. It can be understood that the latter includes the former.

Q6: Can mini-batch training be supported?

A6: Supported.

Q7: Is the future research direction of GNN more inclined to strict, interpretable mathematical derivation rather than heuristic design?

#A7: We ourselves feel that strict interpretable mathematical derivation is a reliable design method.

The above is the detailed content of Integrated multi-graph neural network. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:51cto.com. If there is any infringement, please contact admin@php.cn delete