


Rethinking anomaly detection based on structured data: What kind of graph neural network do we need?
Paper address: https://arxiv.org/abs/2205.15508
Code address: https://github. com/squareRoot3/Rethinking-Anomaly-Detection
Anomaly Detection for Structured Graph Data: Background and Challenges
Anomaly detection is one of the classic tasks of data mining. Analyzing abnormal data can help companies or users understand the formation mechanism behind it, so as to make corresponding decisions and avoid losses. With the development of the Internet, anomaly detection for structured data, namely graph anomaly detection, has received more and more attention.
Graph anomaly detection can be specifically defined as: finding a small number of objects on the graph (nodes, edges, subgraphs, etc.), which have different distribution patterns from most other objects. This article focuses on the detection task of abnormal nodes on the graph. Compared with traditional anomaly detection methods, graph anomaly detection can make use of the associated information between different entities to better serve actual scenarios such as network security, fraud detection, troll detection, financial risk control, and fault monitoring.
The following figure visually compares the difference between traditional anomaly detection and graph-oriented anomaly detection tasks.
#Figure 1: Comparison of traditional anomaly detection and graph-oriented anomaly detection tasks.
In recent years, graph neural networks have become a powerful tool for analyzing and processing structured data. Graph neural networks learn embedding representations that contain the node's own characteristics and neighbor information to better complete downstream tasks such as classification, reconstruction, and regression.
However, general graph neural networks (such as convolutional networks, etc.) are mainly designed for normal data, and are prone to encounter the "over-smoothing" problem in anomaly detection tasks, that is, abnormal nodes and The expression of normal nodes is difficult to distinguish, which affects the accuracy of abnormal detection. For example, in the practical application of financial fraud detection, abnormal accounts usually disguise themselves by conducting normal transactions with multiple normal accounts to reduce their suspiciousness, and then conduct illegal transactions. This “relationship fraud” further increases the difficulty of graph anomaly detection.
In order to solve the above difficulties, researchers specially proposed Graph neural network model for anomaly detection tasks, including (1) using the attention mechanism to aggregate neighborhood information from multiple views; (2) ) Use resampling methods to aggregate neighborhood information of different categories; (3) Design additional loss functions to assist in the training of graph neural networks, etc. These methods mainly design graph neural networks to handle anomalies from the perspective of the spatial domain, but no one has considered this problem from the perspective of the spectral domain.
It turns out that choosing different spectral filters will affect the expressive ability of graph neural networks, resulting in differences in performance.
A new approach: Graph anomaly detection from a spectral domain perspective
In order to fill the gap in existing research, this article hopes to answer such a question: How to tailor a spectral filter for graph neural networks? abnormal detection?
This article attempts for the first time to analyze the abnormal data on the graph from the spectral domain perspective and observes that: abnormal data will cause the spectrum energy to "shift to the right", that is, the energy will be less concentrated in low frequencies. , while focusing more on high frequencies.
In order to visualize this right shift phenomenon, the researchers first randomly generated a Barabási–Albert graph (BA graph) with 500 nodes, and assumed that the attributes of normal nodes and abnormal nodes on the graph respectively follow two A different Gaussian distribution, where the variance of outlier nodes is larger.
The upper part of the picture shows the distribution of data containing different degrees of anomalies on the BA chart, while the lower part shows the corresponding spectral energy distribution. Among them, the histogram represents the energy proportion of the corresponding spectrum interval, and the line graph represents the cumulative proportion of frequency domain energy from zero to that point.
#Figure 2: Visualization of the phenomenon of “right shift” of spectral energy.
As can be seen from the above figure, when the proportion of abnormal data is 0%, most of the energy is concentrated in the low frequency part (λ
In actual scenarios, abnormal data usually follows a more complex distribution. On four large-scale graph anomaly detection data sets, researchers also confirmed the existence of the “right shift” phenomenon. The Amazon abnormal user detection data set in the figure below is an example. When a part of the abnormal nodes in the data are deleted, the low-frequency energy on the spectrum increases significantly, while the high-frequency energy decreases accordingly. If the same number of random nodes are deleted, the energy distribution of the spectrum hardly changes. This further verifies that abnormal data is the key to the "right shift" of spectrum energy.
Figure 3: The impact of deleting different nodes on the spectrum energy distribution on the Amazon abnormal user detection data set: the original picture (The Original), deleting random nodes (Drop -Random), delete abnormal nodes (Drop-Anomaly)
New tool for graph anomaly detection: Beta wavelet graph neural network
The analysis in the previous section shows that when detecting graph anomalies, Need to pay attention to the "right shift" effect. For example, in the Amazon data set above, the spectrum information near the eigenvalue λ=1 is closely related to abnormal data. In order to better capture abnormal information, the graph neural network needs to have the properties of a band-pass filter, retaining only signals near λ=1 while filtering the remaining signals.
Unfortunately, most of the existing graph neural networks are low-pass filters or adaptive filters, which cannot guarantee band-pass properties. Although the adaptive filter has the ability to fit any function, it may also degenerate into a low-pass filter in anomaly detection. This is because in the entire data set, the high-frequency information corresponding to the abnormal data accounts for a small proportion, while most of the spectrum energy is still concentrated in low frequencies.
In order to better handle the "right shift" caused by abnormal data, researchers have proposed a new method of graph anomaly detection - Beta Wavelet Graph Neural Network (BWGNN). By drawing on Hammond's graph wavelet theory, they designed a new wavelet kernel based on the Beta function as a spectral filter for the graph neural network.
Compared with the commonly used Heat Kernel function, the Beta function as a wavelet kernel not only meets the requirements of a band-pass filter, but also has better frequency domain locality and spatial domain locality. The figure below compares the difference between thermokernel wavelet and beta kernel wavelet.
Figure 4: Comparison of thermal kernel wavelet and Beta kernel wavelet in the spectral domain (left) and spatial domain (right), the Beta function has better band General and local properties.
This article verified the performance of BWGNN on four large-scale graph anomaly detection data sets. Among them, the Yelp data set is used to detect abnormal comments on dianping websites, the Amazon data set is used to detect abnormal users on e-commerce platforms, the T-Finance data set is used to detect abnormal users on transaction networks, and the T-Social data set is used to detect abnormal users on social networks, including up to Five million nodes and 70 million edges.
As can be seen from the table below, compared with traditional classification models, general graph neural networks and specialized graph anomaly detection models, BWGNN performs in two scenarios: 40% training data and 1% training data (semi-supervised). achieve better results. In terms of operating efficiency, BWGNN is close to the time consumption of most general graph neural networks and is more efficient than other graph anomaly detection models.
Summary
In this article, the researchers found that the appearance of abnormal nodes on the graph will cause the spectrum energy to "shift to the right" ”, which provides a new perspective for anomaly detection on structured data. Based on this finding, this paper proposes a new tool for graph anomaly detection—Beta Wavelet Graph Neural Network (BWGNN). It captures the high-frequency anomaly information generated by "right shifting" through a specially designed band-pass filter, and achieves optimal results on multiple data sets.
In actual implementation, graph anomaly detection is usually a complex system engineering, but choosing an appropriate graph neural network is a key factor affecting system performance. The BWGNN proposed by the researchers has a streamlined design, low complexity, and is easy to replace. It is a new choice for graph neural networks.
The above is the detailed content of Rethinking anomaly detection based on structured data: What kind of graph neural network do we need?. For more information, please follow other related articles on the PHP Chinese website!

Harnessing the Power of Data Visualization with Microsoft Power BI Charts In today's data-driven world, effectively communicating complex information to non-technical audiences is crucial. Data visualization bridges this gap, transforming raw data i

Expert Systems: A Deep Dive into AI's Decision-Making Power Imagine having access to expert advice on anything, from medical diagnoses to financial planning. That's the power of expert systems in artificial intelligence. These systems mimic the pro

First of all, it’s apparent that this is happening quickly. Various companies are talking about the proportions of their code that are currently written by AI, and these are increasing at a rapid clip. There’s a lot of job displacement already around

The film industry, alongside all creative sectors, from digital marketing to social media, stands at a technological crossroad. As artificial intelligence begins to reshape every aspect of visual storytelling and change the landscape of entertainment

ISRO's Free AI/ML Online Course: A Gateway to Geospatial Technology Innovation The Indian Space Research Organisation (ISRO), through its Indian Institute of Remote Sensing (IIRS), is offering a fantastic opportunity for students and professionals to

Local Search Algorithms: A Comprehensive Guide Planning a large-scale event requires efficient workload distribution. When traditional approaches fail, local search algorithms offer a powerful solution. This article explores hill climbing and simul

The release includes three distinct models, GPT-4.1, GPT-4.1 mini and GPT-4.1 nano, signaling a move toward task-specific optimizations within the large language model landscape. These models are not immediately replacing user-facing interfaces like

Chip giant Nvidia said on Monday it will start manufacturing AI supercomputers— machines that can process copious amounts of data and run complex algorithms— entirely within the U.S. for the first time. The announcement comes after President Trump si


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Atom editor mac version download
The most popular open source editor

MinGW - Minimalist GNU for Windows
This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

EditPlus Chinese cracked version
Small size, syntax highlighting, does not support code prompt function

Dreamweaver Mac version
Visual web development tools

Notepad++7.3.1
Easy-to-use and free code editor