Home >Backend Development >Python Tutorial >Clustering algorithm based on projection on convex sets (POCS)

Clustering algorithm based on projection on convex sets (POCS)

PHPz
PHPzforward
2023-04-30 12:22:061509browse

POCS: Projections onto Convex Sets. In mathematics, a convex set is a set in which any line segment between any two points is within the set. Projection is the operation of mapping a point to a subspace in another space. Given a convex set and a point, you can operate by finding the projection of the point onto the convex set. The projection is the point in the convex set that is closest to the point and can be calculated by minimizing the distance between this point and any other point in the convex set. Since it is a projection, we can map the features to a convex set in another space, so that operations such as clustering or dimensionality reduction can be performed.

This article reviews a clustering algorithm based on the convex set projection method, that is, a clustering algorithm based on POCS. The original paper was published at IWIS2022.

convex set

A convex set is defined as a set of data points, in which the line segment connecting any two points x1 and x2 in the set is completely included in this set. According to the definition of convex set, the empty set ∅, unitary set, line segment, hyperplane, and Euclidean sphere are all considered to be convex sets. A data point is also considered a convex set because it is a singleton set (a set with only one element). This opens up a new path for applying the concept of POCS to clustered data points.

Projection of Convex Sets (POCS)

POCS methods can be roughly divided into two types: alternating and parallel.

1. Alternating poc

Starting from any point in the data space, the alternating projection from this point to two (or more) intersecting convex sets will converge to the intersection point of the sets. One point, such as the following figure:

Clustering algorithm based on projection on convex sets (POCS)

When the convex sets do not intersect, the alternating projection will converge to greedy limit cycles that depend on the projection order.

Clustering algorithm based on projection on convex sets (POCS)

2. Parallel POCS

Different from the alternating form, parallel POCS projects from data points to all convex sets simultaneously, and each projection All have an importance weight. For two nonempty intersecting convex sets, similar to the alternating version, the parallel projection converges to a point at the intersection of the sets.

Clustering algorithm based on projection on convex sets (POCS)

In the case where the convex sets do not intersect, the projection will converge to a minimum solution. The main idea of ​​the POCs-based clustering algorithm comes from this feature.

Clustering algorithm based on projection on convex sets (POCS)

For more details about POCS, you can view the original paper

Clustering algorithm based on pocs

Using the convergence of the parallel POCS method property, the author of the paper proposed a very simple but effective clustering algorithm to a certain extent. The algorithm works similarly to the classic K-Means algorithm, but there are differences in the way each data point is processed: the K-Means algorithm weights the importance of each data point the same, but the POCs-based clustering algorithm Each data point is weighted differently in importance, which is proportional to the distance of the data point from the cluster prototype.

The pseudo code of the algorithm is as follows:

Clustering algorithm based on projection on convex sets (POCS)

Experimental results

The author tested the POCs-based algorithm on some public benchmark data sets Performance of clustering algorithms. The table below summarizes the descriptions of these datasets.

Clustering algorithm based on projection on convex sets (POCS)

The authors compared the performance of the POCs-based clustering algorithm with other traditional clustering methods, including k-means and fuzzy c-means algorithms. The following table summarizes the evaluation in terms of execution time and clustering error.

Clustering algorithm based on projection on convex sets (POCS)

Clustering algorithm based on projection on convex sets (POCS)

The clustering results are shown below:

Clustering algorithm based on projection on convex sets (POCS)

Example code

We use this algorithm on a very simple data set. The author has released a package for direct use. For applications, we can use it directly:

pip install pocs-based-clustering

Create a simple data set of 5000 data points centered on 10 clusters:

# Import packages
 import time
 import matplotlib.pyplot as plt
 
 from sklearn.datasets import make_blobs
 from pocs_based_clustering.tools import clustering
 
 
 # Generate a simple dataset
 num_clusters = 10
 X, y = make_blobs(n_samples=5000, centers=num_clusters, 
 cluster_std=0.5, random_state=0)
 
 plt.figure(figsize=(8,8))
 plt.scatter(X[:, 0], X[:, 1], s=50)
 plt.show()

Clustering algorithm based on projection on convex sets (POCS)

Perform clustering and display the results:

# POSC-based Clustering Algorithm
 centroids, labels = clustering(X, num_clusters, 100)
 
 # Display results
 plt.figure(figsize=(8,8))
 plt.scatter(X[:, 0], X[:, 1], c=labels, s=50, cmap='viridis')
 plt.scatter(centroids[:, 0], centroids[:, 1], s=100, c='red')
 plt.show()

Clustering algorithm based on projection on convex sets (POCS)

总结

我们简要回顾了一种简单而有效的基于投影到凸集(POCS)方法的聚类技术,称为基于POCS的聚类算法。该算法利用POCS的收敛特性应用于聚类任务,并在一定程度上实现了可行的改进。在一些基准数据集上验证了该算法的有效性。

论文的地址如下:https://arxiv.org/abs/2208.08888

作者发布的源代码在这里:https://github.com/tranleanh/pocs-based-clustering

The above is the detailed content of Clustering algorithm based on projection on convex sets (POCS). For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:51cto.com. If there is any infringement, please contact admin@php.cn delete