Home >Backend Development >Python Tutorial >Clustering algorithm based on projection on convex sets (POCS)
POCS: Projections onto Convex Sets. In mathematics, a convex set is a set in which any line segment between any two points is within the set. Projection is the operation of mapping a point to a subspace in another space. Given a convex set and a point, you can operate by finding the projection of the point onto the convex set. The projection is the point in the convex set that is closest to the point and can be calculated by minimizing the distance between this point and any other point in the convex set. Since it is a projection, we can map the features to a convex set in another space, so that operations such as clustering or dimensionality reduction can be performed.
This article reviews a clustering algorithm based on the convex set projection method, that is, a clustering algorithm based on POCS. The original paper was published at IWIS2022.
A convex set is defined as a set of data points, in which the line segment connecting any two points x1 and x2 in the set is completely included in this set. According to the definition of convex set, the empty set ∅, unitary set, line segment, hyperplane, and Euclidean sphere are all considered to be convex sets. A data point is also considered a convex set because it is a singleton set (a set with only one element). This opens up a new path for applying the concept of POCS to clustered data points.
POCS methods can be roughly divided into two types: alternating and parallel.
Starting from any point in the data space, the alternating projection from this point to two (or more) intersecting convex sets will converge to the intersection point of the sets. One point, such as the following figure:
When the convex sets do not intersect, the alternating projection will converge to greedy limit cycles that depend on the projection order.
Different from the alternating form, parallel POCS projects from data points to all convex sets simultaneously, and each projection All have an importance weight. For two nonempty intersecting convex sets, similar to the alternating version, the parallel projection converges to a point at the intersection of the sets.
In the case where the convex sets do not intersect, the projection will converge to a minimum solution. The main idea of the POCs-based clustering algorithm comes from this feature.
For more details about POCS, you can view the original paper
Using the convergence of the parallel POCS method property, the author of the paper proposed a very simple but effective clustering algorithm to a certain extent. The algorithm works similarly to the classic K-Means algorithm, but there are differences in the way each data point is processed: the K-Means algorithm weights the importance of each data point the same, but the POCs-based clustering algorithm Each data point is weighted differently in importance, which is proportional to the distance of the data point from the cluster prototype.
The pseudo code of the algorithm is as follows:
The author tested the POCs-based algorithm on some public benchmark data sets Performance of clustering algorithms. The table below summarizes the descriptions of these datasets.
The authors compared the performance of the POCs-based clustering algorithm with other traditional clustering methods, including k-means and fuzzy c-means algorithms. The following table summarizes the evaluation in terms of execution time and clustering error.
The clustering results are shown below:
We use this algorithm on a very simple data set. The author has released a package for direct use. For applications, we can use it directly:
pip install pocs-based-clustering
Create a simple data set of 5000 data points centered on 10 clusters:
# Import packages import time import matplotlib.pyplot as plt from sklearn.datasets import make_blobs from pocs_based_clustering.tools import clustering # Generate a simple dataset num_clusters = 10 X, y = make_blobs(n_samples=5000, centers=num_clusters, cluster_std=0.5, random_state=0) plt.figure(figsize=(8,8)) plt.scatter(X[:, 0], X[:, 1], s=50) plt.show()
Perform clustering and display the results:
# POSC-based Clustering Algorithm centroids, labels = clustering(X, num_clusters, 100) # Display results plt.figure(figsize=(8,8)) plt.scatter(X[:, 0], X[:, 1], c=labels, s=50, cmap='viridis') plt.scatter(centroids[:, 0], centroids[:, 1], s=100, c='red') plt.show()
我们简要回顾了一种简单而有效的基于投影到凸集(POCS)方法的聚类技术,称为基于POCS的聚类算法。该算法利用POCS的收敛特性应用于聚类任务,并在一定程度上实现了可行的改进。在一些基准数据集上验证了该算法的有效性。
论文的地址如下:https://arxiv.org/abs/2208.08888
作者发布的源代码在这里:https://github.com/tranleanh/pocs-based-clustering
The above is the detailed content of Clustering algorithm based on projection on convex sets (POCS). For more information, please follow other related articles on the PHP Chinese website!