How to write the K-means clustering algorithm in Python?

WBOY
2023-09-21 11:06:25

K-means is a widely used clustering algorithm in data mining and machine learning. It is an unsupervised method that partitions a set of data points into k groups (clusters) based on their attributes, so that each point belongs to the cluster with the nearest center. This article shows how to implement K-means in Python and provides step-by-step code examples.

Before writing any code, let's review the basic principles of the K-means clustering algorithm.

The basic steps of the K-means algorithm are as follows:

  1. Initialize k centroids. A centroid is the center point of a cluster; a common choice is to pick k data points at random.
  2. Assign each data point to the cluster whose centroid is nearest to it.
  3. Update each centroid by setting it to the mean of all data points assigned to its cluster.
  4. Repeat steps 2 and 3 until the centroids no longer change (or a maximum number of iterations is reached).
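
To make steps 2 and 3 concrete, here is a minimal, self-contained sketch of a single assignment-and-update round on four hand-picked points. The data and the helper name `one_iteration` are illustrative assumptions, not part of the original article; the sketch uses squared Euclidean distances, which select the same nearest centroid as plain Euclidean distances.

```python
import numpy as np

def one_iteration(data, centroids):
    """Run one K-means round: assign points, then recompute centroids (illustrative helper)."""
    # Step 2: squared Euclidean distance from each point to each centroid, shape (n, k)
    dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    # Step 3: each new centroid is the mean of the points assigned to it
    new_centroids = np.array([data[labels == j].mean(axis=0) for j in range(len(centroids))])
    return labels, new_centroids

# Two obvious groups: two points near x = 0 and two near x = 10
data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
centroids = np.array([[0.0, 0.0], [10.0, 0.0]])

labels, new_centroids = one_iteration(data, centroids)
print(labels)         # [0 0 1 1]
print(new_centroids)  # the centroids move to (0, 0.5) and (10, 0.5)
```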

Now we can start writing code.

Import the necessary libraries

First, we import the required libraries: NumPy for numerical computation and Matplotlib for plotting.

import numpy as np
import matplotlib.pyplot as plt

Data preparation

We need a set of data to cluster. Here we use NumPy to generate 100 random two-dimensional points drawn from a standard normal distribution.

data = np.random.randn(100, 2)
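
Because `np.random.randn` draws all 100 points from a single standard normal distribution, this data has no real cluster structure: K-means will still split it into k groups, but the split is somewhat arbitrary. If you want data with visible clusters to test against, one option (an assumption, not part of the original article) is to sample points around a few hypothetical centers using NumPy's seeded `default_rng` generator:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the data is reproducible

# Hypothetical cluster centers chosen purely for illustration
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
points_per_cluster = 100

# Draw points from a normal distribution (std 0.5) around each center and stack them
data = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(points_per_cluster, 2))
    for center in true_centers
])
print(data.shape)  # (300, 2)
```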

Initializing centroids

Next, we initialize k centroids. Here we use NumPy to randomly select k distinct data points as the initial centroids.

k = 3
centroids = data[np.random.choice(range(len(data)), k, replace=False)]
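
The line above relies on NumPy's global random state, so the initial centroids (and therefore the final clusters) can change from run to run. A minimal sketch of a seedable alternative, assuming you are happy to pass in a `numpy.random.Generator`; the function name `init_centroids` is hypothetical:

```python
import numpy as np

def init_centroids(data, k, rng):
    """Pick k distinct rows of `data` as initial centroids (hypothetical helper)."""
    # rng.choice(n, size=k, replace=False) returns k distinct indices from range(n)
    idx = rng.choice(len(data), size=k, replace=False)
    return data[idx]

data = np.random.default_rng(1).standard_normal((100, 2))
c1 = init_centroids(data, 3, np.random.default_rng(42))
c2 = init_centroids(data, 3, np.random.default_rng(42))
print(np.array_equal(c1, c2))  # True: same seed, same initial centroids
```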

Calculate distance

We need a function that computes the distance between each data point and each centroid. Here we use the Euclidean distance.

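def compute_distances(data, centroids):
    # data[:, np.newaxis] has shape (n, 1, d); subtracting centroids (k, d) broadcasts
    # to (n, k, d), and the norm over axis 2 gives an (n, k) distance matrix.
    return np.linalg.norm(data[:, np.newaxis] - centroids, axis=2)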
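
Entry [i, j] of the returned matrix is the distance from point i to centroid j. A quick check on points whose distances are easy to verify by hand (the sample arrays are illustrative):

```python
import numpy as np

def compute_distances(data, centroids):
    # (n, 1, d) - (k, d) broadcasts to (n, k, d); norm over the last axis gives (n, k)
    return np.linalg.norm(data[:, np.newaxis] - centroids, axis=2)

data = np.array([[0.0, 0.0], [3.0, 4.0]])       # n = 2 points
centroids = np.array([[0.0, 0.0], [3.0, 0.0]])  # k = 2 centroids

d = compute_distances(data, centroids)
print(d.shape)  # (2, 2)
print(d)        # [[0. 3.]
                #  [5. 4.]]
```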

Assign data points to the nearest centroid

Next, we define a function that assigns each data point to the cluster represented by its nearest centroid.

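def assign_clusters(data, centroids):
    # Distance from every point to every centroid, shape (n, k)
    distances = compute_distances(data, centroids)
    # Index of the closest centroid for each point, shape (n,)
    return np.argmin(distances, axis=1)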
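
`np.argmin` returns the index of the first minimum, so a point that is exactly equidistant from two centroids goes to the lower-numbered cluster. The small check below, on illustrative data, shows two clear-cut assignments and one tie:

```python
import numpy as np

def compute_distances(data, centroids):
    return np.linalg.norm(data[:, np.newaxis] - centroids, axis=2)

def assign_clusters(data, centroids):
    distances = compute_distances(data, centroids)
    return np.argmin(distances, axis=1)

centroids = np.array([[0.0, 0.0], [10.0, 0.0]])
data = np.array([
    [1.0, 0.0],  # closer to centroid 0
    [9.0, 0.0],  # closer to centroid 1
    [5.0, 0.0],  # exactly halfway: a tie, so argmin picks index 0
])
print(assign_clusters(data, centroids))  # [0 1 0]
```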

Update the position of the centroid

We then define a function that updates the centroids by moving each one to the mean of all data points assigned to its cluster.

def update_centroids(data, clusters, k):
    # New centroid i = mean of all points currently assigned to cluster i
    centroids = []
    for i in range(k):
        centroids.append(np.mean(data[clusters == i], axis=0))
    return np.array(centroids)
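
One edge case worth knowing about: if no point is assigned to cluster i, `data[clusters == i]` is empty and `np.mean` returns NaN (with a RuntimeWarning), which then propagates into the next distance computation. A minimal sketch of one common workaround, keeping the previous centroid for any empty cluster, is below; the function name `update_centroids_safe` and its `old_centroids` parameter are assumptions, not part of the original code:

```python
import numpy as np

def update_centroids_safe(data, clusters, old_centroids):
    """Mean of each cluster's points; an empty cluster keeps its old centroid (hypothetical helper)."""
    new_centroids = old_centroids.copy()
    for i in range(len(old_centroids)):
        members = data[clusters == i]
        if len(members) > 0:  # only update clusters that actually have points
            new_centroids[i] = members.mean(axis=0)
    return new_centroids

data = np.array([[0.0, 0.0], [2.0, 0.0]])
clusters = np.array([0, 0])                         # no point is assigned to cluster 1
old_centroids = np.array([[1.0, 1.0], [9.0, 9.0]])
result = update_centroids_safe(data, clusters, old_centroids)
print(result)  # cluster 0 moves to (1, 0); cluster 1 keeps (9, 9)
```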

Iterative clustering process

Putting it all together, we alternate the assignment and update steps until the centroids stop moving or max_iter iterations have been performed.

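def kmeans(data, k, max_iter=100):
    # Step 1: pick k distinct data points as the initial centroids
    centroids = data[np.random.choice(len(data), k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid
        clusters = assign_clusters(data, centroids)
        # Step 3: move each centroid to the mean of its assigned points
        new_centroids = update_centroids(data, clusters, k)
        # Step 4: stop once the centroids no longer move (allclose tolerates float round-off)
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return clusters, centroids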
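
A useful number to track alongside this loop is the within-cluster sum of squared distances (often called inertia), which is the quantity K-means tries to minimize: neither the assignment step nor the update step can increase it. A minimal sketch for computing it from the `clusters` and `centroids` that `kmeans` returns (the helper name `inertia` is our own). For the same k, comparing this value across runs with different random initializations is one simple way to pick the better result.

```python
import numpy as np

def inertia(data, clusters, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    # centroids[clusters] picks, for every point, the centroid of its own cluster
    diffs = data - centroids[clusters]
    return float(np.sum(diffs ** 2))

data = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
clusters = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])
print(inertia(data, clusters, centroids))  # 4.0 (each point is 1 unit from its centroid)
```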

Run the clustering algorithm

Now we can run the algorithm to obtain the cluster label of every data point and the final centroids.

clusters, centroids = kmeans(data, k)

Visualizing results

Finally, we can use Matplotlib to visualize the result: each data point is colored according to its cluster, and the centroids are marked with large red circles.

plt.scatter(data[:, 0], data[:, 1], c=clusters)
plt.scatter(centroids[:, 0], centroids[:, 1], s=100, c='red', marker='o')
plt.show()

With the code above, we have implemented the K-means clustering algorithm in Python. You can adjust the number of clusters k, the maximum number of iterations max_iter, and other parameters to suit your data. I hope this article helps you understand and implement K-means clustering!
