
Python Implementation of the K-means Algorithm

news 2025/7/12 16:21:51

K-means Algorithm Steps

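1. Initialize the centroids: supply the initial centroid positions as input.
2. Assign the samples: assign each data point to the cluster of its nearest centroid.
3. Update the centroids: for each cluster, compute the mean of all its data points and use that mean as the new centroid.
4. Repeat steps 2 and 3 until the centroids converge (that is, they no longer change, or the change falls below a small threshold).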
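To make steps 2 and 3 concrete, here is a minimal sketch of a single iteration. The toy data, the starting centroids, and the variable names are hypothetical and chosen only for illustration; the sketch also assumes that no cluster ends up empty.

```python
import numpy as np

# Hypothetical toy data: four 1-D points and two starting centroids.
points = np.array([[0.0], [1.0], [9.0], [10.0]])
centers = np.array([[2.0], [8.0]])

# Step 2: assign every point to its nearest centroid.
# Broadcasting yields an (n_samples, k) matrix of Euclidean distances.
distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
labels = distances.argmin(axis=1)

# Step 3: move each centroid to the mean of the points assigned to it.
# Assumption: every cluster has at least one point.
new_centers = np.array(
    [points[labels == j].mean(axis=0) for j in range(len(centers))]
)

# Step 4: the shift between old and new centroids drives the stopping test.
shift = np.linalg.norm(new_centers - centers)

print(labels)       # cluster index of each point
print(new_centers)  # updated centroids
print(shift)        # how far the centroids moved in this iteration
```

Here the first two points go to the first centroid and the last two to the second, so the centroids move to 0.5 and 9.5. Repeating the same two steps with `new_centers` would leave them unchanged, which is the convergence condition described in step 4.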

K-means Clustering Code

import numpy as np


def k_means(data_points, initial_centers, max_iterations=100, tol=1e-4):
    """
    K-means clustering.

    :param data_points: numpy array of shape (n_samples, n_features), the points to cluster
    :param initial_centers: numpy array of shape (k, n_features), the initial centroids
    :param max_iterations: int, maximum number of iterations
    :param tol: float, convergence threshold on the centroid shift
    :return: final_centers: numpy array, the final centroids
             labels: numpy array, the cluster label of each data point
    """
    # Initialize variables.
    # Cast to float so that integer initial centroids do not truncate the means.
    centers = np.asarray(initial_centers, dtype=float)  # current centroids
    n_samples = data_points.shape[0]  # number of data points
    k = centers.shape[0]  # number of clusters
    labels = np.zeros(n_samples, dtype=int)  # cluster label of each data point

    for iteration in range(max_iterations):
        # 1. Assign each sample to its nearest centroid
        for i, point in enumerate(data_points):
            distances = np.linalg.norm(point - centers, axis=1)  # Euclidean distance to every centroid
            labels[i] = np.argmin(distances)  # index of the nearest centroid

        # 2. Update the centroids
        new_centers = np.zeros_like(centers)
        for j in range(k):
            cluster_points = data_points[labels == j]  # points assigned to cluster j
            if len(cluster_points) > 0:  # guard against empty clusters
                new_centers[j] = np.mean(cluster_points, axis=0)  # cluster mean becomes the new centroid
            else:
                new_centers[j] = centers[j]  # keep the old centroid if the cluster is empty

        # 3. Check convergence (is the centroid shift below the threshold?)
        center_shift = np.linalg.norm(new_centers - centers)
        print(f"Iteration {iteration + 1}: Center shift = {center_shift:.6f}")
        centers = new_centers  # update before the check so the latest centroids are returned
        if center_shift < tol:
            print("Converged!")
            break

    return centers, labels


# Test code
if __name__ == "__main__":
    # Data points
    data = np.array([[1, 2], [1, 4], [1, 0],
                     [10, 2], [10, 4], [10, 0]])

    # Initial cluster centers
    initial_centers = np.array([
        [2, 2],  # initial centroid 1
        [5, 5]   # initial centroid 2
    ])

    # Run K-means
    final_centers, labels = k_means(data, initial_centers)

    # Print the results
    print("Final cluster centers:")
    print(final_centers)
    print("Cluster labels for each data point:")
    print(labels)
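The assignment step in `k_means` visits the data points one at a time. As a possible alternative, the same labels can be computed for all points at once with NumPy broadcasting. The helper name `assign_labels` below is hypothetical and not part of the original code; this is a sketch of one way to vectorize the step, not a drop-in replacement that has been benchmarked.

```python
import numpy as np


def assign_labels(data_points, centers):
    """Return the index of the nearest centroid for every row of data_points.

    Hypothetical helper: a vectorized version of the per-point assignment loop.
    """
    # diff has shape (n_samples, k, n_features) thanks to broadcasting.
    diff = data_points[:, None, :] - centers[None, :, :]
    # Squared distances suffice for argmin; the square root preserves the order.
    sq_dist = np.sum(diff ** 2, axis=2)
    return np.argmin(sq_dist, axis=1)


# Same data and initial centroids as in the test code of the article.
data = np.array([[1, 2], [1, 4], [1, 0],
                 [10, 2], [10, 4], [10, 0]])
centers = np.array([[2, 2], [5, 5]])

labels = assign_labels(data, centers)
print(labels)
```

With these inputs the three points at x = 1 are assigned to centroid 0 and the three points at x = 10 to centroid 1. The trade-off is memory: the intermediate array holds n_samples × k × n_features values, so the explicit loop may still be preferable for very large inputs.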
