Current Activities |
K-Means Clustering and
Fuzzy Clustering
are different than
Hierarchical Clustering and
Diversity Selection in that the number of clusters, K, needs to be
determined at the onset. The goal is to divide the objects into K
clusters such that some metric relative to the centroids of the
clusters is minimized. If the clustering is over binary objects,
medoids need to be used instead of centroids. A medoid is is just a
bit string that minimizes the sum of distances to all objects in the
cluster [1] (see Clustering Binary Objects for
further details).
Various metrics to the centroids that can be minimized include
The metric to minimize and the choice of a distance measure will determine the shape of the optimium clusters. Two procedures are available to search for the optimum set of clusters. The first assigns each object to a cluster and the second sets initial positions for the cluster centroids. In the first procedure, the objects are randomly assigned to one of the K clusters. Once this is done, the position of the K centroids are determined, as is the value of the metric to minimize. A global optimization method is then used to reassign some of the objects to different clusters. New centroids are determined, as is the metric to minimize. This procedure is continued until (hopefully) the optimum assignment of objects to clusters is found. In the second procedure for K-means clustering, placement of the K centroids can be done by the following procedure.
A global optimization method is then used to move the position of one or more of the centroids. The above procedure is repeated and new metric is determined. The object is to move the centroids into a position such that an optimum separation of objects into groups occurrs. This latter method of K-means clustering is the one that has to be used if you are clustering binary objects.
Reference: |