9. Cluster Analysis

Hans-Joachim Mucha and Hizir Sofyan
18 November 2003

As an exploratory technique, cluster analysis provides a description of the data or a reduction in its dimension. It classifies a set of observations into two or more mutually exclusive, initially unknown groups based on combinations of many variables. Its aim is to construct groups such that the profiles of objects in the same group are relatively homogeneous, whereas the profiles of objects in different groups are relatively heterogeneous.
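To make the notions of homogeneity within groups and heterogeneity between groups concrete, the following minimal Python sketch compares the average pairwise distance inside groups with the average distance across groups. The toy data, the group names "A" and "B", the helper names `mean_within` and `mean_between`, and the use of Euclidean distance are all assumptions made for illustration; they do not come from the text, and other dissimilarity measures could be used in the same way.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical toy data: each object is described by two variables.
groups = {
    "A": [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)],
    "B": [(5.0, 5.0), (5.1, 4.8), (4.9, 5.2)],
}

def mean_within(groups):
    """Average distance between pairs of objects in the same group."""
    d = [dist(p, q)
         for members in groups.values()
         for p, q in combinations(members, 2)]
    return sum(d) / len(d)

def mean_between(groups):
    """Average distance between pairs of objects in different groups."""
    d = [dist(p, q)
         for (_, g1), (_, g2) in combinations(groups.items(), 2)
         for p in g1
         for q in g2]
    return sum(d) / len(d)

within = mean_within(groups)
between = mean_between(groups)
print(f"within = {within:.3f}, between = {between:.3f}")
```

For a grouping that matches the structure of the data, the within-group average should be clearly smaller than the between-group average, which is the case for this toy example.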

Clustering is distinct from classification techniques such as discriminant analysis or classification tree algorithms. Here no a priori information about classes is required, i.e., neither the number of clusters nor the rules of assignment into clusters are known. They have to be discovered exclusively from the given data set, without any reference to a training set. Cluster analysis allows many choices about the algorithm for combining groups, for example how the distance between two groups is measured.
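As an illustration of one such choice, the sketch below implements a naive agglomerative procedure in Python in which the rule for measuring the distance between two groups can be switched between single linkage (nearest members) and complete linkage (farthest members). The function name `agglomerate`, its parameters, and the toy data are hypothetical and chosen for this example only. The number of clusters is passed in here purely to keep the sketch short; in practice it is unknown and would have to be inferred from the data, for instance by inspecting the merge distances.

```python
from math import dist  # Euclidean distance, Python 3.8+

def agglomerate(points, n_clusters, linkage="single"):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    linkage="single":   cluster distance = smallest member-to-member distance
    linkage="complete": cluster distance = largest member-to-member distance
    Returns a list of clusters, each a sorted list of point indices.
    """
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    pick = min if linkage == "single" else max

    # Start with every object in its own cluster; no labels are used.
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index a, index b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = pick(dist(points[i], points[j])
                         for i in clusters[a]
                         for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical toy data: two well-separated groups of three objects.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
          (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]

single = agglomerate(points, 2, linkage="single")
complete = agglomerate(points, 2, linkage="complete")
print(single, complete)
```

On well-separated data such as this, both linkage rules recover the same two groups. On elongated or noisy data they can disagree, which is why the choice of algorithm matters.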