Merging K-means solutions for clustering large datasets
The K-means algorithm is one of the most popular clustering procedures owing to its computational speed and intuitive construction. Unfortunately, K-means in its traditional form, based on Euclidean distances, is appropriate only when the clusters are approximately spherical and of roughly equal size. At the same time, it is common practice to apply the algorithm without checking these underlying assumptions, which can lead to meaningless or misleading solutions. We propose merging the groups in K-means solutions to produce meaningful clusterings. The notion of pairwise overlap is used to measure the closeness of the groups in the obtained solution. The ideas are illustrated through examples and real data with good results.
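The merging idea can be sketched in a few functions. This is a minimal illustration under assumptions that are ours rather than the source's. K-means is deliberately run with too many clusters (`k_init`). A Gaussian is then fitted to each resulting group, and the pairwise overlap of groups k and l is taken to be ω_kl = ω_{k|l} + ω_{l|k}, the sum of the two misclassification probabilities between the fitted Gaussians, estimated here by Monte Carlo. Finally, the most-overlapping pair is merged repeatedly until a target number of clusters (`n_final`) remains. The function names, the farthest-first initialization, the Monte Carlo estimator, and the fixed-count stopping rule are all hypothetical choices and not the published method.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means with farthest-first initialization (an assumed choice)."""
    rng = np.random.default_rng(seed)
    # Farthest-first traversal: start at a random point, then repeatedly add
    # the point farthest from all centers chosen so far.
    idx = [int(rng.integers(len(X)))]
    d2 = ((X - X[idx[0]]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        idx.append(int(d2.argmax()))
        d2 = np.minimum(d2, ((X - X[idx[-1]]) ** 2).sum(axis=1))
    centers = X[idx].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its points; empty centers stay put.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def log_gauss(X, mean, cov):
    """Log-density of a multivariate normal evaluated at each row of X."""
    p = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)  # whitened residuals, shape (p, n)
    return (-0.5 * (z ** 2).sum(axis=0) - np.log(np.diag(L)).sum()
            - 0.5 * p * np.log(2 * np.pi))


def misclassification(rng, src, other, n_mc):
    """Monte Carlo estimate of P(other's weighted density wins | Y ~ src)."""
    (w1, m1, S1), (w2, m2, S2) = src, other
    Y = rng.multivariate_normal(m1, S1, size=n_mc)
    wins = np.log(w2) + log_gauss(Y, m2, S2) > np.log(w1) + log_gauss(Y, m1, S1)
    return wins.mean()


def pairwise_overlap(X, labels, n_mc=2000, seed=0):
    """Overlap omega_kl = omega_{k|l} + omega_{l|k} for every pair of groups."""
    rng = np.random.default_rng(seed)
    groups = np.unique(labels)
    p = X.shape[1]
    params = []
    for g in groups:
        Xg = X[labels == g]
        # Singleton groups get a near point-mass covariance; the small ridge
        # keeps every covariance matrix positive definite.
        cov = (np.atleast_2d(np.cov(Xg, rowvar=False)) if len(Xg) > 1
               else np.zeros((p, p))) + 1e-6 * np.eye(p)
        params.append((len(Xg) / len(X), Xg.mean(axis=0), cov))
    K = len(groups)
    omega = np.zeros((K, K))
    for a in range(K):
        for b in range(a + 1, K):
            omega[a, b] = omega[b, a] = (
                misclassification(rng, params[a], params[b], n_mc)
                + misclassification(rng, params[b], params[a], n_mc))
    return groups, omega


def merge_kmeans(X, k_init, n_final, seed=0):
    """Over-partition with K-means, then greedily merge the most-overlapping pair."""
    labels = kmeans(X, k_init, seed=seed)
    while len(np.unique(labels)) > n_final:
        groups, omega = pairwise_overlap(X, labels, seed=seed)
        np.fill_diagonal(omega, -np.inf)  # never merge a group with itself
        a, b = np.unravel_index(np.argmax(omega), omega.shape)
        labels[labels == groups[b]] = groups[a]
    # Relabel the surviving groups as 0, 1, ..., n_final - 1.
    return np.unique(labels, return_inverse=True)[1]
```

For instance, `merge_kmeans(X, k_init=10, n_final=2)` splits `X` into 10 K-means groups and merges them back down to 2. Over-partitioning lets many small, roughly spherical K-means groups tile a cluster of arbitrary shape, and the overlap measure then decides which tiles belong together. In practice, a stopping rule based on an overlap threshold may be preferable to a fixed `n_final`.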