Many algorithms designed to accelerate the Fuzzy c-Means (FCM) clustering
algorithm randomly sample the data. Typically, no statistical method is used to
estimate the subsample size, despite the impact subsample sizes have on speed
and quality. This paper introduces two new accelerated algorithms, GOFCM and
MSERFCM, that use a statistical method to estimate the subsample size. GOFCM, a
variant of SPFCM, also leverages progressive sampling. MSERFCM, a variant of
rseFCM, gains a speedup from improved initialization. A general, novel stopping
criterion for accelerated clustering is introduced. The new algorithms are
compared to FCM and four accelerated variants of FCM. GOFCM achieved a speedup
of 4-47 times over FCM and was faster than SPFCM on each of the six datasets
used in experiments. For five of the datasets, its partitions were within 1% of
those of FCM. MSERFCM achieved a speedup of 5-26 times over FCM and produced
partitions within 3% of those of FCM on all datasets. A unique dataset,
consisting of plankton
images, exposed the strengths and weaknesses of many of the algorithms tested.
The new stopping criterion is shown to speed up algorithms such as SPFCM while
yielding final partitions very close to those of FCM.
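
To make the sample-then-extend idea behind this family of algorithms concrete,
the following is a minimal sketch in plain Python: it runs standard FCM on a
random subsample and then computes memberships for the full dataset in a single
pass. It is an illustration only, not the GOFCM or MSERFCM procedure. The
function names (fcm, memberships, sampled_fcm) and the fixed fraction parameter
are hypothetical; in particular, the fixed fraction stands in for the
statistical subsample-size estimate, and the center-shift test stands in for
the stopping criterion, neither of which is specified here.

```python
import random


def memberships(x, centers, m=2.0):
    """Fuzzy memberships of point x in each cluster (each row sums to 1)."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in centers]
    for i, d in enumerate(d2):
        if d == 0.0:  # x coincides with a center: assign it fully there
            return [1.0 if k == i else 0.0 for k in range(len(centers))]
    # u_i is proportional to (squared distance)^(-1/(m-1))
    inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def fcm(points, c, m=2.0, tol=1e-5, max_iter=100, centers=None, seed=0):
    """Standard FCM on a list of equal-length tuples; returns c centers."""
    rng = random.Random(seed)
    if centers is None:
        centers = rng.sample(points, c)  # random initial centers from the data
    centers = [list(v) for v in centers]
    dim = len(points[0])
    for _ in range(max_iter):
        num = [[0.0] * dim for _ in range(c)]
        den = [0.0] * c
        for x in points:
            u = memberships(x, centers, m)
            for i in range(c):
                w = u[i] ** m
                den[i] += w
                for j in range(dim):
                    num[i][j] += w * x[j]
        # Each new center is the mean of the points weighted by u^m
        new_centers = [[num[i][j] / den[i] for j in range(dim)]
                       for i in range(c)]
        shift = max(abs(a - b)
                    for nc, oc in zip(new_centers, centers)
                    for a, b in zip(nc, oc))
        centers = new_centers
        if shift < tol:  # simple center-shift test (an assumption)
            break
    return centers


def sampled_fcm(points, c, fraction=0.1, m=2.0, seed=0):
    """Cluster a random subsample, then extend memberships to all points.

    `fraction` is a hypothetical fixed sample fraction, not a statistical
    estimate of the subsample size.
    """
    rng = random.Random(seed)
    n_s = min(len(points), max(c, int(fraction * len(points))))
    sample = rng.sample(points, n_s)
    centers = fcm(sample, c, m=m, seed=seed)
    # Extension step: one membership pass over the full dataset
    U = [memberships(x, centers, m) for x in points]
    return centers, U
```

A call such as sampled_fcm(points, c=2, fraction=0.5) returns the centers
fitted on the subsample together with one membership row per point of the full
dataset. The iterative cost is paid only on the subsample, which is where
sampling-based variants obtain their speedup.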