The data functions that are studied in the course of functional data
analysis are assembled from discrete data, and the level of smoothing that is
used is generally that which is appropriate for accurate approximation of the
conceptually smooth functions that were not actually observed. Existing
literature shows that this approach is effective, and even optimal, when using
functional data methods for prediction or hypothesis testing. However, in the
present paper we show that this approach is not effective in classification
problems. There, a useful rule of thumb is that undersmoothing is often
desirable, but there are several surprising qualifications to that rule.
First, the effect of smoothing the training data can be more significant than
that of smoothing the new data set to be classified; second, undersmoothing is
not always the right approach, and in fact in some cases using a relatively
large bandwidth can be more effective; and third, these perverse results are the
consequence of very unusual properties of error rates, expressed as functions of
smoothing parameters. For example, the orders of magnitude of optimal smoothing
parameter choices depend on the signs and sizes of terms in an expansion of
error rate, and those signs and sizes can vary dramatically from one setting to
another, even for the same classifier.
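
To make the distinction between smoothing the training data and smoothing the new data concrete, the following minimal simulation sketch (not taken from the paper) varies the two bandwidths separately and reports Monte Carlo error rates. The data-generating model, the Nadaraya-Watson smoother, and the centroid classifier used here are illustrative assumptions only, not the constructions analysed in the paper.

# Illustrative sketch only: the data model, the Nadaraya-Watson smoother, and
# the centroid classifier are assumptions made for this example, not the
# constructions studied in the paper.
import numpy as np

rng = np.random.default_rng(0)
t_grid = np.linspace(0.0, 1.0, 50)          # common observation times


def sample_curves(n, cls):
    """Discrete, noisy observations of smooth curves from class 0 or 1."""
    # The two class means differ by a small smooth shift (an illustrative choice).
    mean = np.sin(2 * np.pi * t_grid) + (0.3 * t_grid if cls == 1 else 0.0)
    return mean + rng.normal(scale=0.5, size=(n, t_grid.size))


def smooth(curves, h):
    """Nadaraya-Watson smoothing of each discretely observed curve, bandwidth h."""
    w = np.exp(-0.5 * ((t_grid[:, None] - t_grid[None, :]) / h) ** 2)
    w /= w.sum(axis=1, keepdims=True)
    return curves @ w.T


def centroid_classify(train0, train1, new):
    """Assign each new curve to the class with the nearer mean curve (L2 distance)."""
    d0 = np.sum((new - train0.mean(axis=0)) ** 2, axis=1)
    d1 = np.sum((new - train1.mean(axis=0)) ** 2, axis=1)
    return (d1 < d0).astype(int)


def error_rate(h_train, h_new, n_train=50, n_new=200):
    """Monte Carlo error rate when training and new curves get different bandwidths."""
    tr0 = smooth(sample_curves(n_train, 0), h_train)
    tr1 = smooth(sample_curves(n_train, 1), h_train)
    new0 = smooth(sample_curves(n_new, 0), h_new)
    new1 = smooth(sample_curves(n_new, 1), h_new)
    labels = centroid_classify(tr0, tr1, np.vstack([new0, new1]))
    truth = np.concatenate([np.zeros(n_new), np.ones(n_new)])
    return np.mean(labels != truth)


# Vary the training bandwidth and the new-data bandwidth separately, to see
# which choice moves the error rate more in this toy setting.
for h_train in (0.01, 0.05, 0.2):
    for h_new in (0.01, 0.05, 0.2):
        print(f"h_train={h_train:.2f}  h_new={h_new:.2f}  "
              f"error={error_rate(h_train, h_new):.3f}")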