Consider a distributed detection problem in which the underlying distributions of the observations are unknown; instead of these distributions, noisy versions of empirically observed statistics are available to the fusion center. These empirically observed statistics, together with source (test) sequences, are transmitted through different channels to the fusion center. The fusion center decides which distribution the source sequence is sampled from based on these data. For the binary case, we derive the optimal type-II error exponent given that the type-I error decays exponentially fast. The type-II error exponent is maximized over the proportions of channels for both source and training sequences. We conclude that as the ratio of the lengths of training to test sequences α tends to infinity, using only one channel is optimal. By calculating the derived exponents numerically, we conjecture that the same is true when α is finite under certain conditions. We relate our results to the classical distributed detection problem studied by Tsitsiklis, in which the underlying distributions are known. Finally, our results are extended to the case of m-ary distributed detection with a rejection option.