One of the fundamental requirements for visual surveillance using nonoverlapping camera networks is the correct labeling of tracked objects on each camera in a consistent way, in the sense that the observations of the same object at different cameras should be assigned with the same label. In this paper, we formulate this task as a Bayesian inference problem and propose a distributed inference framework in which the posterior distribution of labeling variable corresponding to each observation is calculated based solely on local information processing on each camera and mutual information exchanging between neighboring cameras. In our framework, the number of objects presenting in the monitored region does not need to be specified beforehand. Instead, it can be determined automatically on the fly. In addition, we make no assumption about the appearance distribution of a single object, but use "similarity" scores between appearance pairs as appearance likelihood for inference. To cope with the problem of missing detection, we consider an enlarged neighborhood of each camera during inference and use a mixture model to describe the higher order spatiotemporal constraints. Finally, we demonstrate the effectiveness of our method through experiments on an indoor office building dataset and an outdoor campus garden dataset.