Multi-object tracking aims to recover the object trajectories, given multiple detections in video frames. Object feature extraction and similarity metric are the two keys to reliably associate trajectories. In this paper, we propose the recurrent metric network (RMNet), a convolutional neural network-recurrent neural network-based similarity metric framework for the multi-object tracking. Given a reference object, the RMNet takes as input random positive and negative detections and outputs similarity scores over time. The RMNet handles the long-term temporal object variations and false object detections by its long-short memory units. With the scores from RMNet, we introduce a batch multiple hypothesis (BMH) strategy, a simple yet efficient data association method for the batch multi-object tracking. BMH generates a hypothesis tree for each object with a dual-threshold hypothesis generation approach and, then, selects the best branch (or hypothesis) for each object as the batch tracking result. Specially, we model the hypothesis selection as a 0-1 programming problem and introduce a reward function to re-find the objects in case of missing detection. We evaluate our RMNet and BMH strategy on several popular datasets: 2DMOT2015, PETS2009, TUD, and KITTI. We achieve a performance comparable or superior to those of the state-of-theart methods.INDEX TERMS Multi-object tracking, pedestrians tracking, recurrent metric networks.