A novel methodology is proposed herein to estimate the three-dimensional (3D) surface shape of unknown, markerless deforming objects through a modular multi-camera vision system. The methodology is a generalized formal approach to shape estimation for a priori unknown objects. Accurate shape estimation is accomplished through a robust, adaptive particle filtering process. The estimation process yields a set of surface meshes representing the expected deformation of the target object. The methodology is based on the use of a multi-camera system, with a variable number of cameras, and range of object motions. The numerous simulations and experiments presented herein demonstrate the proposed methodology's ability to accurately estimate the surface deformation of unknown objects, as well as its robustness to object loss under self-occlusion, and varying motion dynamics.Typical motion capture methods utilize an articulated object model (i.e., a skeleton model) that is fit to the recovered 3D data [34][35][36][37][38]. The deformation of articulated objects is defined as the change in pose and orientation of the articulated links in the object. The accuracy of these methods is quantified by the angular joint error between the recovered model and ground truth. Markered motion capture methods yield higher-resolution shape recovery compared to articulated-object based methods, but depend on engineered surface features [3,39]. Several markerless motion-capture methods depend on off-line user-assisted processing for model generation [40][41][42]. High-resolution motion-capture methods fit a known mesh model to the capture data to improve accuracy [10,43,44]. Similarly, the known object model and material properties can be used to further improve the shape recovery [45]. The visual hull technique can create a movie-strip motion capture sequence of objects [8,[46][47][48] or estimate the 3D background by removing the dynamic objects in the scene [49]. All these methods require some combination of off-line processing, a priori known models, and constrained workspaces to produce a collection of models at each demand instant resulting in a movie-strip representation. Deformation estimation, however, is absent.Multi-camera deformation-estimation methods commonly implement either a Kalman filter (KF) [50-53], particle filter [54], or particle swarm optimization (PSO) [55] to track an object in a motion-capture sequence. Articulated-motion deformation prediction methods rely on a skeleton model of the target object. KFs were successfully implemented to estimate joint deformation for consecutive demand instants [2,13]. PSO-base methods were also shown to be successful in deformation estimation for articulated objects [56,57]. Mesh models combined with a KF tracking process produce greater surface accuracy for deformation estimation [3,14]. Patch-based methods track independent surface patches through an extended Kalman filter (EKF) [58], and a particle filter [59], producing deformation estimations of each patch. Many active-vision ...