Optical motion capture estimates the three-dimensional (3D) positions of points by triangulation using several depth cameras. Successful performance relies on the visibility of the points from the different cameras and on the overlap between the captured meshes. In practice, the accuracy of the estimate depends on the camera parameters, e.g., location and orientation. Accordingly, the configuration of the camera network plays a key role in the quality of the estimated mesh. This paper proposes an optimisation-based approach to camera placement based on the characteristics of the Intel RealSense D435i depth camera. The optimisation problem uses a cost function that contains several minimisation and maximisation terms. The minimisation terms are the distance of the cameras to the centre of the scanned object, the resolution error, and the sparsity. The maximisation terms are the distance between each pair of cameras, the percentage of points captured from the object, and the level of overlap between cameras. The object is designed based on practical experiments of human walking and is a bounding box around the dynamic workspace of the foot during one step, from the heel-strike posture to the toe-off posture. The accuracy and robustness of the algorithm are assessed experimentally, and its sensitivity to the number of cameras is investigated. The experimental results show that the scanning accuracy can reach 2.5 % relative to a reference scan obtained with a high-end scanner (Artec Eva).
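As an illustration only, the six terms listed above could be combined into a single scalar cost by a weighted sum, with the maximisation terms entering with a negative sign so that the whole expression is minimised. The sketch below is not the formulation used in the paper: the function name `placement_cost`, the unit default weights, the use of mean distances, and the assumption that resolution error, sparsity, coverage, and overlap are already available as precomputed scalars are all hypothetical choices made for this example.

```python
from itertools import combinations
import math


def placement_cost(cameras, centre, resolution_error, sparsity,
                   coverage, overlap, weights=(1.0,) * 6):
    """Weighted-sum cost for one candidate camera layout (illustrative).

    cameras          : list of (x, y, z) camera positions, at least two
    centre           : (x, y, z) centre of the scanned object
    resolution_error : precomputed scalar, to be minimised (assumed input)
    sparsity         : precomputed scalar, to be minimised (assumed input)
    coverage         : fraction of object points captured, to be maximised
    overlap          : overlap level between cameras, to be maximised
    weights          : six non-negative weights (hypothetical, not from the paper)
    """
    if len(cameras) < 2:
        raise ValueError("at least two cameras are needed for pairwise terms")

    # Minimisation term: mean distance from the cameras to the object centre.
    d_centre = sum(math.dist(c, centre) for c in cameras) / len(cameras)

    # Maximisation term: mean distance between every pair of cameras.
    pairs = list(combinations(cameras, 2))
    d_pair = sum(math.dist(a, b) for a, b in pairs) / len(pairs)

    w = weights
    # Terms to minimise are added; terms to maximise are subtracted.
    return (w[0] * d_centre + w[1] * resolution_error + w[2] * sparsity
            - w[3] * d_pair - w[4] * coverage - w[5] * overlap)


if __name__ == "__main__":
    layout = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
    print(placement_cost(layout, (0.0, 0.0, 0.0),
                         resolution_error=0.5, sparsity=0.25,
                         coverage=0.75, overlap=0.5))
```

A lower value indicates a better layout under these assumed weights. In practice the terms have different units and ranges, so they would need to be normalised or weighted before such a sum is meaningful.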