Distributed control and estimation of multi-agent systems have received tremendous research attention in recent years due to their potential across many application domains [1], [2]. Here, the term "agent" can represent a sensor, an autonomous vehicle, or any general dynamical system. Multi-agent systems are becoming increasingly attractive because of their robustness against system failure, their ability to adapt to dynamic and uncertain environments, and their economic advantages over more expensive monolithic systems.

Formation control and network localization are two fundamental tasks that enable multi-agent systems to perform complex missions. The goal of formation control is to control each agent using local information from neighboring agents so that the entire team forms a desired spatial geometric pattern (see [2] for a recent survey on formation control). While the notion of a formation as a geometric pattern has a natural meaning for robotic systems, it may also correspond to more abstract configurations of the system state of a team of agents. The goal of network localization is to estimate the location of each agent in a network using locally sensed or communicated information from neighboring agents [3]-[6]. Network localization is usually the first step that must be completed before a sensor network can provide other services, such as positioning mobile robots or monitoring areas of interest.

For a formation control or network localization task, the type of information available to each agent is an important factor that determines the design of the corresponding control or estimation algorithms. Most existing approaches to formation control assume that each agent can obtain the relative positions of its nearest neighbors.
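To make the relative-position assumption concrete, the sketch below simulates one common displacement-based formation law. It assumes single-integrator agents with dynamics dp_i/dt = u_i, an undirected neighbor graph, and the consensus-type control u_i = sum over neighbors j of [(p_j - p_i) - (p*_j - p*_i)], where p* is the desired formation. This is a standard textbook law chosen for illustration, not necessarily a method used in this work. The agent count, graph, target square, initial positions, and step size are all hypothetical. Each agent uses only the sensed relative positions p_j - p_i, never absolute positions.

```python
# Minimal sketch (hypothetical setup): displacement-based formation control
# for single-integrator agents, integrated with forward Euler.

def formation_step(positions, targets, neighbors, dt):
    """One Euler step of u_i = sum_j [(p_j - p_i) - (p*_j - p*_i)]."""
    new_positions = []
    for i, (xi, yi) in enumerate(positions):
        ux, uy = 0.0, 0.0
        for j in neighbors[i]:
            # Sensed relative position p_j - p_i (the only measurement used).
            rx, ry = positions[j][0] - xi, positions[j][1] - yi
            # Desired relative position p*_j - p*_i from the target formation.
            dx = targets[j][0] - targets[i][0]
            dy = targets[j][1] - targets[i][1]
            ux += rx - dx
            uy += ry - dy
        new_positions.append((xi + dt * ux, yi + dt * uy))
    return new_positions


def simulate(positions, targets, neighbors, dt=0.05, steps=2000):
    """Run the formation law for a fixed number of Euler steps."""
    for _ in range(steps):
        positions = formation_step(positions, targets, neighbors, dt)
    return positions


# Hypothetical example: 4 agents forming a unit square over a cycle graph.
targets = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
neighbors = [[1, 3], [0, 2], [1, 3], [2, 0]]  # undirected cycle 0-1-2-3-0
initial = [(0.3, -2.0), (4.0, 1.5), (-1.0, 3.0), (2.5, 2.5)]

final = simulate(initial, targets, neighbors)
for p in final:
    print(f"({p[0]:.3f}, {p[1]:.3f})")
```

Because the graph is undirected, the control inputs sum to zero, so the team's centroid stays fixed. The agents therefore converge to the desired square translated to the initial centroid, here (1.45, 1.25). The step size 0.05 is well below the Euler stability bound 2/4 = 0.5 set by the largest Laplacian eigenvalue of the 4-cycle.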
In order to obtain relative positions in practice, each agent can measure its absolute position using, for example, GPS, and then share it with its neighbors via wireless communication. This method is, however, not applicable in GPS-denied environments such as indoors, underwater, or deep space. Furthermore, the absolute accuracy of GPS may not meet the requirements of high-accuracy formation control tasks. Rather than relying on external positioning systems such as GPS, each agent can use onboard sensors to sense its neighbors.

Optical cameras are widely used onboard sensors for ground and aerial vehicles because they are low-cost, lightweight, and low-power. Notably, optical cameras are inherently bearing-only sensors. Specifically, once a target has been recognized in an image, its bearing relative to the camera can be computed immediately from its pixel coordinates using the pinhole camera model [7, Section 3.3]. By comparison, the range from the target to the camera is more difficult to obtain because it requires additional geometric information about the target and extra estimation algorithms, which may significantly in...