Automated optical-tweezers-based robotic manipulation of microscale objects requires real-time visual perception to estimate the states (i.e., positions and orientations) of the objects. Such perception is particularly challenging in heterogeneous environments containing mixtures of biological and colloidal objects, such as cells and microspheres, when the widely used imaging modality of low-contrast bright-field microscopy is employed. In this paper, we present an accurate perception method that addresses this challenge. Our method combines well-established image processing techniques, such as blob detection, histogram equalization, erosion, and dilation, with a convolutional neural network in a novel manner. We demonstrate the effectiveness of our processing pipeline in perceiving objects of both regular and irregular shapes in heterogeneous microenvironments of varying compositions. In particular, the neural network helps distinguish individual microspheres within dense clusters.
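To illustrate how the classical stages named above can fit together, the following is a minimal sketch in standard-library Python. It is not the authors' implementation, and it rests on several assumptions. The input is an 8-bit grayscale frame stored as a list of rows. Objects appear brighter than the background after equalization. "Blob detection" is approximated by thresholding plus 4-connected component labeling rather than a Laplacian-of-Gaussian detector. Orientation is taken from second-order central moments. The CNN stage is omitted entirely. All function names and parameters (`detect_objects`, `find_blobs`, `threshold`, `min_area`) are hypothetical.

```python
import math
from collections import deque
from typing import Callable, Dict, List

Image = List[List[int]]  # 8-bit grayscale frame, one list per row
Mask = List[List[int]]   # binary mask of 0/1 values


def equalize_histogram(img: Image, levels: int = 256) -> Image:
    """Global histogram equalization: remap intensities via the normalized CDF."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if cdf_min == len(flat):  # constant image: nothing to stretch
        return [row[:] for row in img]
    scale = (levels - 1) / (len(flat) - cdf_min)
    lut = [round((c - cdf_min) * scale) if c > 0 else 0 for c in cdf]
    return [[lut[p] for p in row] for row in img]


def _morph(mask: Mask, combine: Callable) -> Mask:
    """Apply a 3x3 square structuring element; out-of-bounds neighbors are ignored."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = 1 if combine(window) else 0
    return out


def erode(mask: Mask) -> Mask:
    return _morph(mask, all)  # pixel survives only if its whole 3x3 window is set


def dilate(mask: Mask) -> Mask:
    return _morph(mask, any)  # pixel is set if any pixel in its 3x3 window is set


def find_blobs(mask: Mask, min_area: int = 1) -> List[Dict]:
    """Label 4-connected components; report centroid, area, and major-axis angle."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, pixels = deque([(y0, x0)]), []
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            n = len(pixels)
            if n < min_area:
                continue
            cy = sum(p[0] for p in pixels) / n
            cx = sum(p[1] for p in pixels) / n
            # Second-order central moments; angle is measured from the column (x)
            # axis in image coordinates, where y points down.
            mu20 = sum((px - cx) ** 2 for _, px in pixels)
            mu02 = sum((py - cy) ** 2 for py, _ in pixels)
            mu11 = sum((px - cx) * (py - cy) for py, px in pixels)
            angle = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
            blobs.append({"centroid": (cy, cx), "area": n, "angle": angle})
    return blobs


def detect_objects(img: Image, threshold: int = 128, min_area: int = 4) -> List[Dict]:
    """Hypothetical classical stage: equalize, threshold, open, then label."""
    eq = equalize_histogram(img)
    mask = [[1 if p >= threshold else 0 for p in row] for row in eq]
    opened = dilate(erode(mask))  # morphological opening removes isolated specks
    return find_blobs(opened, min_area)
```

In this sketch, touching microspheres in a dense cluster would merge into a single connected component. Separating such clusters is the case the abstract assigns to the neural network.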