Providing mobile robots with the ability to manipulate objects has, despite decades of research, remained a challenging problem. The problem is approachable in constrained environments where there is ample prior knowledge of the environment layout and the manipulatable objects. The challenge is in building systems that scale beyond specific situational instances and operate gracefully in novel conditions. In the past, researchers used heuristic and simple rule-based strategies to accomplish tasks such as scene segmentation or reasoning about occlusion. These heuristic strategies work in constrained environments where a roboticist can make simplifying assumptions about the geometries of the objects to be interacted with, the level of clutter, camera position, lighting, and a myriad of other relevant variables. The work in this thesis will demonstrate how to build a system for robotic mobile manipulation that is robust to changes in these variables. This robustness will be enabled by recent simultaneous advances in the fields of big data, deep learning, and simulation. The ability of simulators to create realistic sensory data enables the generation of massive corpora of labeled training data for various grasping and navigation-based tasks. It is now possible to build systems that are trained with deep learning entirely on synthetic data and still work in the real world. The ability to train and test on synthetic data allows for quick iterative development of new perception, planning, and grasp execution algorithms that work in many environments.

To build a robust system, this thesis introduces a novel multiple-view shape reconstruction architecture that leverages unregistered views of the object. To navigate to objects without localizing the agent, this thesis introduces a novel panoramic target goal architecture that uses the agent's previous views to inform a policy for navigating through an environment.
Additionally, a novel next-best-view methodology is introduced to allow the agent to move around the object and refine its initial understanding of the object. The results show that this deep-learned sim-to-real approach outperforms heuristic-based methods in terms of reconstruction quality and success weighted by path length (SPL). Because of its modular design, the approach is also adaptable to the environment and robot chosen.

I would like to acknowledge all members of the Columbia Robotics Lab (CRLab) for making my stay there an enjoyable and rewarding experience. In particular, I would like to thank