High-fidelity state preparation is a fundamental challenge in the application of quantum technology. While the majority of optimal control approaches use feedback to improve the controller, the controller itself often does not incorporate explicit state dependence. Here, we present a general framework for training deep feedback networks for open quantum systems with quantum nondemolition measurement. The framework accommodates a variety of system and control structures that are prohibitive for many other techniques and can, in effect, react to unmodeled effects through nonlinear filtering. We demonstrate that this method is efficient owing to its inherent parallelizability, is robust to open-system interactions, and outperforms landmark state-feedback control results in simulation.