Intelligent Apps (iApps), equipped with in-App deep learning (DL) models, are emerging to offer stable DL inference services. However, App marketplaces have trouble auto testing iApps because the in-App model is black-box and couples with ordinary codes. In this work, we propose an automated tool, ASTM, which can enable largescale testing of in-App models. ASTM takes as input an iApps, and the outputs can replace the in-App model as the test object. ASTM proposes two reconstruction techniques to translate the in-App model to a backpropagation-enabled version and reconstruct the IO processing code for DL inference. With the ASTM's help, we perform a large-scale study on the robustness of 100 unique commercial in-App models and find that 56% of in-App models are vulnerable to robustness issues in our context. ASTM also detects physical attacks against three representative iApps that may cause economic losses and security issues.