Advanced Persistent Threats (APTs) are sophisticated multistage cyber attacks, and defending against them is challenging. Existing studies apply signature-based or behavior-based methods to monitoring data to detect APTs, but little research has addressed the important problem of detecting APTs with limited resources. To maintain the primary functionality of a system, the resources allocated for security purposes, such as logging and examining the behavior of the system, are usually constrained. Therefore, when a system faces multiple simultaneous powerful cyber attacks such as APTs, the allocation of limited security resources becomes critical. This paper focuses on the threat model in which multiple simultaneous APT attacks exist in the defender's system, but the defender does not have sufficient monitoring resources to check every running process. To capture the footprint of multistage activities, including both APT attacks and benign activities, this work leverages the provenance graph, which is constructed from the dependencies between processes. Furthermore, this work studies a monitoring strategy that efficiently detects APT attacks from incomplete information about paths on the provenance graph by considering both the "exploitation" effect and the "exploration" effect. The contributions of this work are twofold. First, it extends the classic Upper Confidence Bound (UCB) algorithm from the multi-armed bandit problem to cyber security, and proposes to use the malevolence value of a path, generated by a novel LSTM neural network, as the exploitation term. Second, considering "exploration" is novel in the detection of APT attacks with limited monitoring resources. 
The experimental results show that the LSTM neural network helps enforce the exploitation effect, because its output satisfies the same property as the exploitation term in the classic UCB algorithm. They also show that the proposed monitoring strategy detects multiple simultaneous APT attacks more efficiently than the random strategy and the greedy strategy, as measured by the time needed to detect the same number of APT attacks.
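
To make the exploitation and exploration trade-off concrete, the following is a minimal sketch of a UCB-style path selection rule. It is not the formulation used in this paper: it assumes the standard UCB1 exploration bonus, and it assumes that each path's malevolence value is already available as a number in [0, 1] (here supplied by hand rather than by an LSTM). The names `ucb_scores`, `select_paths`, `budget`, and the weight `c` are hypothetical and chosen only for illustration.

```python
import math


def ucb_scores(malevolence, visits, t, c=1.0):
    """Score each path as exploitation term + exploration bonus.

    malevolence: per-path malevolence values (assumed precomputed, in [0, 1]).
    visits: number of times each path has been monitored so far.
    t: current round, used as the total monitoring count in the UCB1 bonus.
    c: hypothetical weight on exploration; c = 0 reduces to a greedy rule.
    """
    scores = []
    for m, n in zip(malevolence, visits):
        if n == 0:
            # Paths that were never monitored are tried first.
            scores.append(math.inf)
        else:
            # Standard UCB1 bonus: shrinks as a path is monitored more often.
            scores.append(m + c * math.sqrt(2.0 * math.log(t) / n))
    return scores


def select_paths(malevolence, visits, t, budget, c=1.0):
    """Return the indices of the `budget` highest-scoring paths."""
    scores = ucb_scores(malevolence, visits, t, c)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    malevolence = [0.9, 0.2, 0.5]
    visits = [10, 1, 5]
    # Greedy (no exploration) versus UCB-style selection with a budget of 1.
    print(select_paths(malevolence, visits, t=11, budget=1, c=0.0))
    print(select_paths(malevolence, visits, t=11, budget=1, c=1.0))
```

In this toy setting the greedy rule keeps monitoring the path with the highest current malevolence value, whereas the UCB-style rule picks the rarely monitored path, whose large exploration bonus reflects how little is known about it.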