We consider the problem of Active Search, where as many relevant objects as possible (ideally all of them) should be retrieved with minimum effort or in minimum time. Solving this kind of problem is crucial in applications such as fraud detection, e-discovery, prior art search in patent databases, etc. Typically, there are two main challenges to face when tackling this problem: first, the class of relevant objects often has a very low prevalence and, secondly, this class can be multi-faceted or multi-modal: objects may be relevant for completely different reasons. To address this problem and its associated issues, we propose an approach based on a non-stationary (a.k.a. restless) extension of Thompson Sampling, a well-known strategy for multi-armed bandit problems. The collection is first soft-clustered into a finite set of components, and a posterior distribution over the probability of finding a relevant object inside each cluster (or component) is updated after receiving user feedback on the proposed instances. The "next instance" selection strategy is a mixed, two-level decision process: the algorithm first selects a cluster through "optimistic Thompson sampling" and then chooses, inside that cluster, the instance with maximal relevance probability, as computed by an incremental online classifier. In some sense, this method can be viewed as an insurance policy, where the cost of the insurance is an extra exploration effort in the short run (i.e., the early stage of the search process), in exchange for achieving nearly "total" recall with less effort in the long run.
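The two-level selection loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's exact formulation: it assumes a Beta-Bernoulli posterior per cluster, implements the "optimistic" variant of Thompson Sampling by clamping each sampled value at its posterior mean, and treats the incremental online classifier's scores as given. The non-stationary (restless) machinery and the soft-clustering step are omitted; all names are hypothetical.

```python
import random


class OptimisticTSSearch:
    """Sketch of the two-level decision process: optimistic Thompson
    sampling over clusters, then max-relevance-probability instance
    selection inside the chosen cluster."""

    def __init__(self, n_clusters, seed=0):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior on the probability of relevance per cluster
        # (assumed model, not necessarily the paper's).
        self.alpha = [1.0] * n_clusters
        self.beta = [1.0] * n_clusters

    def pick_cluster(self):
        """Level 1: sample from each cluster's posterior, but never
        score a cluster below its posterior mean ("optimistic" TS)."""
        scores = []
        for a, b in zip(self.alpha, self.beta):
            draw = self.rng.betavariate(a, b)
            mean = a / (a + b)
            scores.append(max(draw, mean))
        return max(range(len(scores)), key=scores.__getitem__)

    def pick_instance(self, candidates):
        """Level 2: candidates is a list of (instance_id, probability)
        pairs, with probabilities assumed to come from an incremental
        online classifier (not shown). Return the most likely relevant."""
        return max(candidates, key=lambda c: c[1])[0]

    def update(self, cluster, relevant):
        """Fold the user's relevance feedback into the chosen cluster's
        posterior."""
        if relevant:
            self.alpha[cluster] += 1.0
        else:
            self.beta[cluster] += 1.0
```

In use, each round would call `pick_cluster`, score the cluster's unlabeled instances with the classifier, present the `pick_instance` winner to the user, and feed the judgment back through `update`.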