The problem of selecting a subsample from an enormous number of possible combinations arises when some covariates are available for all units but the response can be measured for only a subset of them. When estimating a Bayesian prediction model, an optimized selection can be more efficient than random sampling. The work is motivated by the environmental management of aquatic systems. We consider data on 4360 Finnish lakes and aim to find an approximately optimal subsample of lakes in the sense of Bayesian D-optimality. We study Bayesian two-stage selection, in which the choice of lakes to be measured at the second stage depends on the measurements carried out at the first stage. The results indicate that the two-stage approach has a modest advantage over the single-stage approach.