As available data increases, so too does the demand to discover new datasets to solve new problems. Existing studies using a base dataset and a keyword search often yield coarse-grained results where significant information overlaps and non-relevant data occur. They also implicitly assume that a user can purchase all datasets found, which is rarely true in practice. Therefore, achieving dataset discovery results with less redundancy using more fine-grained information needs and a budget is desirable. To achieve this, we study the problem of finding a set of datasets that maximize distinctiveness based on a user's fine-grained information needs while keeping the total price of the datasets within a user-defined budget. Note that the user may also have a base dataset that they want to expand. Here, the user's fine-grained information needs are expressed as a query set and the distinctiveness score for a set of datasets, which is the number of distinct tuples produced by running the query set on the datasets which are do not overlap with the base dataset. First, we prove the NP-hardness of this problem. Then, we develop a greedy algorithm that achieves an approximation of (1 − −1 )/2. But this algorithm is neither efficient nor scalable as it frequently computes the exact distinctiveness during dataset selection, which requires every tuple for the query result overlap in multiple datasets to be tested. To address this limitation, we propose an efficient and effective machine-learning-based (ML-based) algorithm to estimate the distinctiveness for a set of datasets, without the need for testing every tuple. The proposed algorithm is the first to support cardinality estimation (CE) for a query set on multiple datasets, as previous studies only support CE for a single query on a single dataset, and cannot effectively identify query result overlaps in multiple datasets. Extensive experiments using five real-world data pools demonstrate that our greedy algorithm using ML-based distinctiveness estimation outperforms all other baselines in both effectiveness and efficiency.