To transfer design patterns to the wafer successfully, it is essential to calibrate several types of models that describe the optical, physical, and chemical effects in the chip manufacturing process. In recent years, machine learning (ML) models have also been actively investigated to capture various aspects of semiconductor processes. As is well known, model training time and model accuracy are heavily influenced by the input data, so it is becoming increasingly important to provide highly efficient methods that automatically generate effective pattern samples from full-chip designs. A straightforward approach, simple random sampling, can generate effective samples very efficiently for a homogeneous population. However, real-world chip layouts are characterized by geometrical and lithographical feature distributions that vary significantly across the full-chip design space. The complexity of the problem calls for a comprehensive set of sampling approaches, as well as the flexibility to customize the sampling strategy for different applications. In this paper, we investigate automatic layout sampling that optimizes the coverage and diversity of patterns under constraints such as a minimal training sample size, and we adopt various unsupervised learning techniques to that end. The flow scales well with computational resources and processes full-chip layouts efficiently. A simple, standard interface is provided for typical usage, and flexible programming APIs are available to customize the sampling strategy for advanced applications. Results demonstrate that the samples generated by this flow have increased diversity, which leads to significantly reduced model training time with comparable or improved model accuracy.
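The contrast between random sampling and coverage-driven sampling on a non-homogeneous population can be illustrated with a minimal sketch. It is not the flow described in this paper: it uses greedy farthest-point (k-center) selection as a stand-in unsupervised technique, and the 2-D feature vectors, the function names, and the toy population are all hypothetical assumptions made for illustration only.

```python
import math
import random


def farthest_point_sample(features, k, start=0):
    """Greedy k-center selection: repeatedly pick the point farthest
    from the samples chosen so far (illustrative stand-in technique)."""
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and len(features)")
    chosen = [start]
    # Distance from every point to its nearest chosen sample so far.
    nearest = [math.dist(f, features[start]) for f in features]
    while len(chosen) < k:
        nxt = max(range(len(features)), key=nearest.__getitem__)
        chosen.append(nxt)
        for i, f in enumerate(features):
            nearest[i] = min(nearest[i], math.dist(f, features[nxt]))
    return chosen


def coverage_radius(features, chosen):
    """Largest distance from any point to its nearest selected sample.
    A smaller value means the sample covers the population better."""
    return max(min(math.dist(f, features[c]) for c in chosen) for f in features)


def make_toy_population(seed=0):
    """Hypothetical pattern features: one dense cluster of common
    patterns plus three rare, outlying patterns."""
    rng = random.Random(seed)
    dense = [(rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)) for _ in range(500)]
    rare = [(5.0, 5.0), (-5.0, 5.0), (5.0, -5.0)]
    return dense + rare


if __name__ == "__main__":
    features = make_toy_population()
    diverse = farthest_point_sample(features, k=4)
    uniform = random.Random(1).sample(range(len(features)), 4)
    print("farthest-point coverage radius:", coverage_radius(features, diverse))
    print("random-sample coverage radius: ", coverage_radius(features, uniform))
```

In this toy population the rare patterns make up less than 1% of the points, so a small random sample will usually miss them, while the greedy selection picks them first because they lie farthest from everything already chosen. A production flow would presumably operate on much higher-dimensional geometrical and lithographical features and would need a scalable method in place of this quadratic-time loop.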