Search-based Test Case Generation (TCG) for web applications suffers from unstable performance and suboptimal test suites due to diversity loss. Previous diversity metrics, however, focus mainly on either client-side models or server-side code, making them prone to low robustness and poor generalization in practical applications. We propose DTester, a diversity-driven TCG method that maximizes behavior exploration and minimizes test suite size while covering more vulnerable server-side paths. We propose three diversity metrics (i.e., phenotypic coupling, intent coupling, and competitiveness) to measure the underlying relationships between test cases in terms of user behavior, code logic, and test execution history. Moreover, we design a 3-dimensional weight graph to model the associations among these metrics, which provides fine-grained guidance for the genetic algorithm to generate diverse test cases from the client-side behavior model. Our empirical evaluation on five web applications shows that DTester generates better test suites than the state-of-the-art TCG method, and does so more efficiently and robustly. The maximum improvements are [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text] in efficiency, test suite size, diversity, and robustness, respectively.