In this paper, we conduct a large-scale study on the crackability, correlation, and security of ∼145 million real-world passwords leaked from several popular Internet services and applications. To the best of our knowledge, this is the largest empirical study of its kind conducted to date. Specifically, we first evaluate the crackability of these ∼145 million passwords against 6+ state-of-the-art password cracking algorithms in multiple scenarios. Second, we examine the effectiveness and soundness of popular commercial password strength meters (e.g., Google, QQ) and the security impact of username/email leakage on passwords. Finally, we discuss the implications of our results, analysis, and findings. We expect them to help both password users and system administrators understand more deeply how vulnerable real passwords are to state-of-the-art password cracking algorithms, and to shed light on future password security research.

et al.: ZERO-SUM PASSWORD CRACKING GAME: A LARGE-SCALE EMPIRICAL STUDY 2

evident that some password meters are not currently guiding users to choose secure passwords. Sometimes, they may even mislead users. On the other hand, properly designed password meters are useful in helping users choose passwords that resist modern password cracking algorithms.

(iii) We evaluate the security impact of username/email leakage. According to our results, both username and email leakage have a surprising impact on password security. This should alert users and system administrators that, besides the passwords themselves, usernames, email addresses, and other user profile information also deserve dedicated protection.

(iv) We evaluate the correlation among passwords. We find that user-chosen passwords do exhibit regional/language (or cultural) differences. This finding has implications for how to select proper training data and how to measure the mutual information between two password systems.
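To make "crackability" and the role of training data in (iv) concrete, here is a minimal Python sketch. It assumes the simplest possible attacker: a dictionary attack that guesses passwords in descending order of their frequency in a training leak, and then reports the fraction of accounts in a target leak cracked within a fixed guess budget. The helper names (`guess_list`, `crack_fraction`) and the toy data are hypothetical. This is not one of the 6+ algorithms evaluated in the paper, only an illustration of the measurement.

```python
from collections import Counter
from typing import Iterable, List


def guess_list(training: Iterable[str]) -> List[str]:
    """Hypothetical helper: order distinct training passwords by descending frequency.

    Ties are broken lexicographically so that the guess order is deterministic.
    """
    counts = Counter(training)
    return sorted(counts, key=lambda pw: (-counts[pw], pw))


def crack_fraction(guesses: List[str], target: Iterable[str], budget: int) -> float:
    """Fraction of target accounts whose password is among the first `budget` guesses.

    Each account counts separately, so one popular password can crack many accounts.
    """
    target = list(target)
    if not target:
        return 0.0
    tried = set(guesses[:budget])
    cracked = sum(1 for pw in target if pw in tried)
    return cracked / len(target)


if __name__ == "__main__":
    # Toy data (hypothetical), standing in for a training leak and a target leak.
    training = ["123456", "123456", "password", "qwerty", "123456", "password", "iloveyou"]
    target = ["123456", "password", "woaini1314", "qwerty", "123456", "dragon"]
    guesses = guess_list(training)
    for budget in (1, 2, 4):
        print(budget, round(crack_fraction(guesses, target, budget), 3))
    # Expected output:
    # 1 0.333
    # 2 0.5
    # 4 0.667
```

A target password absent from the training leak, such as the toy `woaini1314`, is never guessed no matter how large the budget. This is one simple way a mismatch between the training and target populations lowers measured crackability.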
RELATED WORK

Empirical Studies on Password Security. In [4], [5], [7], the authors evaluated the password use/reuse habits and password policies of a number of websites. In [6], Zhang et al. studied the security of password expiration using 7000+ accounts. In [8], Bonneau conducted a study to estimate password guessing difficulty. In [9], Mazurek et al. conducted another study to measure password guessability. In [2], Ma et al. investigated probabilistic password models. In [1], Li et al. studied the differences between passwords from Chinese and English users. The work most closely related to this paper is [3], in which Dell'Amico et al. conducted an empirical analysis of the password strength of 58,800 users. However, [3] evaluated only the dictionary attack, PCFG [11], and a Markov model based scheme [10].

Password Cracking. In [10], Narayanan and Shmatikov designed a fast dictionary attack on passwords based on a Markov model, which generates candidate password guesses whose probability exceeds a given threshold. Dürmuth et al. improved Narayanan and Shmatikov's Markov model based pa...