With advances in information technology, computers have been embedded in many devices used in everyday life. At the same time, reports of damage from web application attacks are increasing, so these devices face a growing threat from such attacks. Because black-list matching cannot easily keep up with the automatic detection of increasingly diverse web application attacks, many recent studies have applied machine learning methods instead. In this paper, we investigate the distribution of symbols in SQL injection attacks and show that it can be approximated by a zeta distribution.
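As a rough illustration of what approximating a symbol distribution by a zeta distribution involves, the sketch below ranks the punctuation symbols in a few strings by frequency and prints them next to probabilities proportional to k^(-s) for rank k. It is a minimal sketch, not the procedure used in this paper: the sample strings, the function names, the use of `string.punctuation` as the symbol set, and the exponent s = 1.5 are all assumptions made for illustration, and the probabilities are normalised over the observed ranks only (a truncated zeta, or Zipf, form).

```python
from collections import Counter
import string


def symbol_rank_frequencies(samples):
    """Return (symbol, relative frequency) pairs, most frequent first.

    Only ASCII punctuation characters are counted as symbols (an assumption).
    """
    counts = Counter(
        ch for text in samples for ch in text if ch in string.punctuation
    )
    total = sum(counts.values())
    return [(sym, n / total) for sym, n in counts.most_common()]


def truncated_zeta_pmf(n_ranks, s):
    """Probabilities proportional to k**(-s) for ranks k = 1..n_ranks.

    Normalised over the finite set of ranks, so the values sum to 1.
    """
    weights = [k ** (-s) for k in range(1, n_ranks + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical toy inputs, not the data set collected for this paper.
samples = ["' OR '1'='1", "'; DROP TABLE users; --", "admin'--"]

observed = symbol_rank_frequencies(samples)
# s = 1.5 is an arbitrary illustrative exponent, not a fitted value.
expected = truncated_zeta_pmf(len(observed), s=1.5)

for rank, ((sym, freq), p) in enumerate(zip(observed, expected), start=1):
    print(f"rank {rank}: {sym!r} observed={freq:.3f} model={p:.3f}")
```

In practice the exponent would be estimated from the data, for example by maximum likelihood or by a fit on a log-log rank-frequency plot, rather than fixed by hand as it is here.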
I. INTRODUCTION

According to a report of the U.S. Department of Homeland Security (DHS) [1], cyber attacks against society's infrastructure, such as water networks and natural gas pipelines, have been observed recently. Furthermore, many incidents such as illegal access and information leakage have been reported, so commercial web applications are also facing cyber attacks. These cyber attacks are called web application attacks.

Web application attacks take various forms. The best known are SQL injection and cross-site scripting, and we treat SQL injection attacks in this study. SQL injection is a vulnerability of database-driven web applications, and attacks that exploit this flaw enable unauthorized access to the web application's database.

One way to defend web applications against such attacks is to use a web application firewall (WAF). A WAF relies on a blacklist accumulated from past attacks, so it cannot detect newly developed attacks. Therefore, machine learning methods have been studied to detect web application attacks automatically [2] [3] [4] [5]. In our previous studies [6] [7] [8] [9], we proposed several methods for detecting web application attacks using stochastic models. The purpose of the studies above is to detect web application attacks, so these methods do not explain the underlying principle of the attacks. SQL injection attacks contain many signature symbols, such as the semicolon and the single quote.

In this paper, using SQL injection attack samples collected from the web and from books [10] [11] [12] [13], we show that the distribution of symbols appearing in SQL injection attacks can be approximated by a zeta distribution. This means that the symbols that characterize SQL injection attacks are comparatively limited. Therefore, if we can choose attack feature symbols