In the digital age, information is one of an organization's main assets and a source of competitive advantage. To protect it, information security practices are available for finding vulnerabilities where the information is stored. One practice used to find vulnerabilities in web pages is Google Hacking, which uses search strings, called dorks, with or without advanced Google operators. Dorks are available on the Internet or in the Google Hacking Database, a database maintained by Offensive Security that contains validated and tested dorks. Despite the large number of dorks available, the database has few attributes, so those who use it need prior knowledge. One way to enrich this dork database is to use natural language processing, a subarea of artificial intelligence concerned with understanding, producing, and interpreting content in human language. Given this scenario, the objective of this work is to enrich the database with natural language processing in support of information security tests. The methodology is experimental research with a quantitative approach. The results show that natural language processing can be used to enrich a dork database.
In a software release management process, the human experts who carry out the process need to classify the criticality of each software version. However, this classification can be subjective, depending on the experience the experts have acquired over the years. To reduce subjectivity in the process, an Artificial Intelligence technique called an Expert System (ES) can be applied to represent the knowledge of human experts and use it to solve problems. Thus, the objective of this work was to reduce subjectivity in classifying the criticality of software versions with the support of an Expert System. To that end, a questionnaire was prepared to collect each expert's criticality assessments of software versions, classified as High, Medium, or Low, to help build the ES production rules. The ES generated 17 production rules with a confidence level of 100%, which were applied to a production database. The classifications produced by the ES matched those made by the experts on the production database; that is, the ES was able to represent their knowledge. Next, another questionnaire was given to the experts to assess their satisfaction with using the ES; the resulting score was 4.8, considered satisfactory. It was therefore concluded that the ES helped reduce subjectivity in classifying the criticality of software versions.
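As a rough illustration of what production rules for an Expert System of this kind might look like, the sketch below classifies a software version as High, Medium, or Low. The abstract does not list the 17 rules or the attributes they use, so the `Release` fields, the thresholds, and the function name `classify_criticality` are all hypothetical and are not the authors' rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    # Hypothetical attributes; the abstract does not say which ones the experts used.
    touches_core_module: bool
    has_database_change: bool
    changed_files: int


def classify_criticality(release: Release) -> str:
    """Apply illustrative if-then production rules, checked in priority order."""
    # Rule 1 (assumed): core module plus database change means High.
    if release.touches_core_module and release.has_database_change:
        return "High"
    # Rule 2 (assumed): core module alone, or a large change set, means Medium.
    if release.touches_core_module or release.changed_files > 20:
        return "Medium"
    # Default rule (assumed): everything else is Low.
    return "Low"
```

Encoding the rules as explicit conditions is what lets every expert's version be classified the same way, which is the reduction in subjectivity the abstract describes.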
In the initial phase of a pentest, known as Open Source Intelligence, passive reconnaissance is performed with Google Hacking. Google Hacking is a practice that uses search strings called Dorks. To support it, the Google Hacking Database (GHDB) provides thousands of Dorks. However, the Google Hacking Database contains few attributes, all with textual values, which prevents the direct application of Machine Learning techniques. One way to enrich the Google Hacking Database with attributes is to use Natural Language Processing and to transform textual values into numeric ones by converting Dork characters to ASCII. The objective was therefore to apply Natural Language Processing to enrich the Google Hacking Database with attributes and to convert its textual values to ASCII, enabling the application of Machine Learning techniques. The computational experiments were conducted in seven steps: Selection of the GHDB Base, Removal of Hyperlinks and Deletion of Attributes, Removal of the Site Parameter from Dorks, Removal of Outliers and Stopwords, Enrichment with Natural Language Processing, Base Transformation, and Application of the SOM. The results obtained with the SOM were considered good, based on the values of the metrics used to evaluate the network. Thus, the objective of this paper is considered to have been achieved.
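Two of the steps named above, Removal of the Site Parameter from Dorks and the conversion of Dork characters to ASCII, could be sketched as follows. This is a minimal sketch under stated assumptions: the abstract does not describe the implementation, so the function names, the regular expression, the fixed vector length, and the zero padding are all illustrative choices rather than the authors' method.

```python
import re

# Assumed pattern: a `site:` operator followed by its value and trailing spaces.
SITE_PARAM = re.compile(r"\bsite:\S+\s*", re.IGNORECASE)


def strip_site_parameter(dork: str) -> str:
    """Remove any `site:` operator and its value from a dork."""
    return SITE_PARAM.sub("", dork).strip()


def dork_to_ascii(dork: str, length: int = 16) -> list[int]:
    """Turn a dork into a fixed-length list of ASCII codes.

    Non-ASCII characters are dropped, and shorter dorks are padded
    with 0 (an assumed convention) so every row has the same width.
    """
    codes = list(dork.encode("ascii", errors="ignore")[:length])
    return codes + [0] * (length - len(codes))
```

For example, `strip_site_parameter("site:example.com inurl:admin")` leaves `inurl:admin`, and `dork_to_ascii("ab", length=4)` yields `[97, 98, 0, 0]`. Fixed-width numeric rows of this kind are the sort of input a Machine Learning technique can consume directly.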