Code-mixing is a prevalent phenomenon in modern-day communication. Though several systems enjoy success in identifying a single language, identifying the languages of words in code-mixed texts is a herculean task, more so in a social media context. This paper explores the English-Bengali code-mixing phenomenon and presents algorithms capable of identifying the language of every word to a reasonable accuracy, both in specific cases and in the general case. We create and test a predictor-corrector model, develop a new code-mixed corpus from Facebook chat (made available for future research), and test and compare the efficiency of various machine learning algorithms (J48, IBk, Random Forest). The paper also seeks to remove the ambiguities in the token identification process.
INTRODUCTION

Defending large-scale enterprise networks from adversary attacks is an uphill task faced by present-day network administrators. Defense approaches against such attacks have traditionally been mostly host-centric, where attention is given to identifying vulnerabilities of the individual hosts and taking measures to mitigate them. Vulnerability scanning tools, such as Nessus, OpenVAS, and Nexpose, provide per-host vulnerability information and help in achieving these objectives. However, one major problem with this approach is that it emphasises host-specific local information and does not consider it in the light of the global security context of the network. Theoretically, exhaustive vulnerability searching and patching may lead to a secure system. However, this may not be possible in practice due to the costs involved and operational constraints. Moreover, in many cases, attackers combine elementary attacks to launch multistage attacks against critical assets. These elementary attacks exploit vulnerabilities of individual hosts and may be either remote or local. Intrusion Detection Systems, either network or host based, can detect those elementary attacks but cannot report whether they are part of a larger attack chain.

An attack graph is an important modelling tool used in assessing the security of enterprise networks. Using attack graphs, network administrators can understand how an attacker can combine vulnerabilities in multiple hosts in a multi-stage attack to compromise critical resources in a network. Moreover, the size of an attack graph has a direct impact on the perceived risk. Intuitively, a larger attack graph can mean more vulnerabilities that can be exploited, more attack paths to a resource, or a wider attack spread, all implying less security and hence more risk. An exhaustive attack graph of a network provides a global view of its security posture, enabling a quantitative assessment of that posture.
Such assessments, when performed periodically, help a network system to evolve over time.

Since its introduction in 1998, the attack graph has attracted a lot of attention from researchers, and a considerable amount of research effort has been spent on developing theory and practices around the idea. In the early days, dedicated security teams (called Red teams) determined the overall security of networks by hand-drawing gigantic attack graphs and then analysing them. This approach was tedious and error prone, and it did not scale as the network size grew. This gave rise to the need for automated methods of attack graph generation. Automated techniques also guarantee that the generated attack graph is exhaustive and succinct. An exhaustive attack graph contains all possible attack paths, and a succinct attack graph contains only those initial network states from which the attacker can reach the goal. Initial research proposed custom algorithms, model checking, and logic-based approaches as attack graph generation methods. However, the sca...
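
As an illustration of the attack paths discussed above, the following minimal Python sketch enumerates every path by which an attacker could chain elementary exploits to reach a critical host. The host names, exploit labels, and the `attack_paths` helper are hypothetical and chosen only for this example; the sketch is a plain depth-first search over a toy graph, not one of the generation methods surveyed in the text.

```python
from typing import Dict, List, Tuple

# Hypothetical toy network: for each host, the hosts reachable from it
# and an illustrative label for the exploit that enables the step.
EXPLOITS: Dict[str, List[Tuple[str, str]]] = {
    "attacker": [("web_server", "remote_exploit_A")],
    "web_server": [
        ("app_server", "remote_exploit_B"),
        ("db_server", "remote_exploit_C"),
    ],
    "app_server": [("db_server", "remote_exploit_D")],
    "db_server": [],
}

Step = Tuple[str, str, str]  # (source host, exploit label, destination host)


def attack_paths(
    graph: Dict[str, List[Tuple[str, str]]], source: str, goal: str
) -> List[List[Step]]:
    """Enumerate every simple (cycle-free) path from source to goal."""
    paths: List[List[Step]] = []

    def dfs(node: str, visited: set, path: List[Step]) -> None:
        if node == goal:
            paths.append(list(path))  # store a copy of the current path
            return
        for nxt, exploit in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                path.append((node, exploit, nxt))
                dfs(nxt, visited, path)
                path.pop()            # backtrack
                visited.remove(nxt)

    dfs(source, {source}, [])
    return paths


if __name__ == "__main__":
    for p in attack_paths(EXPLOITS, "attacker", "db_server"):
        print(" -> ".join([p[0][0]] + [dst for _, _, dst in p]))
```

In this toy network the database server is reachable by two distinct multi-stage paths. Adding a host or a vulnerability can only add paths, which matches the intuition that a larger attack graph implies more risk. Exhaustive enumeration of this kind grows quickly with network size, so it is meant only to make the notion of an attack path concrete.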