The Internet has recently witnessed unprecedented growth of a class of connected assets called the Internet of Things (IoT). Due to relatively immature manufacturing processes and limited computing resources, IoT devices have inadequate device-level security measures, exposing the Internet to various cyber risks. Network-level security has therefore been considered a practical and scalable approach for securing IoT devices, but it cannot be employed without first discovering the connected devices and characterizing their behavior. Prior research leveraged predictable patterns in IoT network traffic to develop inference models. However, these models fall short in addressing practical challenges, which prevents them from being deployed in production settings. This thesis identifies four practical challenges and develops techniques to address them, which can help secure businesses and protect user privacy against growing cyber threats.

My first contribution balances prediction gains against the computing costs of traffic features for IoT traffic classification and monitoring. I develop a method to find the best set of specialized models for multi-view classification that can reach an average accuracy of 99%, i.e., an accuracy similar to that of existing works, while reducing the cost by a factor of 6. I also develop a hierarchy of one-class models per asset class, each at a certain granularity, to progressively monitor IoT traffic. My second contribution addresses the challenges of measurement costs and data quality. I develop an inference method that uses stochastic and deterministic modeling to predict IoT devices in home networks from opaque and coarse-grained IPFIX flow data. Evaluations show that false positive rates can be reduced by 75% compared to related work without significantly affecting true positives. My third contribution focuses on the challenge of concept drift by analyzing over six million flow records collected from 12 real home networks.
I develop several inference strategies and compare their performance under concept drift, particularly when labeled data is unavailable in the testing phase. Finally, my fourth contribution studies the resilience of machine learning models against adversarial attacks, with a specific focus on decision tree-based models. I develop methods to quantify the vulnerability of a given decision tree-based model to data-driven adversarial attacks and to refine vulnerable decision trees, making them robust against 92% of adversarial attacks.
List of Publications

During my Ph.D. candidature, a number of publications resulted from the work presented here; they are listed below for reference.