Modern vehicles contain scores of Electronic Control Units (ECUs) that broadcast messages over a Controller Area Network (CAN). Vehicle manufacturers rely on security through obscurity by concealing their unique mapping of CAN messages to vehicle functions, which differs for each make, model, year, and even trim. This poses a major obstacle for after-market modifications, notably performance tuning and in-vehicle network security measures. We present ACTT: Automotive CAN Tokenization and Translation, a novel, vehicle-agnostic algorithm that leverages available diagnostic information to parse CAN data into meaningful messages, simultaneously cutting binary messages into tokens and learning the translation that maps these contiguous bits to the value of the vehicle function communicated.
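As a rough illustration of the two ideas named above, the following Python sketch cuts a token (a run of contiguous bits) out of a CAN payload and fits a translation from raw token values to a diagnostic reading. It is not the ACTT algorithm itself: the linear form of the translation, the helper names `extract_token` and `fit_linear`, the bit-numbering convention, and all payloads and readings are assumptions made up for this example.

```python
def extract_token(payload: bytes, start_bit: int, length: int) -> int:
    """Return `length` contiguous bits as an unsigned integer.

    Assumed convention: bit 0 is the most significant bit of byte 0.
    """
    total_bits = 8 * len(payload)
    shift = total_bits - start_bit - length
    return (int.from_bytes(payload, "big") >> shift) & ((1 << length) - 1)


def fit_linear(xs, ys):
    """Ordinary least-squares fit of ys ~ scale * xs + offset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    scale = sxy / sxx
    return scale, mean_y - scale * mean_x


# Made-up 8-byte payloads; the hypothetical token occupies bytes 1 and 2.
payloads = [
    bytes([0x00, 0x0C, 0x80, 0, 0, 0, 0, 0]),
    bytes([0x00, 0x1F, 0x40, 0, 0, 0, 0, 0]),
    bytes([0x00, 0x2E, 0xE0, 0, 0, 0, 0, 0]),
]
# Made-up diagnostic readings (e.g. engine speed) taken at the same instants.
diagnostic_values = [800.0, 2000.0, 3000.0]

raw_values = [extract_token(p, start_bit=8, length=16) for p in payloads]
scale, offset = fit_linear(raw_values, diagnostic_values)
print(raw_values, scale, offset)
```

In this toy data the diagnostic value is exactly one quarter of the raw token, so the fit recovers a scale near 0.25 and an offset near 0.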
Controller area networks (CANs) are a broadcast protocol for real-time communication of critical vehicle subsystems. Original equipment manufacturers (OEMs) of passenger vehicles keep their mappings of CAN data to vehicle signals secret, and these definitions vary per make, model, and year. Without these mappings, the wealth of real-time vehicle information hidden in the CAN packets is uninterpretable, severely impeding vehicle-related research including CAN cybersecurity and privacy studies, after-market tuning, efficiency and performance monitoring, and fault diagnosis, to name a few. Guided by the four-part CAN signal definition, we present CAN-D (CAN Decoder), a modular, four-step pipeline for identifying each signal's boundaries (start bit and length), endianness (byte ordering), and signedness (bit-to-integer encoding), and, by leveraging diagnostic standards, augmenting a subset of the extracted signals with a meaningful, physical interpretation. En route to CAN-D, we provide a comprehensive review of the CAN signal reverse engineering research. All previous methods ignore endianness and signedness, rendering them simply incapable of decoding many standard CAN signal definitions. Incorporating endianness grows the search space from 128 to 4.72E21 signal tokenizations, and introduces a web of changing dependencies. In response, we formulate, formally analyze, and provide an efficient solution to an optimization problem, allowing identification of the optimal set of signal boundaries and byte orderings. In addition, we provide two novel, state-of-the-art signal boundary classifiers (both superior to previous approaches in precision and recall in three different test scenarios) and the first signedness classification algorithm, which exhibits > 97% F-score. Overall, CAN-D is the only solution with the potential to extract any CAN signal and is the state of the art.
In evaluation on ten vehicles of different makes, CAN-D's average ℓ1 error is 5 times better (81% less) than all preceding methods, and exhibits lower average error even when considering only signals that meet prior methods' assumptions. Finally, CAN-D is implemented in lightweight hardware, allowing for an OBD-II plug-in for real-time in-vehicle CAN decoding.
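To make the four-part signal definition from the CAN-D abstract concrete (start bit, length, endianness, signedness), here is a minimal Python sketch of decoding one signal once those four properties are known. It is an illustration only, not CAN-D's pipeline: the function name `decode_signal` is hypothetical, signed values are assumed to be two's complement, bit 0 is assumed to be the most significant bit of byte 0, and little-endian signals are restricted to byte-aligned fields to keep the sketch short.

```python
def decode_signal(payload: bytes, start_bit: int, length: int,
                  little_endian: bool = False, signed: bool = False) -> int:
    """Decode one signal from a CAN payload given its four-part definition."""
    if little_endian:
        # Simplification for this sketch: whole bytes only.
        if start_bit % 8 or length % 8:
            raise ValueError("sketch handles only byte-aligned little-endian signals")
        first = start_bit // 8
        raw = int.from_bytes(payload[first:first + length // 8], "little")
    else:
        # Big-endian: read the contiguous bits in transmission order.
        shift = 8 * len(payload) - start_bit - length
        raw = (int.from_bytes(payload, "big") >> shift) & ((1 << length) - 1)
    if signed and raw >= 1 << (length - 1):
        raw -= 1 << length  # assumed two's-complement encoding
    return raw


# Made-up payload used to show how the same bits decode differently.
payload = bytes([0x12, 0x34, 0xFF, 0xFE, 0x00, 0x00, 0x00, 0x00])

big = decode_signal(payload, 0, 16)                         # bytes read as 0x1234
little = decode_signal(payload, 0, 16, little_endian=True)  # bytes read as 0x3412
unsigned = decode_signal(payload, 16, 16)                   # 0xFFFE as unsigned
negative = decode_signal(payload, 16, 16, signed=True)      # 0xFFFE as signed
print(big, little, unsigned, negative)
```

The same two bytes yield 4660 or 13330 depending on byte order, and 65534 or -2 depending on signedness, which is why a method that ignores either property cannot recover such signals.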
Modern vehicles are complex cyber-physical systems made of hundreds of electronic control units (ECUs) that communicate over controller area networks (CANs). This inherited complexity has expanded the CAN attack surface, which is vulnerable to message injection attacks. These injections change the overall timing characteristics of messages on the bus, and thus, to detect these malicious messages, time-based intrusion detection systems (IDSs) have been proposed. However, time-based IDSs are usually trained and tested on low-fidelity datasets with unrealistic, labeled attacks. This makes it difficult to evaluate, compare, and validate IDSs. Here we detail and benchmark four time-based IDSs against the newly published ROAD dataset, the first open CAN IDS dataset with real (non-simulated) stealthy attacks with physically verified effects. We found that methods that perform hypothesis testing by explicitly estimating message timing distributions have lower performance than methods that seek anomalies in a distribution-related statistic. In particular, these "distribution-agnostic" methods outperform "distribution-based" methods by at least 55% in area under the precision-recall curve (AUC-PR). Our results expand the body of knowledge of CAN time-based IDSs by providing details of these methods and reporting their results when tested on datasets with real advanced attacks. Finally, we develop an after-market plug-in detector using lightweight hardware, which can be used to deploy the best performing IDS method on nearly any vehicle.
This manuscript has been co-authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE).
The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). * Deborah H. Blevins and Pablo Moriano, placed in alphabetical order, contributed equally to this work.
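The timing-based detection idea in the IDS abstract above (injected messages change the inter-arrival times of a CAN ID) can be sketched in Python as follows. This is not one of the four benchmarked methods: the windowed-mean statistic, the threshold `k`, the function names, and all timestamps are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def inter_arrival_times(timestamps):
    """Gaps between consecutive messages of a single CAN ID."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def window_means(gaps, window):
    """Mean gap over consecutive, non-overlapping windows."""
    return [mean(gaps[i:i + window])
            for i in range(0, len(gaps) - window + 1, window)]


def flag_windows(train_ts, test_ts, window=5, k=3.0):
    """Flag test windows whose mean gap is far from the training mean."""
    train = window_means(inter_arrival_times(train_ts), window)
    mu = mean(train)
    sigma = max(stdev(train), 1e-9)  # guard against zero spread
    test = window_means(inter_arrival_times(test_ts), window)
    return [abs(m - mu) > k * sigma for m in test]


# Made-up attack-free traffic: nominal 10 ms period with slight jitter.
train_ts = [0.010 * i + (0.0002 if i % 2 else 0.0) for i in range(51)]
# Made-up test traffic: 10 normal gaps, then 10 gaps at 5 ms (extra injected frames).
test_ts = train_ts[:11] + [train_ts[10] + 0.005 * j for j in range(1, 11)]

flags = flag_windows(train_ts, test_ts)
print(flags)
```

Here the first two test windows match the training timing and the last two, where the message rate doubles, are flagged. A real detector would need per-ID models and a threshold tuned on labeled data.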