We thank the discussants for their perspectives on our presentation (Hero et al., 2023) of emerging challenges in cybersecurity and the role of data science in addressing these challenges. We enjoyed reading the discussants' comments, as they amplify and reinforce our principal point: that statistical methods of data science will have an increasingly important impact on cybersecurity solutions. Each of the discussants' narratives stands on its own and we are in agreement with most of their points.

The comments in the contribution of discussants Sanna Passino et al. (2023) are highly relevant, both from a foundations standpoint and from an applications perspective. We thank them for their remarks on challenges, data structures, and future directions in statistical cybersecurity. The proposal by Sanna Passino et al. (2023) to segment multimodal data into a few well-defined data structures and design fusion approaches for anomaly analysis over multiclass data is timely and could potentially enable a systematic approach to cybersecurity analysis of complex systems. We heartily agree that data structures, data fusion, and streaming methods will be important components in developing domain-specific data-driven cybersecurity solutions.

Concerning data structures, Sanna Passino et al. (2023) give three common examples: graphs, point processes, and textual data. We note that data can either come directly in these forms or, more often, be derived from more complex data types, such as images and video; for example, respectively as spatial correspondence graphs, event-marking point processes, or text captions. In the latter case, it may be beneficial to consider these data types as unstructured and in context, rather than solely considering the extracted data structures. Below we add to the discussants' comments on the importance of each of the three data structures to cybersecurity.
Textual Data

Among applications in textual data, the authors rightly point out challenges with latent Dirichlet allocation (LDA) models and interpretability. The issue of interpretability cannot be emphasized enough. Security analysts, the typical consumers of alerts generated by these methods, are very skeptical, due to the relatively high rate of false positives they are exposed to, and due to the costs of responding to a false positive. Therefore, they will simply ignore alerts that are not readily understandable. While one can bring additional context to help prove the viability of an alert, interpretable models that, directly from the parameters, can indicate why the evidence is malicious are a critical component to adoption by security analysts.

Among applications of models for textual data, Sanna Passino et al. (2023) mention analysis of textual data for detection and parsing of computer log data. These are both excellent examples, but a new application has also emerged recently, with the use of large language models (LLMs) as an investigative and hunting tool for cyberthreats. Currently, analysts use a query interface, for example, SQ...