Purpose
MITRE and the National Security Agency cooperatively developed and maintained a D3FEND knowledge graph (KG). It provides concepts as an entity from the cybersecurity countermeasure domain, such as dynamic, emulated and file analysis. Those entities are linked by applying relationships such as analyze, may_contains and encrypt. A fundamental challenge for collaborative designers is to encode knowledge and efficiently interrelate the cyber-domain facts generated daily. However, the designers manually update the graph contents with new or missing facts to enrich the knowledge. This paper aims to propose an automated approach to predict the missing facts using the link prediction task, leveraging embedding as representation learning.
Design/methodology/approach
D3FEND is available in the resource description framework (RDF) format. In the preprocessing step, the facts in RDF format converted to subject–predicate–object triplet format contain 5,967 entities and 98 relationship types. Progressive distance-based, bilinear and convolutional embedding models are applied to learn the embeddings of entities and relations. This study presents a link prediction task to infer missing facts using learned embeddings.
Findings
Experimental results show that the translational model performs well on high-rank results, whereas the bilinear model is superior in capturing the latent semantics of complex relationship types. However, the convolutional model outperforms 44% of the true facts and achieves a 3% improvement in results compared to other models.
Research limitations/implications
Despite the success of embedding models to enrich D3FEND using link prediction under the supervised learning setup, it has some limitations, such as not capturing diversity and hierarchies of relations. The average node degree of D3FEND KG is 16.85, with 12% of entities having a node degree less than 2, especially there are many entities or relations with few or no observed links. This results in sparsity and data imbalance, which affect the model performance even after increasing the embedding vector size. Moreover, KG embedding models consider existing entities and relations and may not incorporate external or contextual information such as textual descriptions, temporal dynamics or domain knowledge, which can enhance the link prediction performance.
Practical implications
Link prediction in the D3FEND KG can benefit cybersecurity countermeasure strategies in several ways, such as it can help to identify gaps or weaknesses in the existing defensive methods and suggest possible ways to improve or augment them; it can help to compare and contrast different defensive methods and understand their trade-offs and synergies; it can help to discover novel or emerging defensive methods by inferring new relations from existing data or external sources; and it can help to generate recommendations or guidance for selecting or deploying appropriate defensive methods based on the characteristics and objectives of the system or network.
Originality/value
The representation learning approach helps to reduce incompleteness using a link prediction that infers possible missing facts by using the existing entities and relations of D3FEND.