Summary
Declarative static program analysis has become a widely used program analysis technique. Declarative static analyzers perform three steps: creating databases of facts from program source code, evaluating rules to generate new facts, and running queries over the facts through a query system to extract all information related to specific properties. Such analyzers can easily target diverse programming languages, because only the databases and rules need to be adapted for a new language; the query system is independent of programming languages and can be reused as is. However, even when declarative analyzers support multiple programming languages, they do not currently support the analysis of multilingual programs, that is, programs written in two or more programming languages. We propose a systematic methodology that extends a declarative static analyzer supporting multiple languages to support multilingual programs as well. The main idea is to reuse existing components of the analyzer. Our approach first generates a merged database of facts consisting of multiple logical language spaces. This allows each existing language-specific rule to derive new facts for its language from the facts in the corresponding language space. Then, it defines language-interoperation rules that handle the language interoperation semantics. Finally, it uses the same query system to obtain analysis results that take the language interoperation semantics into account. By extending CodeQL, we develop a proof-of-concept declarative static analyzer for multilingual programs that can track dataflows across language boundaries. Our evaluation shows that the analyzer successfully tracks dataflows across Java-C and Python-C language boundaries and detects genuine interoperation bugs in real-world multilingual programs.
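The pipeline described above (merged fact database, language-specific rules, language-interoperation rules, shared query) can be illustrated with a toy model. The following is only a minimal sketch under stated assumptions: it does not use CodeQL's schema or rule language, and every name in it (`local_flow`, `interop_flow`, `derive_flows`, `query`, and the node names) is hypothetical and chosen for illustration.

```python
# Toy model of the approach; NOT CodeQL's actual database schema or QL rules.
# All fact, node, and function names here are hypothetical.

# Step 1: a merged database. Every node is tagged with its language space,
# so facts from different languages coexist without interfering.
local_flow = {
    # Facts a Java-specific rule might derive inside the "java" space.
    (("java", "userInput"), ("java", "nativeArg")),
    # Facts a C-specific rule might derive inside the "c" space.
    (("c", "cParam"), ("c", "bufferIndex")),
}

# Step 2: a language-interoperation fact linking the two spaces, e.g. the
# argument of a Java native call bound to the parameter of the C function.
interop_flow = {
    (("java", "nativeArg"), ("c", "cParam")),
}


def derive_flows(local_edges, interop_edges):
    """Naive fixpoint evaluation: transitive closure over all flow edges."""
    step = set(local_edges) | set(interop_edges)
    closure = set(step)
    while True:
        # Compose known flows with one more edge; keep only new facts.
        new = {(a, d) for (a, b) in closure for (c, d) in step if b == c}
        new -= closure
        if not new:
            return closure
        closure |= new


def query(closure, source, sink):
    """Step 3: the same query works for one language or for many."""
    return (source, sink) in closure


source = ("java", "userInput")
sink = ("c", "bufferIndex")

# With interoperation facts the cross-language flow is found ...
with_interop = query(derive_flows(local_flow, interop_flow), source, sink)
# ... and without them each language space is analyzed in isolation.
without_interop = query(derive_flows(local_flow, set()), source, sink)
```

In this sketch the query function and the per-language facts are unchanged between the two runs; only the added interoperation edges make the Java-to-C flow visible, which mirrors the reuse argument made above.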