Abstract. We investigate the semi-automated identification of technical problems experienced by armed forces' weapon systems during war missions. The proposed methodology is based on a semantic analysis of textual information in reports written by soldiers (war logs). Latent semantic indexing (LSI) with non-negative matrix factorization (NMF), a technique from multivariate analysis and linear algebra, is used to extract hidden semantic textual patterns from the reports. NMF factorizes the term-by-war-log matrix, which consists of weighted term frequencies, into two non-negative matrices. This yields a natural parts-based representation of the report information, which is easy for human experts to evaluate because the human brain also uses a parts-based representation. The identified technical problems are a valuable source of information for improved research and technology planning. A case study extracts technical problems from military logs of the Afghanistan war, and the results are compared to a manual analysis written by journalists of 'Der Spiegel'.
Keywords. Non-negative matrix factorization, NMF, Text Mining
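The core step of the abstract, factorizing a weighted term-by-war-log matrix into two non-negative matrices, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the log(1 + tf) · idf weighting, the Lee-Seung multiplicative-update algorithm, the function names `term_by_document`, `nmf` and `top_terms`, and the four toy logs are all hypothetical choices made here, since the section only says "weighted term frequencies" and does not name an NMF algorithm.

```python
import numpy as np

def term_by_document(docs):
    """Return a weighted term-by-document matrix A (rows = terms, columns = logs).

    Assumed weighting: log(1 + tf) * idf. The text only says
    "weighted term frequencies", so this scheme is illustrative.
    """
    tokenised = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenised for w in toks})
    row = {w: i for i, w in enumerate(vocab)}
    tf = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenised):
        for w in toks:
            tf[row[w], j] += 1.0
    df = np.count_nonzero(tf, axis=1)   # number of logs containing each term
    idf = np.log(len(docs) / df)        # 0 for terms occurring in every log
    return np.log1p(tf) * idf[:, None], vocab

def nmf(A, k, iters=500, seed=0, eps=1e-9):
    """Factorise A ~ W @ H with W, H >= 0 using Lee-Seung multiplicative
    updates for the Frobenius-norm objective ||A - WH||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps        # term-by-feature, strictly positive start
    H = rng.random((k, n)) + eps        # feature-by-log, strictly positive start
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def top_terms(W, vocab, n=3):
    """List the n highest-weighted terms of each feature (column of W)."""
    return [[vocab[i] for i in np.argsort(W[:, f])[::-1][:n]]
            for f in range(W.shape[1])]

# Hypothetical toy "war logs", invented for illustration only.
docs = [
    "radio battery failed",
    "radio battery dead",
    "engine overheated vehicle",
    "engine overheated stalled",
]

A, vocab = term_by_document(docs)
W, H = nmf(A, k=2)
for f, terms in enumerate(top_terms(W, vocab)):
    print(f"feature {f}: {terms}")
print("dominant feature per log:", H.argmax(axis=0))
```

Each column of `W` is a "part", a group of co-occurring terms that a human expert can read and label, and each column of `H` shows how strongly a log expresses each part. Because both factors are non-negative, a log is modelled only as an additive combination of parts, which is what makes the representation easy to inspect. A library routine such as scikit-learn's `NMF` class could replace the hand-written update loop.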
Introduction
War logs written by soldiers during war missions are a valuable source of information. They indicate, for example, technical problems experienced by armed forces' weapon systems in use. Taking some of these problems into account in current research and technology (R&T) projects may be necessary to increase the reliability of future weapon systems. Extracting this feedback from war logs is therefore an important task in R&T planning. We provide a methodology for the semi-automated identification of technical problems in soldiers' war logs. A manual identification of these problems, e.g. by human experts, is not feasible because of the large number of logs. War logs describe the events of the war, and technical problems are only part of their content: for example, an event is described in detail, and the malfunction of a weapon system during that event is mentioned alongside it. A malfunction of a specific system that occurs frequently across different events can be discovered by identifying the underlying (hidden) semantic textual patterns in the collection of war logs, because different soldiers describe malfunctions using different words. This excludes the use of text classification