Empir Software Eng (2011) 16:667-702

This paper is an enhanced version of the paper by Li et al. (2009).

The architecture of a large software system is widely considered important for several reasons: it provides stakeholders with a common goal in realising the envisaged system, helps to organise the various development teams, and captures foundational design decisions early in development. Studies have shown that defects originating in system architectures can consume twice as much correction effort as other defects. Scientific studies of architectural defects are therefore important for improving how such defects are treated and prevented. Previous research has focused on the extent of architectural defects in software systems. In this paper, we address two complementary questions through a case study: (i) How do multiple-component defects (MCDs), which are of architectural importance, differ from other types of defects in terms of (a) complexity and (b) persistence across development phases and releases? (ii) How do components with a high concentration of MCDs (the so-called architectural hotspots) differ from other components in terms of (a) their interrelationships and (b) their persistence across development phases and releases? Results indicate that MCDs are complex to fix and persist across phases and releases. Compared with a non-MCD, an MCD requires over 20 times more changes to fix and is 6 to 8 times more likely to cross a phase or a release. These findings have implications for defect detection and correction. Results also show that 20% of the subject system's components contain over 80% of the MCDs, and that these components are 2 to 3 times more likely to persist across multiple system releases than other components in the system. Such MCD-concentrated components constitute architectural "hotspots" on which management can focus for preventive maintenance and architectural quality improvement. The findings come from an empirical study of a large legacy software system, more than 17 years old, comprising over 20 million lines of code.
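The abstract does not spell out how the hotspot concentration is measured. As one possible illustration, assuming defect-fix records that list the components each fix changed, the Python sketch below treats a defect as an MCD when its fix touches more than one component, counts MCDs per component, and computes what share of MCD-component incidences falls in the top 20% of components ranked by MCD count. The record format, the function names (`is_mcd`, `mcd_counts`, `hotspot_share`) and the toy data are hypothetical; this is not the authors' method or data.

```python
from collections import Counter

# Hypothetical record format: defect id -> set of components changed by its fix.
# Assumption (illustration only): a defect is an MCD if its fix changed
# more than one component.
def is_mcd(components_changed):
    return len(components_changed) > 1


def mcd_counts(defects):
    """Count, for each component, how many MCDs changed it."""
    counts = Counter()
    for components in defects.values():
        if is_mcd(components):
            counts.update(components)
    return counts


def hotspot_share(counts, all_components, top_percent=20):
    """Return the top `top_percent` of components by MCD count and the share
    of all MCD-component incidences that fall in them."""
    # Stable sort: ties keep the order given in all_components.
    ranked = sorted(all_components, key=lambda c: counts[c], reverse=True)
    # Integer ceiling division avoids floating-point rounding surprises.
    k = max(1, -(-top_percent * len(ranked) // 100))
    hotspots = ranked[:k]
    total = sum(counts.values())
    share = sum(counts[c] for c in hotspots) / total if total else 0.0
    return hotspots, share


# Toy data, invented purely to exercise the functions.
components = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
defects = {
    "d1": {"A", "B"},
    "d2": {"A", "B", "C"},
    "d3": {"A", "B"},
    "d4": {"A", "D"},
    "d5": {"B", "E"},
    "d6": {"F"},
    "d7": {"G"},
    "d8": {"C"},
}

counts = mcd_counts(defects)
hotspots, share = hotspot_share(counts, components)
print(hotspots, f"{share:.0%}")  # prints: ['A', 'B'] 73%
```

Counting incidences means an MCD that spans two hotspot components is counted twice; counting each MCD once if it touches any hotspot is an equally plausible reading of "contain", and the sketch could be adapted either way.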