As shown in [1,2], score normalization is crucial for improving the Average Term-Weighted Value (ATWV), a measure commonly used to evaluate keyword spotting systems. In this paper, we compare three score normalization methods within a keyword spotting system that employs phonetic search. We show that a new unsupervised linear fit method yields better-estimated posterior scores which, when fed into the keyword-specific normalization of [1], produce ATWV gains of 3% on average. Furthermore, when these scores are used as features within a supervised machine learning framework, they yield additional gains of 3.8% on average over the five languages used in the first year of the IARPA-funded Babel project.