To my family, past, present and future, very specially to my parents and to my wife Manuela, for their unconditional love and support.

Contents

3.1.8 Prediction/Forecasting
3.1.9 Translation
3.1.10 Alignment
3.1.11 KWS, indexing, querying by example
3.1.12 Unsupervised term discovery
3.1.13 Filtering/Denoising and Smoothing
3.1.14 Summarization
3.1.15 Joint Segmentation and Classification
3.2 Some extensions
3.2.1 Several input sequences
3.2.2 Interactive systems
3.3 Limitations, how to face them, new proposal
3.4 Questionnaire when faced with a new problem
3.5 Analysis and evaluation measures
3.5.1 Quality assessment
3.5.2 Performance
3.5.3 Interaction
3.5.4 Other measures
3.6 Summary and some conclusions
4.1 Some preliminary ML concepts
4.1.1 (Probabilistic) graphical models
4.2 Two stage generative model
4.2.1 Hierarchy for the second stage
4.2.2 Limitations, extensions and generalizations
4.3 Classical problems of two stage generative models
4.3.1 Probability of observation
4.3.2 Decoding
4.3.3 Model estimation
4.4 Some alternative models
4.4.1 Relationship with Dynamic Graphical Models
4.4.2 Fixed dimension feature segments
4.4.3 Estimation of frame-wise segment posteriors
4.4.4 Graph transformer networks
4.4.5 Some non-probabilistic frameworks
4.5 Summary and some conclusions
5.1 Introduction
5.2 Recognition, parsing, decoding
5.3 Weighted languages and semirings
5.4 Some formalisms
5.4.1 Formal/generative grammars
5.4.2 Finite state automata and transducers
5.4.3 Recurrent transition networks
5.5 Deriving the composition of a regular and a CF model
5.5.1 State-pair transducer composition
5.5.2 Extension to null-transitions
5.5.3 Extension to model reference transitions
5.5.4 Transformation to homogeneous epsilon form
5.6 Review of parsing approaches, decoders and algorithms
5.7 From composition to recognition/decoding
5.7.1 Acyclic inputs
5.7.2 Semiring specific optimizations
5.8 Summary and some conclusions
6.1 Probabilistic decompositions
6.1.1 Chain rule, clustering histories
6.1.2 Whole sentence LMs
6.1.3 Combining spans of the sequence