Abstract: Can a high-performance document image recognition system be built without detailed knowledge of the application? Having benefited from the statistical machine-learning revolution of the last twenty years, our architectures rely less on hand-crafted special-case rules and more on models trained on labeled-sample data sets. But urgent questions remain. When we can't collect (and label) enough real training data, does it help to complement them with data synthesized using generative models? Is it ever completely safe to rely on synthetic data? If we can't manage to train (or craft) a single complete, near-perfect, application-specific "strong" model to drive recognition, can we make progress by combining several imperfect or incomplete "weak" models? Can recognition carried out jointly over weak models perform optimally while still running fast? Can a recognizer automatically pick a strong model of its input? Must we always pre-train models for every kind ("style") of input expected, or can a recognizer adapt to unknown styles? Can weak models adapt autonomously, growing stronger and driving accuracy higher, without any human intervention? Can one model "criticize", and then proceed to correct, other models, even while it is being criticized and corrected in turn by them? After twenty-five years of research on these questions we have partial answers, many in the affirmative: in addition to promising laboratory demonstrations, we can take pride in successful applications. I'll illustrate the evolution of the state of the art with concrete examples, and point out open problems. (Based on work by and with T. Pavlidis, T.