Abstract. Despite tremendous advances in computer software and hardware, certain key aspects of experimental research in document analysis, and pattern recognition in general, have not changed much over the past 50 years. This paper describes a vision of the future where community-created and managed resources make possible fundamental changes in the way science is conducted in such fields. We also discuss current developments that are helping to lead us in this direction.
Introduction: Setting the Stage
The field of document analysis research has had a long, rich history. Still, despite decades of advancement in computer software and hardware, not much has changed in how we conduct our experimental science, as emphasized in George Nagy's superb keynote retrospective at the DAS 2010 workshop [11].
In this paper, we present a vision for the future of experimental document analysis research. Here, the availability of "cloud" resources, consisting of data, algorithms, interpretations, and full provenance, provides the foundation for a research paradigm that builds on collective intelligence (both machine and human) to instill new practices in a range of research areas. The reader should be aware that this paradigm is applicable to a much broader scope of machine perception and pattern recognition; we use document analysis as the topic area to illustrate the discussion because this is where our main research interests lie, and where we can legitimately back our claims. Currently under development, the platform we are building exploits important trends we see arising in a number of key areas, including the World Wide Web, database systems, and social and collaborative media.
The first part of this paper presents our view of this future as a fictional, yet realizable, "story" outlining what we believe to be a compelling view of community-created and managed resources that will fundamentally change the way we do research. In the second part of the paper, we turn to a more technical discussion of the status of our current platform and developments in this direction.
⋆ This work is a collaborative effort hosted by the Computer Science and Engineering Department at Lehigh University and funded by a Congressional appropriation administered through DARPA IPTO via Raytheon BBN Technologies.