Abstract. Modern epidemiology integrates knowledge from heterogeneous collections of numerical, descriptive, and imaging data. Large-scale epidemiological studies use sophisticated statistical analysis, mathematical models based on differential equations, and versatile analytic tools that handle numerical data. In contrast, knowledge extraction from images and from descriptive information in the form of text and diagrams remains a challenge in most fields, and in particular for diseases of the eye. In this article we provide a roadmap towards the extraction of knowledge from text and images, with a focus on forthcoming applications to the epidemiological investigation of retinal diseases, especially using the massive heterogeneous collections of data that already exist around the globe.