A framework for building an infrastructure that semantically integrates, archives, and reuses data for various research purposes in human brain imaging remains critical. In particular, aligning technical, clinical, and professional systems to facilitate data sharing is a recurring problem in brain imaging, while large samples of well-characterized images with detailed metadata are increasingly needed. This paper outlines the experience of the NeuroGrid Stroke Exemplar and further work in the Brain Research Imaging Centre and Stroke Trials Unit in developing an infrastructure that facilitates the linkage, archiving, and reuse of imaging data from stroke patients for large-scale clinical and epidemiological studies. We examined data from 12 stroke projects carried out over the past two decades in our center and from two large trials involving 329 centers. We assessed previously published schemas and those developed specifically for large multicenter ischemic and hemorrhagic stroke treatment trials. We then developed our own harmonized and integrated schema and database with a web-based interface, the Longitudinal Online Research and Imaging System (LORIS), designed to be flexible and adaptable to future trials and observational studies. We then linked images and metadata from 3,079 patients, acquired in stroke research in one center over a 14-year period (1996-2010), with prospective central hospital health statistics to obtain long-term follow-up. Our integrated database includes 3,079 subjects and over 550 federated and searchable data items, including imaging details, medical history, and examination, stroke, and laboratory details, which map to large multicenter stroke trials with imaging data from over 10,000 patients from 30 countries.
The central linkage identified that 879 of 3,079 patients had died, 525 had recurrent strokes, and 291 had developed dementia during follow-up of up to 19 years (range = 0-19 years; median = 9.04; IQR = 12.17), demonstrating the utility of this approach. The core metadata schema has benefited from extensive development in large clinical trials, and data from further trials can now be added. The infrastructure provides an opportunity to cross-link and reuse data for a range of large-scale clinical and research purposes in stroke brain imaging, including developing data analytics models for research into common brain diseases and their consequences.