The aim of this article is to define a reliability-aware methodology for the design of embedded systems on multi-FPGA platforms. The designed system must be able to detect the occurrence of faults globally and autonomously, in order to recover from them or to mitigate their effects. Two categories of faults are identified, based on their impact on the device elements: (i) recoverable faults, namely transient problems that can be fixed without causing a lasting effect, and (ii) nonrecoverable faults, namely those that cause a permanent problem, making the affected portion of the fabric unusable. While some aspects can be taken from previous solutions available in the literature, several open issues remain. In fact, no complete design methodology handling all the peculiar issues of the considered scenario has been proposed yet, a gap we aim to fill with our work. The final system exhibits reliability properties and achieves an increased overall lifetime and availability.