Although they often provide critical services, most workflow systems are not dependable. There has been much research on dependable/survivable distributed systems; most is concerned with developing new architectures, not adapting pre-existing ones. Additionally, the research has focused on hardening, security-based defense, as opposed to recovery. For deployed systems, it is often infeasible to completely replace existing infrastructure; what is needed are ways to adapt existing distributed systems to offer better dependability. In this paper, we outline a general architecture that can easily be retrofitted to legacy workflow systems in order to improve dependability and fault tolerance. We do this by monitoring enactment and replicating partial workflow states as tools for detection, analysis and recovery. We discuss some policies that can guide these mechanisms. Finally, we describe and evaluate our implementation, Survivor, which modified an existing workflow system provided by the Naval Research Lab.