High performance computing is facing exponential growth in job output dataset sizes. This growth implies a significant commitment of supercomputing center resources, most notably precious scratch space, to data staging and offloading. However, the scratch area is typically managed using simple "purge policies", without the sophisticated "end-user data services" required to balance the center's resource consumption against user serviceability. End-user data services such as offloading are performed using point-to-point transfers that cannot reconcile the center's purge deadlines with users' delivery deadlines, cannot adapt to changing dynamics in the end-to-end data path, and are not fault-tolerant. We propose a robust framework for the timely, decentralized offload of result data, addressing these significant gaps in extant direct-transfer-based offloading. The decentralized offload is achieved using an overlay of user-specified intermediate nodes and well-known landmark nodes. These nodes provide multiple data-flow paths, thereby maximizing bandwidth, and also provide fail-over capabilities for the offload. We have implemented our techniques within a production job scheduler (PBS) and a data transfer tool (BitTorrent), and our evaluation shows that offloading times can be significantly reduced (by 90.2% for a 2.1 GB file) while also meeting center-user Service Level Agreements.
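To make the overlay idea concrete, the sketch below is a minimal illustration, not the paper's PBS/BitTorrent implementation; all names (`IntermediatePath`, `plan_offload`, `fail_over`, the node names and bandwidth figures) are hypothetical. It shows, under the assumption that per-path bandwidth estimates are available, how result-data chunks might be spread over several intermediate and landmark nodes in proportion to observed bandwidth, and reassigned to surviving paths when an intermediate fails.

```python
"""
Hypothetical sketch of multi-path, fault-tolerant offload planning.
Not the paper's implementation (which builds on PBS and BitTorrent);
it only illustrates spreading chunks over an overlay of intermediate
nodes and failing over when a path becomes unavailable.
"""
from dataclasses import dataclass, field


@dataclass
class IntermediatePath:
    name: str                  # user-specified intermediate or landmark node
    bandwidth_mbps: float      # most recently measured bandwidth estimate
    alive: bool = True
    chunks: list = field(default_factory=list)


def _least_loaded(live: list, total_bw: float) -> IntermediatePath:
    """Pick the live path with the most spare capacity relative to its bandwidth share."""
    return min(live, key=lambda p: len(p.chunks) / (p.bandwidth_mbps / total_bw))


def plan_offload(num_chunks: int, paths: list) -> None:
    """Assign chunks to live paths in proportion to their bandwidth."""
    live = [p for p in paths if p.alive]
    total_bw = sum(p.bandwidth_mbps for p in live)
    for chunk_id in range(num_chunks):
        _least_loaded(live, total_bw).chunks.append(chunk_id)


def fail_over(failed: IntermediatePath, paths: list) -> None:
    """Reassign a failed path's chunks to the remaining live paths."""
    failed.alive = False
    orphaned, failed.chunks = failed.chunks, []
    live = [p for p in paths if p.alive]
    total_bw = sum(p.bandwidth_mbps for p in live)
    for chunk_id in orphaned:
        _least_loaded(live, total_bw).chunks.append(chunk_id)


if __name__ == "__main__":
    # A 2.1 GB result file split into 1 MB chunks, offloaded over
    # three hypothetical intermediate/landmark nodes.
    overlay = [
        IntermediatePath("intermediate-A", bandwidth_mbps=400),
        IntermediatePath("landmark-B", bandwidth_mbps=250),
        IntermediatePath("intermediate-C", bandwidth_mbps=150),
    ]
    plan_offload(num_chunks=2100, paths=overlay)
    fail_over(overlay[2], overlay)   # simulate loss of one intermediate
    for p in overlay:
        print(p.name, "alive" if p.alive else "down", len(p.chunks), "chunks")
```

In this simplified model, bandwidth-weighted chunk placement stands in for the multiple concurrent data-flow paths described above, and `fail_over` stands in for the overlay's ability to keep an offload on schedule when a single point-to-point path would otherwise have stalled.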