In short-term memory networks, transient stimuli are represented by patterns of neural activity that persist long after stimulus offset. Here, we compare the performance of two prominent classes of memory networks, feedback-based attractor networks and feedforward networks, in conveying information about the amplitude of a briefly presented stimulus in the presence of gaussian noise. Using Fisher information as a metric of memory performance, we find that the optimal form of network architecture depends strongly on assumptions about the forms of nonlinearities in the network. For purely linear networks, we find that feedforward networks outperform attractor networks because noise is continually removed from feedforward networks when signals exit the network; as a result, feedforward networks can amplify signals they receive faster than noise accumulates over time. By contrast, attractor networks must operate in a signal-attenuating regime to avoid the buildup of noise. However, if the amplification of signals is limited by a finite dynamic range of neuronal responses or if noise is reset at the time of signal arrival, as suggested by recent experiments, we find that attractor networks can out-perform feedforward ones. Under a simple model in which neurons have a finite dynamic range, we find that the optimal attractor networks are forgetful if there is no mechanism for noise reduction with signal arrival but nonforgetful (perfect integrators) in the presence of a strong reset mechanism. Furthermore, we find that the maximal Fisher information for the feedforward and attractor networks exhibits power law decay as a function of time and scales linearly with the number of neurons. These results highlight prominent factors that lead to trade-offs in the memory performance of networks with different architectures and constraints, and suggest conditions under which attractor or feedforward networks may be best suited to storing information about previous stimuli.