Abstract. This paper introduces ThreadMill, a distributed and parallel component architecture for applications that process large volumes of streamed (time-sequenced) data, such as speech and gesture recognition applications. Many stream-oriented applications offer ample opportunity for enhanced performance via concurrent execution, exploiting a wide variety of parallel paradigms, such as task, data, and pipeline parallelism. ThreadMill addresses the challenges of developing and evolving parallel and distributed applications in this domain by offering a modeling formalism, a programming framework, and a runtime infrastructure. By isolating communication, concurrency, and synchronization concerns, ThreadMill facilitates component development and reuse as well as application evolution. A direct consequence of the novel mechanisms introduced by ThreadMill is that applications composed of reusable components can be retargeted, unchanged, and made to run efficiently on a variety of execution environments, ranging from a single machine with a single processor, to a cluster of heterogeneous computational nodes, to certain classes of supercomputers. Experimental results show an eightfold speedup when using ten nodes of an AlphaServer DS20 cluster to run a proof-of-concept 2D video-based tracker for the hands and face of American Sign Language signers.