A set of communication operations is defined that allows a form of task parallelism to be achieved on a data-parallel architecture. The set of processors can be subdivided recursively into groups, and a communication operation inside one group never conflicts with communications taking place in other groups. The groups may be subdivided and recombined at any time, allowing the task structure to adapt to the needs of the data. The algorithms implementing the grouping and communications are defined using parallel scans and folds, which can be executed efficiently on an abstract tree machine. This approach is best suited to massively parallel systems with fine-grained processors.
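
To make the idea of group-local scans and folds concrete, the following is a minimal sequential sketch, not the algorithms of the paper. It assumes that groups are contiguous runs of processors and that each group is marked by a boolean flag on its first processor; the names `segmented_scan`, `group_fold`, and `split` are hypothetical and chosen only for illustration.

```python
from itertools import accumulate
from operator import add

def segmented_scan(op, values, starts):
    """Inclusive scan that restarts wherever starts[i] is True.

    Assumption: starts[i] marks the first processor of a group.
    """
    def combine(left, right):
        lflag, lval = left
        rflag, rval = right
        # A start flag on the right blocks anything arriving from the left,
        # so values never cross a group boundary.
        return (lflag or rflag, rval if rflag else op(lval, rval))
    # combine is associative whenever op is, so a tree-structured
    # evaluation would compute the same result as this sequential one.
    pairs = zip(starts, values)
    return [val for _, val in accumulate(pairs, combine)]

def group_fold(op, values, starts):
    """Give every processor the fold of the values in its own group."""
    scanned = segmented_scan(op, values, starts)
    n = len(values)
    totals = [None] * n
    # The last processor of each group holds the group total;
    # propagate it backwards to the other members of the group.
    for i in reversed(range(n)):
        is_last = (i == n - 1) or starts[i + 1]
        totals[i] = scanned[i] if is_last else totals[i + 1]
    return totals

def split(starts, new_starts):
    """Subdivide groups by adding extra start flags."""
    return [a or b for a, b in zip(starts, new_starts)]

values = [1, 2, 3, 4, 5, 6]
starts = [True, False, False, True, False, True]   # groups: [1,2,3] [4,5] [6]
print(segmented_scan(add, values, starts))
print(group_fold(add, values, starts))
```

In this sketch, subdividing a group only adds flags and recombining only clears them, so the grouping can change between operations without moving any data.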