In this paper we introduce and investigate the statistical mechanics of hierarchical neural networks. First, we approach these systems à la Mattis, by regarding the Dyson model as a single-pattern hierarchical neural network, and we discuss the stability of the different retrievable states as predicted by the related self-consistency equations, obtained both from a mean-field bound and from a bound that bypasses the mean-field limitation. The latter is worked out by properly reabsorbing the fluctuations of the magnetization at the higher levels of the hierarchy into effective fields acting on the lower levels. Remarkably, by combining Amit's ansatz technique (to select candidate retrievable states) with the interpolation procedure (to solve for the free energy of these states), we prove that, thanks to gauge symmetry, the Dyson model accomplishes both serial and parallel processing. Going one step further, we extend this scenario to multiple stored patterns by implementing the Hebb prescription for learning within the couplings. This results in a Hopfield-like network constrained to a hierarchical topology, for which, restricting to the low-storage regime (where the number of patterns grows at most logarithmically with the number of neurons), we prove the existence of the thermodynamic limit of the free energy and give explicit expressions for its mean-field bound and for the related improved bound. The resulting self-consistency equations for the Mattis magnetizations (which act as order parameters) are studied, and the stability of their solutions is analyzed in order to obtain a picture of the overall retrieval capabilities of the system in both the mean-field and the non-mean-field scenarios. Our main finding is that embedding the Hebbian rule on a hierarchical topology allows the network to accomplish both serial and parallel processing. By tuning the level of fast noise affecting the system, or the rate at which interactions decay with the distance between neurons, the network can switch from serial retrieval to multitasking features and vice versa. However, as these multitasking capabilities essentially stem from the vanishing "dialogue" between spins at long distance, such an effective scarcity of links strongly penalizes the network's capacity, which remains confined to the low-storage regime.
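
For concreteness, here is a minimal sketch of the construction underlying the above (the notation $k$, $\rho$, $J$, $\xi^{\mu}$, $m_{\mu}$ is our illustrative choice, not fixed by the text): the Dyson model on $N = 2^{k+1}$ Ising spins $\sigma_i = \pm 1$ is built recursively by coupling two sub-trees of size $2^k$,
\[
H_{k+1}(\sigma) \;=\; H_k\big(\sigma^{(1)}\big) + H_k\big(\sigma^{(2)}\big) \;-\; \frac{J}{2^{2\rho(k+1)}} \sum_{i<j}^{2^{k+1}} \sigma_i \sigma_j , \qquad \rho \in \left(\tfrac{1}{2},\, 1\right),
\]
where $\rho$ tunes how fast the interactions decay with the hierarchical distance between neurons. The Hebbian extension mentioned above replaces the uniform coupling $J$ with $\sum_{\mu=1}^{p} \xi_i^{\mu} \xi_j^{\mu}$, for $p$ quenched patterns $\xi^{\mu} \in \{-1,+1\}^{N}$, and retrieval is monitored through the Mattis magnetizations $m_{\mu} = N^{-1} \sum_{i=1}^{N} \xi_i^{\mu} \sigma_i$.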
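
As an illustration of the self-consistency equations referred to above, in the low-storage mean-field scenario one expects fixed-point conditions of the familiar Hopfield type (written here without the hierarchy-dependent prefactors, which this sketch leaves unspecified):
\[
m_{\mu} \;=\; \mathbb{E}_{\xi}\!\left[\, \xi^{\mu} \tanh\!\Big( \beta \sum_{\nu=1}^{p} \xi^{\nu} m_{\nu} \Big) \right], \qquad \mu = 1, \dots, p,
\]
with $\beta$ the inverse level of fast noise. Pure solutions, with a single $m_{\mu} \neq 0$, correspond to serial retrieval, while symmetric mixtures with several non-vanishing components correspond to the parallel (multitasking) retrieval discussed above.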