Principal Component Analysis (PCA) is a well-established approach commonly used for dimensionality reduction. However, its computational cost and memory requirements hamper its adoption on heavily resource-constrained embedded platforms. Streaming approaches have been proposed that may enable embedded implementations of PCA. Among them, the History PCA (HPCA) algorithm stands out for its accuracy and its robustness to parameter variability. This paper presents a parallel and memory-efficient implementation of HPCA in a structural health monitoring (SHM) application based on a heterogeneous network, in which sensor nodes measure three-axis accelerations and gateways collect measurements from several nodes and send them to a cloud storage and analytics facility. In the targeted application, standard PCA reaches a 15× compression factor with an average reconstruction signal-to-noise ratio of 16 dB and a negligible impact on the accuracy of structural modal frequency tracking. By embedding HPCA on our SHM network gateways, we achieve the same compression factor as standard PCA, with a more than 1000× reduction in the data memory footprint required to run the algorithm. Furthermore, we parallelize HPCA on the gateway and achieve a speedup of 7.1× on 8 cores. Finally, we explore a fixed-point HPCA implementation on the sensors (network end-nodes) that maximally distributes the compression workload, minimizes the required communication bandwidth, and maintains the same reconstruction quality as floating-point HPCA, with a compression factor of 10×.
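
To make the streaming idea concrete, the following is a minimal sketch of a block-wise PCA update in the spirit of history-based streaming methods. It is not the implementation evaluated in this paper: the function name `block_update`, the weighting of past and current blocks, the number of power iterations, and the synthetic data are all assumptions chosen for illustration. The sketch keeps only a d×k basis and k eigenvalue estimates between blocks, which is the kind of state reduction that makes streaming PCA attractive on memory-constrained devices, and it reports the reconstruction signal-to-noise ratio obtained when each sample is compressed to k coefficients.

```python
import numpy as np


def block_update(U, lam, X, t, n_iter=2):
    """One streaming update (illustrative; names and weights are assumptions).

    U   : (d, k) orthonormal basis estimated from previous blocks
    lam : (k,) eigenvalue estimates associated with U
    X   : (B, d) new block of samples, assumed zero-mean
    t   : 1-based index of the current block
    """
    B = X.shape[0]
    w_old, w_new = (t - 1) / t, 1.0 / t

    def apply_cov(Q):
        # Running covariance estimate applied to Q without forming a d x d matrix:
        # a low-rank summary of the past plus the covariance of the current block.
        history = U @ (lam[:, None] * (U.T @ Q))
        current = X.T @ (X @ Q) / B
        return w_old * history + w_new * current

    Q = U
    for _ in range(n_iter):
        # Block power iteration followed by re-orthonormalization.
        Q, _ = np.linalg.qr(apply_cov(Q))
    # Rayleigh quotients serve as the updated eigenvalue estimates.
    lam_new = np.sum(Q * apply_cov(Q), axis=0)
    return Q, lam_new


# Synthetic demo: 24 channels driven by 3 latent components plus weak noise.
rng = np.random.default_rng(0)
d, k, B, n_blocks = 24, 3, 100, 40
mixing, _ = np.linalg.qr(rng.standard_normal((d, k)))
scales = np.array([3.0, 2.0, 1.0])

U, _ = np.linalg.qr(rng.standard_normal((d, k)))  # random initial basis
lam = np.zeros(k)
blocks = []
for t in range(1, n_blocks + 1):
    latent = rng.standard_normal((B, k)) * scales
    X = latent @ mixing.T + 0.1 * rng.standard_normal((B, d))
    U, lam = block_update(U, lam, X, t)
    blocks.append(X)  # stored only to evaluate the result below

data = np.vstack(blocks)

# Reference basis from batch PCA (SVD of the full data set).
_, _, Vt = np.linalg.svd(data, full_matrices=False)
V = Vt[:k].T
overlap = np.linalg.norm(V.T @ U) ** 2  # equals k when the subspaces coincide

# Compress each sample to k coefficients, reconstruct, and measure the SNR.
recon = (data @ U) @ U.T
snr_db = 10 * np.log10(np.sum(data ** 2) / np.sum((data - recon) ** 2))

print(f"subspace overlap (max {k}): {overlap:.4f}")
print(f"reconstruction SNR: {snr_db:.1f} dB")
```

In this sketch the state carried between blocks is `U` and `lam` only; the `blocks` list exists purely so that the streaming result can be compared against batch PCA and would not be present on an embedded target.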