We propose two lightweight, specialized Spatio-Temporal Graph Convolutional Networks (ST-GCNs): one for actions characterized by the motion of the human body, and a novel one designed specifically to recognize particular object configurations during the execution of human actions. We propose a late-fusion strategy that combines the predictions of both graph networks to get the most out of the two and to resolve ambiguities in the action classification. This modular approach reduces memory cost and training time. We also apply the same late-fusion mechanism with a Bayesian approach to further improve performance. We show results on two public datasets: CAD-120 and Watch-n-Patch. Compared to the individual graphs, our late-fusion mechanism improves accuracy by +21 percentage points (pp) on Watch-n-Patch and +7 pp on CAD-120. Our approach outperforms most of the significant existing approaches.
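To make the idea of late fusion concrete, the following is a minimal sketch of how the class-probability outputs of two independently trained networks could be combined. The abstract does not specify the fusion rule, so both rules shown here (a weighted average and a product rule that assumes the two streams are conditionally independent given the class) are illustrative assumptions, as are the function names `average_fusion` and `product_fusion`, the weight `w`, and the example probabilities.

```python
import numpy as np


def average_fusion(p_body, p_obj, w=0.5):
    """Weighted average of two class-probability vectors.

    Hypothetical fusion rule: `w` weights the body-motion stream and
    `1 - w` weights the object-configuration stream.
    """
    p_body = np.asarray(p_body, dtype=float)
    p_obj = np.asarray(p_obj, dtype=float)
    fused = w * p_body + (1.0 - w) * p_obj
    # Renormalize so the result is a valid distribution over classes.
    return fused / fused.sum(axis=-1, keepdims=True)


def product_fusion(p_body, p_obj, prior=None):
    """Product-rule fusion of two class posteriors.

    Assumes (for illustration only) that the two streams are conditionally
    independent given the class, so the fused posterior is proportional to
    p_body * p_obj / prior. A uniform prior is used when none is given.
    """
    p_body = np.asarray(p_body, dtype=float)
    p_obj = np.asarray(p_obj, dtype=float)
    if prior is None:
        prior = np.full(p_body.shape[-1], 1.0 / p_body.shape[-1])
    fused = p_body * p_obj / np.asarray(prior, dtype=float)
    return fused / fused.sum(axis=-1, keepdims=True)


# Made-up softmax outputs for three action classes.
p_body = [0.5, 0.4, 0.1]   # body-motion network is torn between classes 0 and 1
p_obj = [0.1, 0.7, 0.2]    # object-configuration network favors class 1

fused_avg = average_fusion(p_body, p_obj)
fused_prod = product_fusion(p_body, p_obj)
print(int(np.argmax(fused_avg)), int(np.argmax(fused_prod)))
```

In this toy case the body-motion stream alone would pick class 0, while either fusion rule selects class 1, which illustrates how a second stream can resolve an ambiguity. Because fusion happens only on the output probabilities, each network can be trained and stored separately.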