Unsupervised domain adaptation (UDA) aims to transfer and adapt knowledge from a labeled source domain to an unlabeled target domain. Traditionally, geometry-based alignment methods, e.g., Orthogonal Procrustes Alignment (OPA), formed an important class of solutions to this problem. Despite their mathematical tractability, they rarely produce effective adaptation performance on recent benchmarks. Instead, state-of-the-art approaches rely on sophisticated distribution alignment strategies such as adversarial training. In this paper, we show that conventional OPA, when coupled with powerful deep feature extractors and a novel bi-level optimization formulation, is indeed an effective choice for handling challenging distribution shifts. Compared to existing UDA methods, our approach offers the following benefits: (i) computational efficiency: through the isolation of the alignment and classifier training steps during adaptation, and the use of deep OPA, our approach is computationally very efficient (typically requiring only about 700K parameters more than the base feature extractor, compared to the millions of extra parameters required by state-of-the-art UDA baselines); (ii) data efficiency: our approach does not require updating the feature extractor during adaptation and hence can be effective even with limited target data; (iii) improved generalization: the resulting models are intrinsically well-regularized and demonstrate effective generalization even in the challenging partial DA setting, i.e., the target domain contains only a subset of the classes observed in the source domain; and (iv) incremental training: our approach allows progressive adaptation of models to novel domains (unseen during training) without requiring retraining of the model from scratch.
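For reference, the classical OPA step the abstract alludes to admits a closed-form solution via the singular value decomposition: the orthogonal matrix Q minimizing ||XQ - Y||_F is Q = UV^T, where X^T Y = USV^T. Below is a minimal NumPy sketch of this textbook solver. It is not the paper's full method (which couples OPA with deep feature extractors and a bi-level optimization), and the variable names and the row-correspondence assumption in the usage snippet are purely illustrative.

```python
import numpy as np

def orthogonal_procrustes(X, Y):
    # Closed-form solution of  min_Q ||X Q - Y||_F  s.t.  Q^T Q = I:
    # if X^T Y = U S V^T (SVD), the minimizer is Q = U V^T.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Illustrative usage (hypothetical, row-paired features): recover the
# rotation relating target features to source features in a shared
# d-dimensional feature space.
rng = np.random.default_rng(0)
source_feats = rng.normal(size=(128, 64))        # n x d source features
R = np.linalg.qr(rng.normal(size=(64, 64)))[0]   # ground-truth rotation
target_feats = source_feats @ R                  # rotated target copy
Q = orthogonal_procrustes(target_feats, source_feats)
print(np.allclose(target_feats @ Q, source_feats, atol=1e-6))  # True
```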