Today's smartphones are equipped with embedded sensors, such as accelerometers and gyroscopes, which have enabled a variety of measurements and recognition tasks. In this paper, we jointly investigate two types of recognition problems in a joint manner, e.g., human activity recognition and smartphone on-body position recognition, in order to enable more robust context-aware applications. So far, these two problems have been studied separately without considering the interactions between each other. In this study, by first applying a novel data preprocessing technique, we propose a joint recognition framework based on the multi-task learning strategy, which can reduce computational demand, better exploit complementary information between the two recognition tasks, and lead to higher recognition performance. We also extend the joint recognition framework so that additional information, such as user identification with biometric motion analysis, can be offered. We evaluate our work systematically and comprehensively on two datasets with real-world settings. Our joint recognition model achieves the promising performance of 0.9174 in terms of F1-score for user identification on the benchmark RealWorld Human Activity Recognition (HAR) dataset. On the other hand, in comparison with the conventional approach, the proposed joint model is shown to be able to improve human activity recognition and position recognition by 5.1% and 9.6% respectively.