Wearable Fall Detection Systems (FDSs) have gained much research interest during the last decade. In this regard, Machine Learning (ML) classifiers have shown great efficiency in discriminating falls from conventional movements or Activities of Daily Living (ADLs) based on the analysis of the signals captured by wearable inertial sensors. Due to the intrinsic difficulties of training and testing this type of detector in realistic scenarios and with its target audience (older adults), FDSs are normally benchmarked against a predefined set of ADLs and emulated falls executed by volunteers in a controlled environment. In most studies, however, samples from the same experimental subjects are used both to train and to evaluate the FDSs. In this work, we investigate the performance of ML-based FDSs when the test subjects have physical characteristics (weight, height, body mass index, age, gender) different from those of the users considered for the training phase. The results suggest that certain divergences (in weight or height) between the users of the two subsets (training and test) may hamper the effectiveness of the classifiers (a reduction of up to 20% in sensitivity and of up to 5% in specificity is reported). However, the typology of the activities included in these subsets proves far more relevant to the discrimination capability of the classifiers (with specificity losses of up to 95% if the activity types used for training and testing strongly diverge).
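
The following is a minimal sketch, not the paper's actual pipeline, of the kind of cross-characteristic evaluation described above: trials are split so that training and test samples come from subjects with different physical traits (here a hypothetical BMI threshold), a generic classifier is trained, and sensitivity/specificity are computed on the held-out group. The features, labels, and the `subject_bmi` metadata are synthetic placeholders; the study's real datasets and classifiers are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Hypothetical inputs: one feature vector per trial (derived from the inertial
# signals), a binary label (1 = fall, 0 = ADL), and per-trial metadata with the
# executing subject's physical characteristics (here, BMI). All values below
# are randomly generated placeholders for illustration only.
rng = np.random.default_rng(0)
n_trials = 400
X = rng.normal(size=(n_trials, 12))               # placeholder accelerometer features
y = rng.integers(0, 2, size=n_trials)             # 1 = fall, 0 = ADL
subject_bmi = rng.uniform(18, 35, size=n_trials)  # BMI of the subject who executed each trial

# Cross-characteristic split: train on lower-BMI subjects, test on higher-BMI ones,
# so the two phases involve users with diverging physical characteristics.
train_mask = subject_bmi < 25
test_mask = ~train_mask

clf = RandomForestClassifier(random_state=0)
clf.fit(X[train_mask], y[train_mask])
y_pred = clf.predict(X[test_mask])

# Sensitivity (recall on falls) and specificity (recall on ADLs), the two
# metrics used in the abstract to quantify the performance losses.
tn, fp, fn, tp = confusion_matrix(y[test_mask], y_pred, labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"sensitivity={sensitivity:.2f}  specificity={specificity:.2f}")
```

The same split logic could be repeated for any of the other characteristics mentioned (weight, height, age, gender), or for activity typology, by changing the grouping variable used to build the masks.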