Commercially available consumer electronics (smartwatches and wearable biosensors) increasingly enable the acquisition of peripheral physiological and physical activity data both inside and outside laboratory settings. However, there is little published guidance on selecting these novel devices and assessing their suitability for scientific use. To address this gap, this paper offers a framework to aid researchers in choosing and evaluating wearable technologies for use in empirical research. Our seven-step framework includes: (1) identifying signals of interest; (2) characterizing intended use cases; (3) identifying study-specific pragmatic needs; (4) selecting devices for evaluation; (5) establishing an assessment procedure; (6) performing qualitative and quantitative analyses on the resulting data; and, if desired, (7) conducting power analyses to determine the sample size needed to compare performance across devices more rigorously. We illustrate the application of the framework by comparing electrodermal, cardiovascular, and accelerometry data from a variety of commercial wireless sensors (Affectiva Q, Empatica E3, Empatica E4, Actiwave Cardio, Shimmer) with data from a well-validated, wired Mindware laboratory system. Our evaluations are performed in two studies (N=10, N=11) involving psychometrically sound, standardized tasks that include physical activity and affect induction. After applying our framework to these data, we conclude that only some commercially available consumer devices for physiological measurement are capable of wirelessly measuring peripheral physiological and physical activity data of sufficient quality for scientific use cases. Thus, the framework appears useful for guiding more systematic, transparent, and rigorous evaluations of mobile physiological devices prior to their deployment in studies.
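As an illustration of step (7), the following is a minimal sketch of how a sample size might be estimated for a within-participant comparison of two devices. It is not the procedure used in the paper: it assumes a paired design, a two-sided test, and the standard normal approximation n = ((z_(1-alpha/2) + z_(power)) / d)^2, where d is a standardized effect size for the paired differences. The function name `required_pairs` and the example effect sizes are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def required_pairs(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of participants for a two-sided paired comparison.

    Uses the normal approximation n = ((z_(1-alpha/2) + z_power) / d)^2.
    effect_size (d) is an assumed standardized mean difference between two
    devices; it is a planning input, not a value taken from the paper.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for the target power

    # Round up, since a fractional participant cannot be recruited.
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes, used only to show how n scales with d.
    for d in (0.5, 0.8):
        print(f"d = {d}: about {required_pairs(d)} participants")
```

Because this is a normal approximation, it slightly underestimates the sample size that an exact t-based calculation would give, so the result is best treated as a lower bound when planning a device comparison.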