The increasing prevalence of smart mobile devices (e.g., smartphones) enables the combined use of mobile crowdsensing (MCS) and ecological momentary assessments (EMA) in the healthcare domain. By correlating qualitative, longitudinal, and ecologically valid EMA data sets with sensor measurements collected in mobile apps, valuable new insights about patients (e.g., people who suffer from chronic diseases) can be gained. However, implementing a corresponding software solution poses numerous conceptual, architectural, technical, and legal challenges. Therefore, the work at hand (1) identifies these challenges, (2) derives recommendations from them, and (3) proposes a reference architecture for an MCS-EMA platform that addresses these recommendations. The insights required to propose the reference architecture were gained in several large-scale mHealth crowdsensing studies that have been running for many years and address different healthcare questions. To mention only two examples, we are running crowdsensing studies on the chronic disorder tinnitus and on psychological stress. We consider the proposed reference architecture and the identified challenges and recommendations a contribution in two respects. First, they enable other researchers to align our practical studies with a baseline setting that can accommodate the various insights revealed. Second, they provide a proper basis for better comparing data gathered using MCS and EMA. In addition, the combined use of MCS and EMA increasingly requires suitable architectures and associated digital solutions for the healthcare domain.