Many researchers are now dedicating their efforts to studying interactive modalities such as facial expressions, natural language, and gestures. These modalities make communication between robots and people more natural. However, many robots currently in use are appearance-constrained and unable to perform facial expressions or gestures. Moreover, although humanoid-oriented techniques are promising, they are time- and cost-intensive, which poses technical difficulties for most research studies. To increase interactive efficiency and reduce costs, we instead focus on three interaction modalities and their combinations, namely color, sound, and vibration. We conduct a structured study to evaluate the effects of these three modalities on human emotional perception of our simple-shaped robot "Maru." Our findings offer insights into human-robot affective interaction, which can be particularly useful for appearance-constrained social robots. The contribution of this work lies not so much in the explicit parameter settings as in a deeper understanding of how to express emotions through the simple modalities of color, sound, and vibration, along with a set of recommended expressions that HRI researchers and practitioners can readily employ.