Interactive robots that cooperate with humans must respond to their requests with appropriate actions. Unfortunately, such requests often leave gaps between what users say and what they actually need. Robots are nevertheless expected to infer what is required from the situation and act on it. We call such actions reflective actions. To enable robots to perform reflective actions, we constructed a dataset of reflective actions for a domestic manipulation robot, in which each action corresponds to a user utterance and its surrounding situation. Through crowdsourcing, we defined several action scenarios that could be regarded as reflective. We then recorded videos of the situations described in these crowdsourced scenarios, capturing the user's situation just before the robot's reflective action. We also annotated the videos with user utterance transcriptions, objects, user poses, and user positions to investigate how these descriptive features contribute to reflective action decisions. Our experimental results indicated that although the newly defined task is very challenging, it can be solved if the system has a concrete understanding of the situation.