Effective human-robot interaction is essential if service robots are to penetrate the market widely. Such robots need vision systems to recognize objects, but it is difficult to build vision systems that work reliably under varied conditions, so more robust techniques for object recognition and image segmentation are needed. We have therefore proposed using the human user's assistance, given through speech, for object recognition. The robot asks questions that the user can answer easily and whose answers efficiently reduce the number of candidate objects, even when the scene contains occluded objects or objects composed of parts of multiple colors. When generating questions, the method takes into account the characteristics of the features used for object recognition, such as how easily humans can specify them in words, and thereby produces a user-friendly and efficient sequence of questions. Experimental results show that the robot can detect target objects by asking the questions generated by the method.
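The abstract does not say how questions are scored, so the following is only a minimal sketch under stated assumptions, not the authors' method. It uses a greedy yes/no questioner that picks the question minimizing the expected number of remaining candidates, scaled by a per-feature difficulty weight that stands in for "how easily humans can specify the feature in words". Feature values are stored as sets so that an object made of parts of multiple colors can match several colors. The names `DIFFICULTY`, `Candidate`, `choose_question`, `identify`, the weights, and the example scene are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical weights for how hard each feature type is to put into words
# (1.0 = easiest). These numbers are assumptions, not values from the paper.
DIFFICULTY = {"color": 1.0, "shape": 1.2, "size": 1.5}


@dataclass
class Candidate:
    name: str
    # Each feature holds a set of values, so an object composed of
    # multicolor parts can list several colors.
    features: dict = field(default_factory=dict)


def has_value(candidate, feature, value):
    return value in candidate.features.get(feature, set())


def expected_remaining(candidates, feature, value):
    """Expected candidate count after the yes/no question
    'Is its <feature> <value>?', with every candidate equally likely."""
    n = len(candidates)
    n_yes = sum(has_value(c, feature, value) for c in candidates)
    n_no = n - n_yes
    return (n_yes ** 2 + n_no ** 2) / n


def choose_question(candidates):
    """Return the informative (feature, value) question with the lowest
    difficulty-weighted expected remaining count, or None if none exists."""
    best, best_cost = None, float("inf")
    for feature, weight in DIFFICULTY.items():
        values = set().union(*(c.features.get(feature, set()) for c in candidates))
        for value in sorted(values):
            n_yes = sum(has_value(c, feature, value) for c in candidates)
            if n_yes in (0, len(candidates)):
                continue  # every answer would be the same: nothing is learned
            cost = weight * expected_remaining(candidates, feature, value)
            if cost < best_cost:
                best, best_cost = (feature, value), cost
    return best


def identify(candidates, target):
    """Simulate the dialogue with a truthful user who answers about `target`."""
    asked = []
    while len(candidates) > 1:
        question = choose_question(candidates)
        if question is None:
            break  # remaining candidates cannot be told apart by these features
        feature, value = question
        answer = has_value(target, feature, value)
        asked.append((feature, value, answer))
        candidates = [c for c in candidates
                      if has_value(c, feature, value) == answer]
    return [c.name for c in candidates], asked


# Hypothetical scene, including one object with multicolor parts.
scene = [
    Candidate("red cup", {"color": {"red"}, "shape": {"cylinder"}, "size": {"small"}}),
    Candidate("blue cup", {"color": {"blue"}, "shape": {"cylinder"}, "size": {"small"}}),
    Candidate("red-white box", {"color": {"red", "white"}, "shape": {"box"}, "size": {"large"}}),
    Candidate("green ball", {"color": {"green"}, "shape": {"sphere"}, "size": {"small"}}),
]

if __name__ == "__main__":
    names, asked = identify(scene, scene[2])
    print(names)  # ['red-white box']
    print(asked)  # [('color', 'red', True), ('color', 'white', True)]
```

In this toy scene the first question ("Is it red?") splits the four objects evenly, and because color is weighted as the easiest feature to name, the questioner prefers it over an equally informative shape question. Under these assumptions, trading off information against naming difficulty is what yields a sequence that is both short and easy to answer; a real system would estimate the weights from user studies rather than fix them by hand.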