“…Audio-visual perception should adapt, on-the-fly and with limited or no prior knowledge, to changing conditions in order to guarantee the correct execution of the task and the safety of the person. For assistive scenarios at home, audio-visual perception should accurately and robustly estimate the physical properties (e.g., weight and shape) of household containers, such as cups, drinking glasses, mugs, bottles, and food boxes [1,4,6,7,8]. However, the material, texture, transparency and shape can vary considerably across containers and also change with their content, which may not be visible due to the opaqueness of the container or occlusions, and hence should be inferred through the behaviour of the human [1,7,8,9,10].…”