This paper presents a novel framework and a prototype tool for indexing and retrieving specific fragments of voice recordings made during discussions about physical objects such as text documents, pictures, or 3D models. When a specific part of an object is mentioned, it is tagged with an ink dot, which is immediately registered in a database by capturing a microscopic image of the dot. Simultaneously, an index of the recording fragment is created and linked to the dot. After the recording, a dot can be scanned and identified by matching its microscopic image against the database, and the linked recording fragment is retrieved for playback. A handy tool was developed to facilitate these operations while the user concentrates on the ongoing discussion. In performance tests, dot identification produced genuine matches without error. In demonstrations of a realistic usage scenario, the tool successfully facilitated the creation of dot-linked indexes during a voice recording and correctly played back every recording fragment linked to a dot.
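The register-then-retrieve workflow described above can be illustrated with a minimal sketch. It is not the prototype's implementation: the class `DotIndex`, the short numeric descriptors standing in for microscopic dot images, the cosine-similarity matcher, and the 0.9 threshold are all hypothetical choices made only to show how a dot can be linked to an offset in a recording and looked up later.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class DotIndex:
    """Toy registry linking dot descriptors to offsets in a recording.

    Assumption: each dot image has already been reduced to a numeric
    descriptor; the real system matches microscopic images instead.
    """
    threshold: float = 0.9  # hypothetical similarity cutoff
    entries: list = field(default_factory=list)  # (descriptor, offset_seconds)

    def register(self, descriptor, offset_seconds):
        # Called when a dot is placed: store it with the current recording time.
        self.entries.append((tuple(descriptor), offset_seconds))

    def lookup(self, descriptor):
        # Called when a dot is scanned: return the offset of the best match
        # at or above the threshold, or None if no registered dot qualifies.
        best_offset, best_score = None, self.threshold
        for stored, offset in self.entries:
            score = cosine_similarity(stored, descriptor)
            if score >= best_score:
                best_offset, best_score = offset, score
        return best_offset


if __name__ == "__main__":
    index = DotIndex()
    index.register([1.0, 0.0, 0.2], 12.5)   # dot placed 12.5 s into the recording
    index.register([0.0, 1.0, 0.1], 47.0)   # dot placed 47.0 s into the recording
    print(index.lookup([0.98, 0.05, 0.2]))  # rescan of the first dot
    print(index.lookup([0.5, 0.5, 0.7]))    # a dot that was never registered
```

In this sketch a lookup returns the stored offset from which playback would start; a scan that resembles no registered dot closely enough returns `None` rather than a wrong fragment.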