Dual-arm manipulation of daily living objects is essential for robots operating in household environments. Learning from demonstration is a promising approach for teaching robots dual-arm manipulation skills. It usually requires a dataset of participants demonstrating the sequence of manipulation actions needed to achieve a goal. That sequence can be represented using symbolic or trajectory encoding; the symbolically encoded sequence is known as the task plan. The chief limitations of current datasets are that most disregard dual-arm manipulation skills and omit the formal grammar used to annotate the task plans. This paper introduces BiCap, a novel bi-modal dataset of dual-arm manipulation actions on daily living objects, coupled with a bio-inspired action context-free grammar for fine-grained task plan annotation, to train, test, and validate learning-from-demonstration-based algorithms for robotic dual-arm manipulation. Fifteen participants were recruited. The experimenter placed reflective markers on their upper limbs and pelvis. The participants then sat at a table on which one or two objects were placed and performed one of the following tasks using both hands: pouring, opening, or passing. An RGB camera pointing towards the table recorded the participants’ hand movements. Subsequently, an annotator reviewed the RGB videos and wrote the participants’ task plans using the bio-inspired action context-free grammar. The participants’ upper-limb kinematics were also computed, providing the trajectory-encoded action sequences. The resulting dataset, BiCap, contains 4,026 task plans, together with the videos and motion data of the 15 participants.
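To illustrate what grammar-based task plan annotation looks like in practice, the following is a minimal, hypothetical sketch of a context-free grammar for a dual-arm pouring sequence, parsed with NLTK. The production rules, action names, and example annotation are illustrative assumptions and do not reproduce BiCap's actual bio-inspired action grammar.

    # Minimal sketch (assumed grammar, not the BiCap grammar): a toy
    # context-free grammar for annotating a dual-arm pouring task plan.
    import nltk

    toy_grammar = nltk.CFG.fromstring("""
        PLAN    -> ACTION | ACTION PLAN
        ACTION  -> REACH | GRASP | POUR | RELEASE
        REACH   -> 'reach' HAND OBJECT
        GRASP   -> 'grasp' HAND OBJECT
        POUR    -> 'pour' HAND OBJECT OBJECT
        RELEASE -> 'release' HAND OBJECT
        HAND    -> 'left' | 'right'
        OBJECT  -> 'bottle' | 'glass'
    """)

    # Hypothetical symbolic annotation of one demonstrated pouring sequence.
    annotation = ("reach right bottle grasp right bottle "
                  "reach left glass grasp left glass "
                  "pour right bottle glass release right bottle").split()

    parser = nltk.ChartParser(toy_grammar)
    for tree in parser.parse(annotation):
        tree.pretty_print()  # prints the parse tree if the annotation is grammatical
        break

In a sketch of this kind, an annotation is considered well-formed only if it derives from the start symbol, which is what allows fine-grained, machine-checkable task plans rather than free-text descriptions.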