A custom convolutional neural network (CNN) integrated with convolutional long short-term memory (LSTM) achieves accurate 3D (2D + time) segmentation in cross-sectional videos of the Drosophila heart acquired by an optical coherence microscopy (OCM) system. Whereas our previous model, FlyNet 1.0, used regular CNNs to extract 2D spatial information from individual video frames, FlyNet 2.0 incorporates convolutional LSTM to exploit both spatial and temporal information, further improving segmentation performance. To train and test FlyNet 2.0, we used 100 datasets comprising 500,000 fly heart OCM images. OCM videos from three developmental stages and two heartbeat situations were segmented, achieving an intersection over union (IoU) accuracy of 92%. This increased segmentation accuracy allows morphological and dynamic cardiac parameters to be quantified more reliably.
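For readers unfamiliar with the metric, the intersection over union between a predicted mask and a reference mask is the number of pixels labeled as heart in both, divided by the number labeled as heart in either. The following is a minimal sketch in Python with NumPy, not the FlyNet 2.0 evaluation code. The function name `intersection_over_union`, the pooling of all frames of a video into a single score, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; how the reported IoU was aggregated across frames and datasets is not specified here.

```python
import numpy as np


def intersection_over_union(pred, target):
    """IoU between two binary masks of identical shape.

    Both inputs may be 2D (one frame) or 3D (frames x height x width).
    All pixels are pooled into a single score (an assumption; per-frame
    averaging is another common choice).
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()

    # Assumed convention: two empty masks agree perfectly.
    if union == 0:
        return 1.0
    return float(intersection / union)


# Toy example: a 2-frame "video" of 4x4 masks (hypothetical data).
pred = np.zeros((2, 4, 4), dtype=bool)
target = np.zeros((2, 4, 4), dtype=bool)
pred[:, 1:3, 1:3] = True    # 4 predicted pixels per frame
target[:, 1:3, 1:4] = True  # 6 reference pixels per frame

iou = intersection_over_union(pred, target)
print(f"IoU = {iou:.3f}")
```

In the toy example the prediction lies entirely inside the reference region, so the intersection is 8 pixels and the union is 12 pixels across the two frames, giving an IoU of 8/12.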