Grasp intention recognition plays a crucial role in controlling assistive robots that help older people and individuals with limited mobility restore arm and hand function. Among the various modalities used for intention recognition, eye-gaze movement has emerged as a promising approach due to its simplicity, intuitiveness, and effectiveness. However, existing gaze-based approaches insufficiently integrate gaze data with environmental context and underutilize temporal information, which limits their intention recognition performance. The objective of this study is to address these deficiencies and establish a gaze-based framework for object detection and the associated grasp intention recognition. To this end, a novel gaze-based grasp intention recognition and sequential decision fusion framework (GIRSDF) is proposed. GIRSDF comprises three main components: gaze attention map generation, the Gaze-YOLO grasp intention recognition model, and sequential decision fusion models (HMM, LSTM, and GRU). To evaluate the performance of GIRSDF, a dataset named Invisible, containing data from both healthy individuals and hemiplegic patients, is constructed. GIRSDF is validated through trial-based and subject-based experiments on Invisible and outperforms previous gaze-based grasp intention recognition methods. In terms of running efficiency, the proposed framework operates at approximately 22 Hz, which ensures real-time grasp intention recognition. This study is expected to inspire further research on gaze-based grasp intention recognition.