Recently, the development of Low Earth Orbit (LEO) satellites and the advancement of the Mobile Edge Computing (MEC) paradigm have driven the emergence of Satellite Mobile Edge Computing (Sat-MEC). Sat-MEC supports communication and task computation for Internet of Things (IoT) Mobile Devices (IMDs) in the absence of terrestrial networks. However, because both tasks and Sat-MEC servers are heterogeneous, efficiently scheduling tasks across Sat-MEC servers remains a major challenge. Here, we propose a scheduling algorithm based on Deep Reinforcement Learning (DRL) in the Sat-MEC architecture to minimize the average task processing time. We consider multiple factors, including the cooperation between LEO satellites, the concurrency and heterogeneity of tasks, the dynamics of LEO satellites, the heterogeneity of the computational capacity of Sat-MEC servers, and the heterogeneity of the initial task computation queues. Further, we use a self-attention mechanism as the Q-network to extract high-dimensional dynamic information about tasks and Sat-MEC servers. We model the Sat-MEC environment at the application level for simulation. The simulation results confirm the effectiveness of the proposed scheduling algorithm, which reduces the average task processing time by 22.1%, 30.6%, and 41.3% compared to the genetic algorithm (GA), the greedy algorithm, and the random algorithm, respectively.
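
To make the objective concrete, the following is a minimal sketch of how the average task processing time could be computed under a greedy baseline. It is an illustration under simplifying assumptions, not the simulation model or the baseline implementation used in this work: all tasks are assumed to arrive concurrently at time 0, transmission delay and satellite dynamics are ignored, and the names `Server`, `capacity`, `queue_delay`, and `greedy_schedule` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical Sat-MEC server model (assumption, not from the paper)."""
    capacity: float     # assumed compute rate, e.g. CPU cycles per second
    queue_delay: float  # assumed seconds of work already in the initial queue


def greedy_schedule(task_cycles, servers):
    """Assign each task to the server that would finish it earliest.

    Returns (assignment, average_processing_time), where the processing
    time of a task is its waiting time plus its execution time.
    """
    if not task_cycles:
        return [], 0.0
    # Time at which each server becomes free, seeded by its initial queue.
    busy_until = [s.queue_delay for s in servers]
    assignment = []
    total_time = 0.0
    for cycles in task_cycles:
        # Completion time on each server = current backlog + execution time.
        finish = [busy_until[i] + cycles / s.capacity
                  for i, s in enumerate(servers)]
        best = min(range(len(servers)), key=finish.__getitem__)
        busy_until[best] = finish[best]
        assignment.append(best)
        total_time += finish[best]
    return assignment, total_time / len(task_cycles)


# Usage: two heterogeneous servers, three heterogeneous tasks.
servers = [Server(capacity=2.0, queue_delay=1.0),
           Server(capacity=1.0, queue_delay=0.0)]
assignment, avg_time = greedy_schedule([4.0, 2.0, 2.0], servers)
print(assignment, avg_time)  # [0, 1, 0] 3.0
```

In this sketch the greedy rule decides one task at a time from the current backlog alone, which is the kind of myopic choice a learned scheduling policy aims to improve on when tasks and servers are heterogeneous.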