Action-imagery practice (AIP) is often less effective than action-execution practice (AEP). We investigated whether this is because different types of sequence representations are acquired along different time courses in AIP and AEP. Participants learned to move one finger sequentially to ten targets, which remained visible throughout. Each of the six sessions started with a test. In the first four sessions, participants performed AIP, AEP, or control-practice (CP). Tests involved the practice sequence, a mirror sequence, and a different sequence, each performed with both the practice hand and the other (transfer) hand. In AIP and AEP, movement times (MTs) in both hands were significantly shorter in the practice sequence than in the other sequences, indicating sequence-specific learning. In the transfer hand, this advantage indicates effector-independent visual-spatial representations. The time course of acquiring effector-independent visual-spatial representations did not differ significantly between AEP and AIP. In AEP (but not in AIP), MTs in the practice sequence were significantly shorter in the practice hand than in the transfer hand, indicating effector-dependent representations. In conclusion, effector-dependent representations were not acquired after extensive AIP, which may be due to the lack of actual feedback. AIP may therefore replace AEP for acquiring effector-independent visual-spatial representations, but not for acquiring effector-dependent representations.