Abstract: A variety of error resilience and scalable coding techniques have recently been proposed to facilitate the delivery of video over best-effort networks; a common drawback of these techniques is reduced compression efficiency. Separately, MPEG-7 descriptors have recently been developed for the purpose of indexing. In this paper, we propose to employ MPEG-7 descriptors to improve the quality of video delivered over best-effort networks. In particular, we propose a video transmission system that uses motion activity descriptors to ensure robust video transmission. We propose a novel motion activity extraction technique based on a neural network. By considering several low-level visual features, the extraction approach achieves high consistency with subjective evaluations of motion activity. To demonstrate the benefits of the proposed transmission system, we develop a selective packet dropping scheme that can be applied in case of network congestion. Simulations demonstrate that the reconstruction quality of the proposed packet dropping scheme can surpass that of conventional schemes by 1.2 dB. We present the network performance of the proposed transmission system when video sequences are coded into a single layer or into scalable layers. We also present a transcoding scheme that achieves the optimal reconstructed quality by exploiting the motion activity of the underlying video sequence.