Contour tracing is an important pre-processing step in many image processing applications such as feature recognition, biomedical imaging, security and surveillance. As single-processor architectures reach their performance limits, parallel processing architectures offer energy-efficient, high-performance solutions and are therefore used for several real-time image processing applications. Among the available interconnection schemes, Cayley-graph-based interconnections offer easy routing and symmetric implementation. For parallel processing systems interconnected as a torus, which is one such Cayley-graph-based scheme, we developed three accelerated algorithms corresponding to three existing families of contour tracing algorithms. We simulated these algorithms on a parallel processing framework to quantify the normalized speed-up achievable in any torus-connected parallel processing system. We also compared our best-performing algorithm with the existing parallel processing implementations for Nvidia GPUs. We observed a speed-up of up to 468 times for our algorithms on a parallel processing architecture compared with the corresponding algorithm on a single-processor architecture. We obtained speed-ups of 194 and 47 times over the existing parallel processing contour tracing implementation on Tesla K40c and Quadro RTX 5000 GPU hardware, respectively. We conclude that, for torus-connected parallel processing architectures used for image processing, our algorithms can speed up contour tracing without any hardware modification.