Transformers have significantly impacted the fields of Computer Vision (CV) and the Internet of Things (IoT), surpassing Convolutional Neural Networks (CNNs) on a variety of tasks. However, ensuring the security of CV models in safety-critical real-world IoT applications such as autonomous driving, surveillance, and biomedical technologies is crucial. The adversarial robustness of these models has therefore become a key research area, especially for edge processing. This work evaluates the robustness of Swin-Tiny and ConvNeXt-Tiny, focusing on real-world patch attacks in object detection scenarios. To ensure a fair comparison, we establish a level playing field between the Transformer-based and CNN architectures, examining their vulnerabilities and potential defenses. Experimental results demonstrate the susceptibility of both models to patch attacks, with a significant decrease in average precision (AP) for the "Person" class: when trained adversarial patches are applied, AP drops to 12.8% for Swin-Tiny and 15.2% for ConvNeXt-Tiny, highlighting their vulnerability to such attacks. This paper contributes to securing CV models on IoT vision devices, provides insights into the robustness of Transformer-based architectures against real-world attacks, and advances the field of adversarial robustness in embedded computer vision.
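As a rough illustration of the evaluation setting described above (not the authors' exact pipeline), the following Python/PyTorch sketch pastes a trained adversarial patch onto each ground-truth person box; the patched images would then be passed through the detector and scored with a COCO-style AP evaluator. The function name `apply_patch` and the `scale` parameter are hypothetical, introduced only for this sketch.

```python
# Minimal sketch, assuming a trained adversarial patch tensor and
# ground-truth boxes are available; all names here are illustrative.
import torch
import torch.nn.functional as F

def apply_patch(image: torch.Tensor, boxes: torch.Tensor,
                patch: torch.Tensor, scale: float = 0.3) -> torch.Tensor:
    """Paste `patch` (C, h, w, values in [0, 1]) onto the centre of each
    (x1, y1, x2, y2) box in `boxes`, sized relative to the box."""
    img = image.clone()
    _, H, W = img.shape
    for x1, y1, x2, y2 in boxes.tolist():
        side = int(scale * min(x2 - x1, y2 - y1))
        if side < 2:
            continue  # box too small to carry a visible patch
        # Resize the patch to the target side length for this box.
        p = F.interpolate(patch.unsqueeze(0), size=(side, side),
                          mode="bilinear", align_corners=False).squeeze(0)
        cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)
        x0, y0 = max(cx - side // 2, 0), max(cy - side // 2, 0)
        xe, ye = min(x0 + side, W), min(y0 + side, H)
        # Overwrite the pixels under the patch (clipped at image borders).
        img[:, y0:ye, x0:xe] = p[:, : ye - y0, : xe - x0]
    return img
```

In a full evaluation loop, AP would be computed twice per model (on clean and on patched images) and the drop reported per class, as done for the "Person" class in this work.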