Owing to the widespread adoption of transformer-based artificial neural networks, artificial intelligence (AI) processors are now required to perform matrix–vector multiplication in addition to the conventional matrix–matrix multiplication. However, current AI processor architectures are optimized for general matrix–matrix multiplications (GEMMs), which causes significant throughput degradation when processing general matrix–vector multiplications (GEMVs). In this study, we propose a port-folding GEMV (PF-GEMV) scheme employing multiformat and low-precision techniques while reusing an outer-product-based processor optimized for conventional GEMM operations. This approach achieves 93.7% utilization in GEMV operations with an 8-bit format on an 8×8 processor, resulting in a 7.5× increase in throughput compared with that of the original scheme. Furthermore, when applied to the matrix operations of the GPT-2 large model, a 7× speedup is achieved in single-batch inference.