Bit-stream recognition (BSR) has a wide range of applications, including forensic investigation, copyright-infringement detection, and malware analysis. Analyzing file fragments recovered during digital forensics requires a BSR method that classifies accurately across diverse domains without preprocessing the raw input bit-stream. In compiler provenance recovery, one type of BSR, the same bit sequence can carry different meanings on different CPU architectures. Consequently, traditional methods that rely heavily on disassembly tools such as IDA Pro may be applicable only to programs built for specific CPU architectures. To address this limitation, we propose a novel learning method in which the upstream layers (the sub-net) capture global features and instruct the downstream layers (the main-net) to shift their focus, even when part of the input bit-stream contains identical values. In our experiments, our model was less than 1/300 the size of the state-of-the-art model. Despite its smaller size, our method achieved the highest classification performance, 99.54, in a multi-CPU-architecture setting, outperforming existing methods.
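The abstract does not specify the layer types of the sub-net or main-net, so the following is only a minimal stdlib-only sketch of the general idea that global context can change how identical local bit patterns are weighted. It assumes a squeeze-and-excitation-style channel gating, which is a swapped-in technique and not the authors' architecture. The function names, the two toy features (bit density and transition rate), and the weights are all hypothetical.

```python
import math
from typing import List, Sequence


def to_bits(data: bytes) -> List[int]:
    # Unpack raw bytes into a bit-stream, most significant bit first (no disassembly).
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def window_features(bits: Sequence[int]) -> List[float]:
    # Two toy channels: density of 1-bits and rate of 0/1 transitions.
    density = sum(bits) / len(bits)
    transitions = sum(a != b for a, b in zip(bits, bits[1:])) / max(len(bits) - 1, 1)
    return [density, transitions]


def main_net_local(bits: List[int], window: int = 8) -> List[List[float]]:
    # Hypothetical "main-net" layer: features per non-overlapping window,
    # so identical windows always yield identical local features.
    return [window_features(bits[i:i + window])
            for i in range(0, len(bits) - window + 1, window)]


def sub_net_gates(bits: List[int], weights: List[List[float]],
                  bias: List[float]) -> List[float]:
    # Hypothetical "sub-net": summarize the whole stream (global features),
    # then map the summary to one sigmoid gate per channel.
    g = window_features(bits)
    return [1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, g)) + b)))
            for row, b in zip(weights, bias)]


def gated_features(data: bytes, weights: List[List[float]], bias: List[float],
                   window: int = 8) -> List[List[float]]:
    # Scale every local feature by the stream-level gate for its channel.
    bits = to_bits(data)
    gates = sub_net_gates(bits, weights, bias)
    return [[f * g for f, g in zip(feat, gates)]
            for feat in main_net_local(bits, window)]


# Hand-picked (hypothetical) gate weights; a real model would learn these.
W = [[4.0, -2.0], [-3.0, 5.0]]
BIAS = [0.0, 0.0]

# Two streams that share the same first byte but differ elsewhere.
a = bytes([0x90, 0x00, 0x00])
b = bytes([0x90, 0xFF, 0xFF])
print(main_net_local(to_bits(a))[0] == main_net_local(to_bits(b))[0])
print(gated_features(a, W, BIAS)[0], gated_features(b, W, BIAS)[0])
```

In this toy setting, the shared first byte produces identical local features in both streams, but the gated features differ because the stream-level summary differs. This mirrors, in a very simplified form, the claim that global context can redirect attention when parts of the input are identical.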