In recent years, the Ethereum platform has grown by leaps and bounds, and numerous unscrupulous individuals have used illegal transactions to defraud investors of large sums, causing billions of dollars in losses worldwide. Illegal activity based on Ethereum smart contracts, such as illegal transactions, money laundering, financial fraud, and phishing, continues to appear in an endless stream. Currently, illegal transactions are detected only from a single view of a smart contract, either the contract code view or the account transaction view. This is incomplete and does not fully represent the contract's characteristics. More importantly, single-view detection models cannot accurately capture the global structure and the semantic relationships between the tokens of a view's features. It is therefore particularly important for features to be shared across all views. In this paper, we investigate a Transformer-based multi-view contrastive network for illegal transaction detection (TranMulti-View Net). The model uses a Transformer to learn a fused multi-view representation, with the aim of maximising the fusion of interaction information between the features of different views under the same conditions. Specifically, the model first applies a Transformer to the token sequence obtained by tokenising each view, learning global structural and semantic features and capturing long-range dependencies between tokens. It then shares the contract code view features and the account transaction view features across all views so that each view learns important semantic information from the others. In addition, we find that semi-supervised training of the multi-view features with contrastive learning outperforms prediction based on direct fusion of the view features, and yields a stronger correlation between the view features.
As a result, the underlying semantic information is captured more accurately, leading to more accurate predictions of illegal transactions. Experimental results show that the proposed TranMulti-View Net achieves good detection performance, with a precision of 98%.
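The abstract does not state the exact contrastive objective used to align the views. Purely as an illustration, the sketch below assumes a symmetric InfoNCE-style loss, a common choice in contrastive learning and not necessarily the one used by TranMulti-View Net. It is computed between the pooled contract code view embedding and the pooled account transaction view embedding of each contract in a batch. Row i of each view is treated as a positive pair, and every other row in the batch serves as a negative. The function name `cross_view_info_nce`, the temperature of 0.1, and the toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def _normalise(v: Vector) -> Vector:
    # L2-normalise so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _cross_entropy_diagonal(rows: List[Vector]) -> float:
    # Mean cross-entropy where the correct "class" for row i is column i,
    # computed with the log-sum-exp trick for numerical stability.
    total = 0.0
    for i, row in enumerate(rows):
        m = max(row)
        log_sum = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum - row[i]
    return total / len(rows)


def cross_view_info_nce(code_view: List[Vector], tx_view: List[Vector],
                        temperature: float = 0.1) -> float:
    """Hypothetical symmetric InfoNCE loss between two views of one batch.

    Assumption: code_view[i] and tx_view[i] embed the same contract
    (a positive pair); all other rows in the batch act as negatives.
    """
    if not code_view or len(code_view) != len(tx_view):
        raise ValueError("views must be non-empty and have the same batch size")
    a = [_normalise(v) for v in code_view]
    b = [_normalise(v) for v in tx_view]
    n = len(a)
    # sim[i][j] = cos(a_i, b_j) / temperature
    sim = [[sum(x * y for x, y in zip(a[i], b[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Transpose to score the transaction view against the code view as well.
    sim_t = [list(col) for col in zip(*sim)]
    return 0.5 * (_cross_entropy_diagonal(sim) + _cross_entropy_diagonal(sim_t))


if __name__ == "__main__":
    code = [[1.0, 0.0], [0.0, 1.0]]
    tx_aligned = [[0.9, 0.1], [0.1, 0.9]]  # each contract's views agree
    tx_swapped = [[0.1, 0.9], [0.9, 0.1]]  # views paired with the wrong contract
    print(cross_view_info_nce(code, tx_aligned) < cross_view_info_nce(code, tx_swapped))  # True
```

Minimising a loss of this kind pulls the two views of the same contract together and pushes views of different contracts apart. This is one concrete way to obtain the "stronger correlation between view features" that the abstract attributes to contrastive training.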