Legal artificial intelligence (AI) has attracted attention for its precision and efficiency, particularly in tasks such as similar case retrieval, where finding relevant cases for a given query is crucial. Unlike conventional text retrieval, this task poses unique challenges: it demands high-quality annotated datasets for effective model training, and the length of queries and candidate documents, together with diverse definitions of similarity, adds complexity. This study introduces a training approach that combines dense and sparse retrieval methods. A sparse retrieval model first extracts unlabeled data from a large collection of legal cases; a dense retrieval model then screens this data and merges it with the labeled data to form pseudo-labeled data, which is used to train the model iteratively until convergence. The method performs strongly on a Chinese legal case retrieval dataset, achieving a 3.66% improvement in precision and a 3.62% improvement in mean average precision (MAP). However, the dataset's imbalance across charges poses a challenge that may affect retrieval performance for long-tailed legal cases. These results nonetheless point to faster and more efficient retrieval of similar cases for legal professionals, and they provide high-quality references for non-legal individuals lacking expertise in the field.

INDEX TERMS Legal information retrieval, similar case retrieval, iterative training, self-supervised learning.
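
The following is a minimal, hypothetical sketch of the iterative pseudo-labeling loop described above, not the authors' implementation: a sparse retriever mines candidate cases from the unlabeled corpus, a dense scorer screens them into pseudo-labels, and the dense model is retrained on the merged labeled and pseudo-labeled pairs until a convergence proxy stabilizes. All names (`sparse_retrieve`, `train_dense`, `score_threshold`) and the token-overlap stand-in for the sparse model are illustrative assumptions.

```python
from typing import Callable

def sparse_retrieve(query: str, corpus: list[str], k: int) -> list[int]:
    """Toy sparse retriever (token overlap as a stand-in for a BM25-style model):
    returns indices of the top-k candidate documents for the query."""
    q_tokens = set(query.split())
    scores = [len(q_tokens & set(doc.split())) for doc in corpus]
    return sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:k]

def iterative_training(
    labeled_pairs: list[tuple[str, str, int]],   # (query, candidate, relevance label)
    unlabeled_queries: list[str],
    corpus: list[str],
    train_dense: Callable[[list[tuple[str, str, int]]], Callable[[str, str], float]],
    score_threshold: float = 0.8,                # confidence cutoff for pseudo-labels (assumed)
    top_k: int = 50,
    max_rounds: int = 5,
    tol: float = 1e-3,
) -> Callable[[str, str], float]:
    """Return a dense scoring function trained on labeled + pseudo-labeled data."""
    # Initial dense model trained on the labeled data only.
    dense_score = train_dense(labeled_pairs)
    prev_mean = float("inf")

    for _ in range(max_rounds):
        pseudo_pairs: list[tuple[str, str, int]] = []
        for q in unlabeled_queries:
            # 1) Sparse retrieval mines candidate cases from the large corpus.
            for idx in sparse_retrieve(q, corpus, top_k):
                cand = corpus[idx]
                # 2) The dense model screens candidates; confident ones become pseudo-labels.
                if dense_score(q, cand) >= score_threshold:
                    pseudo_pairs.append((q, cand, 1))

        # 3) Merge labeled and pseudo-labeled pairs and retrain the dense model.
        dense_score = train_dense(labeled_pairs + pseudo_pairs)

        # 4) Stop when the mean score over pseudo-labeled pairs stabilizes
        #    (a simple convergence proxy assumed here).
        scores = [dense_score(q, c) for q, c, _ in pseudo_pairs] or [0.0]
        mean_score = sum(scores) / len(scores)
        if abs(mean_score - prev_mean) < tol:
            break
        prev_mean = mean_score

    return dense_score
```

In practice, `train_dense` would fine-tune a neural encoder (e.g. a bi-encoder over case texts) and return its query-candidate scoring function; the loop structure above only illustrates how sparse mining, dense screening, and retraining alternate until convergence.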