This paper proposes a novel laddering vision foundation model for change detection (CD) in remote sensing images. Existing approaches struggle to extract universal features and task-specific characteristics simultaneously, and they cannot effectively integrate the two for detection. The proposed model exploits both. Specifically, task-agnostic features are extracted from a pre-trained vision foundation model, which encodes general visual knowledge. A hierarchical transformer-based CD backbone then learns both long-range and local spatial information from the remote sensing images. Finally, the task-specific and universal features are integrated within the hierarchical network architecture, which fuses heterogeneous feature maps and embedding tokens and thereby reconciles the differences in structure and content between the two feature types. Comparative experiments on three benchmark datasets demonstrate the effectiveness and advantages of the proposed approach for CD.
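
To make the integration of heterogeneous feature maps and embedding tokens more concrete, the following is a minimal sketch in plain Python, not the integration module proposed in this paper. It assumes the foundation model emits a flat sequence of token embeddings on a coarse grid and the CD backbone emits a channel-first feature map at a finer resolution. The function names (`tokens_to_grid`, `resize_nearest`, `fuse`) and the choice of nearest-neighbour resizing followed by channel concatenation are illustrative assumptions only.

```python
def tokens_to_grid(tokens, grid_h, grid_w):
    """Reshape a flat token sequence (N x D) into a channel-first D x grid_h x grid_w map."""
    if len(tokens) != grid_h * grid_w:
        raise ValueError("token count must equal grid_h * grid_w")
    dim = len(tokens[0])
    return [[[tokens[r * grid_w + c][d] for c in range(grid_w)]
             for r in range(grid_h)]
            for d in range(dim)]


def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a C x H x W map to C x out_h x out_w."""
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [[[channel[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
             for r in range(out_h)]
            for channel in fmap]


def fuse(backbone_map, tokens, grid_h, grid_w):
    """Hypothetical fusion: align token features to the backbone resolution,
    then concatenate the two sets of channels (assumed, not the paper's design)."""
    out_h, out_w = len(backbone_map[0]), len(backbone_map[0][0])
    token_map = resize_nearest(tokens_to_grid(tokens, grid_h, grid_w), out_h, out_w)
    return backbone_map + token_map  # list concatenation along the channel axis


# Toy example: a 2-channel 4x4 backbone map and 4 tokens (2x2 grid) of dimension 3.
backbone_map = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
tokens = [[1.0, 1.1, 1.2], [2.0, 2.1, 2.2], [3.0, 3.1, 3.2], [4.0, 4.1, 4.2]]
fused = fuse(backbone_map, tokens, grid_h=2, grid_w=2)
```

In this toy setting the fused map has five channels at the backbone's 4x4 resolution. A learned integration module would typically replace the fixed resize and concatenation with trainable projections or attention.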