Whole Slide Images (WSIs) are gigapixel, high-resolution digital scans of microscope slides that provide detailed tissue profiles for pathological analysis. Because of their gigapixel size and lack of detailed annotations, Multiple Instance Learning (MIL) has become the primary technique for WSI analysis. However, current MIL methods for WSIs directly use embeddings extracted by a pretrained vision encoder, which are not task-specific and often exhibit high variability. To address this, we introduce a novel method, VQ-MIL, which maps the embeddings to a discrete space using weakly supervised vector quantization to refine the embeddings and reduce their variability. Additionally, the discrete embeddings from our method provide clearer visualizations than those of other methods. Our experiments show that VQ-MIL achieves state-of-the-art classification results on two benchmark datasets. The source code is available at https://github.com/aCoalBall/VQMIL.
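The quantization step mentioned in this abstract can be illustrated with a minimal sketch. It shows only the generic nearest-codebook lookup that underlies vector quantization; the codebook values, the patch embeddings, and the function name `quantize` are assumptions made for illustration, and the weakly supervised training of the codebook described by the authors is not shown.

```python
import math

def quantize(embeddings, codebook):
    """Map each embedding to the index and vector of its nearest codebook entry."""
    indices = []
    quantized = []
    for vec in embeddings:
        # Euclidean distance from this embedding to every codebook vector.
        dists = [math.dist(vec, code) for code in codebook]
        idx = dists.index(min(dists))
        indices.append(idx)
        quantized.append(codebook[idx])
    return indices, quantized

# Hypothetical 2-D codebook and patch embeddings (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
patches = [(0.1, -0.2), (0.9, 1.2), (3.5, 0.3), (1.1, 0.8)]

indices, quantized = quantize(patches, codebook)
print(indices)
```

Because nearby embeddings collapse onto the same codebook entry, small variations between similar patches disappear after quantization, which is the variability reduction the abstract refers to.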
State space models (SSMs) with selection mechanisms and hardware-aware architectures, namely Mamba, have recently shown significant potential in long-sequence modeling. Because the complexity of transformers’ self-attention mechanism grows quadratically with image size, driving up computational demands, researchers are exploring how to adapt Mamba for computer vision tasks. This paper is the first comprehensive survey that aims to provide an in-depth analysis of Mamba models within the domain of computer vision. It begins by exploring the foundational concepts contributing to Mamba’s success, including the SSM framework, selection mechanisms, and hardware-aware design. Then, we review vision Mamba models by categorizing them into foundational models and those enhanced with techniques such as convolution, recurrence, and attention. Furthermore, we investigate the widespread applications of Mamba in vision tasks, including its use as a backbone at various levels of vision processing. This encompasses general visual tasks, medical visual tasks (e.g., 2D/3D segmentation, classification, and image registration), and remote sensing visual tasks. In particular, we introduce general visual tasks at two levels: high/mid-level vision (e.g., object detection, segmentation, and video classification) and low-level vision (e.g., image super-resolution, image restoration, and visual generation). We hope this endeavor will spark additional interest within the community to address current challenges and further apply Mamba models in computer vision.
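To make the SSM framework mentioned above concrete, here is a minimal sketch of a scalar linear state space model discretized with a zero-order hold and evaluated as a recurrence. It runs in time linear in the sequence length, in contrast to the quadratic cost of self-attention. The function name `ssm_scan` and all parameter values are assumptions for illustration; the parameters are held fixed here, whereas Mamba's selection mechanism makes them depend on the input, which this sketch does not show.

```python
import math

def ssm_scan(xs, A, B, C, delta):
    """Run a scalar SSM h' = A*h + B*x, y = C*h over a sequence (A must be nonzero)."""
    # Zero-order-hold discretization with step size delta.
    a_bar = math.exp(delta * A)
    b_bar = (a_bar - 1.0) / A * B
    h = 0.0
    ys = []
    for x in xs:
        # One state update per token: cost is linear in sequence length.
        h = a_bar * h + b_bar * x
        ys.append(C * h)
    return ys

# Illustrative parameters: delta = ln 2 with A = -1 gives a_bar = 0.5.
ys = ssm_scan([1.0, 0.0, 0.0], A=-1.0, B=1.0, C=1.0, delta=math.log(2.0))
print(ys)
```

With a single impulse as input, the output halves at every step, showing how the state carries a decaying memory of earlier tokens.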
Change detection (CD) identifies surface changes by analyzing bi-temporal remote sensing (RS) images of the same region and is essential for effective urban planning, optimal resource allocation, and disaster management. However, deep-learning-based CD methods struggle with background noise and pseudo-changes because local receptive fields or computing resource constraints limit long-range dependency capture and feature integration, which typically results in fragmented detections and high false positive rates. To address these challenges, we propose a tree topology Mamba-guided network (TTMGNet), which combines the Mamba architecture for effectively capturing global features, a unique tree topology structure for retaining fine local details, and a hierarchical feature fusion mechanism that enhances multi-scale feature integration and robustness against noise. Specifically, the Tree Topology Mamba Feature Extractor (TTMFE) leverages the similarity of pixels to generate minimum spanning tree (MST) topology sequences, guiding information aggregation and transmission. This approach utilizes a Tree Topology State Space Model (TTSSM) to embed spatial and positional information while preserving the global feature extraction capability, thereby retaining local features. Subsequently, the Hierarchical Incremental Aggregation Module (HIAM) gradually aligns and merges features from deep to shallow layers to facilitate hierarchical feature integration. Through residual connections and cross-channel attention (CCA), HIAM enhances the interaction between neighboring feature maps, ensuring that critical features are retained and effectively utilized during the fusion process, thereby enabling more accurate detection results in CD.
The proposed TTMGNet achieved F1 scores of 92.31% on LEVIR-CD, 90.94% on WHU-CD, and 77.25% on CL-CD, outperforming current mainstream methods in suppressing the impact of background noise and pseudo-change and more accurately identifying change regions.
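The idea of deriving an MST topology from pixel similarity can be sketched as follows. This is a generic illustration using Kruskal's algorithm on a 4-connected pixel grid, with the absolute intensity difference as an assumed dissimilarity weight; the function name `grid_mst`, the toy image, and the choice of weight are hypothetical, and the similarity measure and tree construction actually used in TTMFE are not specified in the abstract.

```python
def grid_mst(image):
    """Build a minimum spanning tree over a 4-connected pixel grid (Kruskal)."""
    rows, cols = len(image), len(image[0])
    edges = []
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            # Edge weight = absolute intensity difference (assumed dissimilarity).
            if c + 1 < cols:
                edges.append((abs(image[r][c] - image[r][c + 1]), node, node + 1))
            if r + 1 < rows:
                edges.append((abs(image[r][c] - image[r + 1][c]), node, node + cols))
    edges.sort()

    parent = list(range(rows * cols))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # Keep the edge only if it joins two separate components.
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree

# Toy 2x3 image: a dark region (values near 1) and a bright region (8-9).
image = [[1, 1, 9],
         [1, 8, 9]]
tree = grid_mst(image)
print(tree)
```

Similar pixels are linked by cheap edges first, so each homogeneous region forms a connected subtree and only a single costly edge bridges the dark and bright regions. A traversal of such a tree yields a sequence that follows image content rather than raster order.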