Sixiao Zheng scite author profile

Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract/semantic visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have focused on increasing the receptive field, through either dilated/atrous convolutions or inserted attention modules. However, the encoder-decoder based FCN architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer (i.e., without convolution and resolution reduction) to encode an image as a sequence of patches. With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR). Extensive experiments show that SETR achieves a new state of the art on ADE20K (50.28% mIoU) and Pascal Context (55.83% mIoU), and competitive results on Cityscapes. In particular, we achieve first place (44.42% mIoU) on the highly competitive ADE20K test server leaderboard.
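The phrase "encode an image as a sequence of patches" can be illustrated with a minimal sketch. It is not taken from the SETR code: the function name `image_to_patch_sequence`, the plain nested-list image, and the row-major patch order are all assumptions made for illustration, and the learned linear projection and position embeddings that a transformer would apply afterwards are left out.

```python
def image_to_patch_sequence(image, patch_size):
    """Split a 2-D image (a list of rows) into a row-major sequence of flattened patches.

    Hypothetical helper for illustration only; a real model would also project
    each flattened patch to an embedding and add position information.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    sequence = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single vector.
            patch = [image[top + dy][left + dx]
                     for dy in range(patch_size)
                     for dx in range(patch_size)]
            sequence.append(patch)
    return sequence


# A 4x4 single-channel image whose pixel values are 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = image_to_patch_sequence(image, 2)
```

Here a 4x4 image with 2x2 patches yields a sequence of four vectors of length four, so the sequence length depends only on the image size and the patch size.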


NMS-Loss: Learning with Non-Maximum Suppression for Crowded Pedestrian Detection

Luo, Fang, Zheng, et al. 2021


Non-Maximum Suppression (NMS) is essential for object detection and affects the evaluation results by incorporating False Positives (FP) and False Negatives (FN), especially in crowd occlusion scenes. In this paper, we raise the problem of the weak connection between the training targets and the evaluation metrics caused by NMS, and propose a novel NMS-Loss that allows the NMS procedure to be trained end-to-end without any additional network parameters. Our NMS-Loss punishes two cases: when an FP is not suppressed, and when an FN is wrongly eliminated by NMS. Specifically, we propose a pull loss to pull predictions with the same target close to each other, and a push loss to push predictions with different targets away from each other. Experimental results show that with the help of NMS-Loss, our detector, namely NMS-Ped, achieves impressive results with a Miss Rate of 5.92% on the Caltech dataset and 10.08% on the CityPersons dataset, both better than state-of-the-art competitors.

CCS Concepts: Computing methodologies → Object detection.
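To make the failure case in crowded scenes concrete, here is a minimal sketch of standard greedy NMS. It is not the paper's NMS-Loss (the abstract does not give the pull and push loss formulas, so they are not reproduced); the function names, the `(x1, y1, x2, y2)` box format, and the example boxes are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, visiting boxes in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical scene: boxes 0 and 1 are two different, heavily occluded
# pedestrians; box 2 is a third pedestrian standing apart.
boxes = [(0, 0, 10, 20), (3, 0, 13, 20), (30, 0, 40, 20)]
scores = [0.9, 0.8, 0.7]

kept_strict = greedy_nms(boxes, scores, 0.5)  # box 1 is suppressed
kept_loose = greedy_nms(boxes, scores, 0.6)   # all three boxes survive
```

With a threshold of 0.5 the second pedestrian is removed even though it is a correct detection, which is the kind of NMS-induced false negative the abstract refers to. Raising the threshold keeps it, at the cost of letting more duplicate detections through in general.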


Incrementally Zero-Shot Detection by an Extreme Value Analyzer

Zheng, Hou 2021


Clustering by the Probability Distributions From Extreme Value Theory

Zheng, Fan, Hou, et al. 2023

IEEE Trans. Artif. Intell.


Clustering is an essential task in unsupervised learning. It tries to automatically separate instances into "coherent" subsets. As one of the most well-known clustering algorithms, k-means assigns sample points at the boundary to a unique cluster, but it does not use information about the sample distribution or density. By comparison, it would potentially be more beneficial to consider the probability of each sample belonging to each possible cluster. To this end, this paper generalizes k-means to model the distribution of clusters. Our novel clustering algorithm models the distributions of distances to centroids over a threshold by the Generalized Pareto Distribution (GPD) in Extreme Value Theory (EVT). Notably, we propose the concept of centroid margin distance, use GPD to establish a probability model for each cluster, and perform clustering based on the covering probability function derived from GPD. Such a GPD k-means thus approaches clustering from a probabilistic perspective. Correspondingly, we also introduce a naive baseline, dubbed Generalized Extreme Value (GEV) k-means. GEV fits the distribution of the block maxima. In contrast, the GPD fits the distribution of distances to the centroid exceeding a sufficiently large threshold, leading to more stable performance of GPD k-means. Notably, GEV k-means can also estimate cluster structure and thus performs reasonably well over classical k-means. Extensive experiments on synthetic and real datasets demonstrate that GPD k-means outperforms competitors. The code is released at https://github.com/sixiaozheng/EVT-K-means.

Impact Statement: Clustering is an essential task in unsupervised learning. The most well-known clustering algorithm is k-means, which assigns each sample to the nearest unique cluster. However, due to the lack of prior information on the feature space, if a sample is at the boundary of several clusters, assigning it to any one of them may be inappropriate. Instead, this paper considers the probabilities of the sample belonging to each possible cluster. It is the first time in the literature that k-means has been generalized with the probabilistic tools of Extreme Value Theory (EVT). Our novel clustering algorithm models the distributions of distances to centroids over a threshold by the Generalized Pareto Distribution (GPD). The proposed GPD k-means can be widely used in many fields, including customer group analysis, geographic information analysis, network text analysis, and e-commerce purchase behavior analysis. GPD k-means will bring some positive effects, such as improving the performance of the clustering task, effectively clustering streaming data, and improving the performance of clustering analysis systems.
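The step "model the distributions of distances to centroids over a threshold by GPD" can be sketched as follows. This is only an illustration of the peaks-over-threshold idea, not the released EVT-K-means code: the function names are hypothetical, the method-of-moments estimator is swapped in because the abstract does not say how the GPD is fitted, and the paper's centroid margin distance and covering probability function are not reproduced.

```python
import math
import statistics


def exceedances_over_threshold(points, centroid, threshold):
    """Distances to the centroid that exceed the threshold, minus the threshold."""
    distances = [math.dist(p, centroid) for p in points]
    return [d - threshold for d in distances if d > threshold]


def fit_gpd_moments(exceedances):
    """Method-of-moments estimate of the GPD shape and scale.

    Uses mean = scale / (1 - shape) and
    variance = scale^2 / ((1 - shape)^2 * (1 - 2 * shape)).
    Needs at least two exceedances with non-zero variance.
    """
    mean = statistics.fmean(exceedances)
    variance = statistics.variance(exceedances)
    ratio = mean * mean / variance
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (ratio + 1.0)
    return shape, scale


def gpd_cdf(y, shape, scale):
    """GPD cumulative distribution function for an exceedance y."""
    if y <= 0.0:
        return 0.0
    if abs(shape) < 1e-12:
        return 1.0 - math.exp(-y / scale)  # exponential limit as shape -> 0
    base = 1.0 + shape * y / scale
    if base <= 0.0:
        return 1.0  # y lies beyond the upper end of the support (shape < 0)
    return 1.0 - base ** (-1.0 / shape)


# Hypothetical cluster: six points on a line, centroid at the origin.
points = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0)]
exc = exceedances_over_threshold(points, (0, 0), threshold=2.0)
shape, scale = fit_gpd_moments(exc)
tail_probability = 1.0 - gpd_cdf(2.0, shape, scale)
```

The fitted tail probability gives a soft, distribution-based measure of how far out a sample sits relative to a cluster, in contrast to the hard nearest-centroid assignment of classical k-means.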



Sixiao Zheng

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
