Wireless capsule endoscopy (WCE) is increasingly used in clinical settings as a safe and painless gastrointestinal examination. Existing studies on automatic lesion detection in WCE images suffer from small datasets and an uneven number of samples per category, which often leads to overfitting and severely limits the performance of object detection networks on WCE images. Traditional data augmentation methods such as flipping and local erasure are limited and do not generalize well. This paper therefore proposes WCE-DCGAN, a network that generates new WCE images from existing ones. When the generated images are combined with the original images as input to the object detection network, performance improves to varying degrees on SSD, YOLOv5, and YOLOv4, and an average recognition accuracy of 97.25% is achieved on SSD. The images generated by WCE-DCGAN not only enlarge the dataset but are also diverse, which helps the model generalize well.
Figure 1: The framework for 3D shape estimation. Top: a series of prior 3D shape bases [2]. Bottom: the shape estimation procedure for a given input image (input image, landmark localisation, model fitting, estimated 3D shape).

Estimating the 3D shape of an object from a monocular image is an under-determined problem, which becomes harder when the observations are severely contaminated. In this paper, we propose a robust model to estimate the 3D shape X from 2D landmarks x ∈ R^{2×p} with unknown camera pose M. The 3D shape of the object is assumed to be a linear combination of predefined shape bases with coefficients s. To estimate s and M, we fit the model by minimizing the error between the observations x and the projected model points MX (as shown in Figure 1).

Model. To address outliers in the observed 2D points, which result from complex backgrounds and illumination conditions, we propose a robust 3D shape estimation model. We explicitly model the outliers with an additional sparse error term E ∈ R^{2×p}. In the resulting robust formulation (1), t = [t_x, t_y]^T · 1_{1×p} is the translation, λ and η are the regularization parameters, and µ is the mean shape. The objective function in (1) is non-convex and non-smooth, and is constrained on the Stiefel manifold; the coupling of the unknown shape representation coefficients s and the camera pose M makes it more difficult to solve.

Method. We propose an efficient numerical algorithm based on the Alternating Direction Method of Multipliers (ADMM) [1] to solve this problem. An auxiliary variable V ∈ R^{2×3} is introduced to form the augmented Lagrangian, in which Λ is the multiplier and τ is the penalty parameter. We update each block with all the others fixed. Following an analysis of ADMM for non-convex optimization [3], we place the orthogonality constraint in the smooth sub-problem (the V-minimization). Its closed-form solution is given by V^{k+1} = U I_{2×3} W^T, where U and W come from a singular value decomposition. The other sub-problems can be easily solved.
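The V-minimization step can be illustrated with a minimal sketch. It assumes the sub-problem has the standard form of finding the 2×3 matrix with orthonormal rows that is closest, in Frobenius norm, to some target matrix A. The target A, the function name project_to_stiefel, and the random test input are hypothetical; the text does not state which matrix is decomposed.

```python
import numpy as np

def project_to_stiefel(A):
    """Return the 2x3 matrix with orthonormal rows closest to A (Frobenius norm).

    Assumption: A is the 2x3 target of the V sub-problem; its exact
    definition is not given in the text.
    """
    # Full SVD: U is 2x2, Wt is 3x3 (already transposed).
    U, _, Wt = np.linalg.svd(A, full_matrices=True)
    # V = U * I_{2x3} * W^T, i.e. the singular values are replaced by ones.
    return U @ np.eye(2, 3) @ Wt

# Hypothetical input, for illustration only.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
V = project_to_stiefel(A)
```

The result satisfies V V^T = I_2, and a matrix that already has orthonormal rows is returned unchanged.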
The updates of both M and t admit closed-form solutions. The update of s is a Lasso problem, and the sparse error term E can be solved efficiently by element-wise soft-thresholding. The convergence of ADMM with more than two blocks cannot always be guaranteed [1] and may be influenced by the update order. We use a fixed update order that always led to convergence in our experiments.
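As a rough illustration of the E update, the sketch below applies element-wise soft-thresholding to a residual matrix. It assumes the sub-problem is a least-squares term with weight 1/2 plus an l1 penalty, so that the threshold equals the penalty weight. The residual R, the threshold kappa, and the function name soft_threshold are hypothetical.

```python
import numpy as np

def soft_threshold(R, kappa):
    """Element-wise soft-thresholding: shrink each entry of R toward zero by kappa.

    Minimizes 0.5 * ||R - E||_F^2 + kappa * ||E||_1 over E.
    """
    return np.sign(R) * np.maximum(np.abs(R) - kappa, 0.0)

# Hypothetical 2 x p residual between observed and projected landmarks.
R = np.array([[0.05, -2.0, 0.3],
              [1.5, -0.1, 0.0]])
E = soft_threshold(R, 0.5)
```

Entries with magnitude below the threshold become zero, so only large residuals, which are the likely outliers, remain in E.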