This paper discusses how distribution matching losses, such as those used in CycleGAN, can lead to misdiagnosis of medical conditions when used to synthesize medical images. These image synthesis methods are appealing for translating images from a source to a target domain because they produce high-quality images, and some do not even require paired data. However, these translation models work by matching the translated output to the distribution of the target domain. This becomes a problem when the data in the target domain over- or under-represents some classes (e.g., healthy or sick): when the output of an algorithm is a transformed image, there is no guarantee that all known and unknown class labels have been preserved rather than changed. We therefore recommend that such translated images not be used for direct interpretation (e.g., by doctors), because an algorithm that merely matches a distribution can hallucinate image features and lead to misdiagnosis. Nevertheless, many recent papers appear to treat exactly this as the goal.
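To make the failure mode concrete, the following is a minimal PyTorch-style sketch of an unpaired, CycleGAN-style translation objective of the kind critiqued above; the names G_st, G_ts, D_t and the loss weight are illustrative assumptions, not taken from any specific implementation. The point is that both terms only reward matching the target distribution and reconstructing pixels; nothing inspects diagnostic labels.

```python
import torch
import torch.nn.functional as F

def distribution_matching_losses(G_st, G_ts, D_t, x_s, lambda_cyc=10.0):
    """Sketch of an unpaired (CycleGAN-style) objective for a batch of
    source-domain images x_s. G_st / G_ts are the source->target and
    target->source generators, D_t is the target-domain discriminator."""
    fake_t = G_st(x_s)

    # Adversarial term: translated images should be scored as "real" by the
    # target-domain discriminator, i.e. they should match the *distribution*
    # of the target data -- including its class (healthy/sick) proportions.
    logits = D_t(fake_t)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

    # Cycle-consistency term: a round trip should reconstruct the input
    # pixels. This constrains appearance, not semantics: a pathology can be
    # removed going to the target domain and re-hallucinated coming back.
    cyc = F.l1_loss(G_ts(fake_t), x_s)

    # Note: no term anywhere checks that diagnostic class labels survive.
    return adv + lambda_cyc * cyc
```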
In this paper, we strive to answer two questions: What is the current state of 3D hand pose estimation from depth images? And what are the next challenges that need to be tackled? Following the successful Hands in the Million Challenge (HIM2017), we investigate the top 10 state-of-the-art methods on three tasks: single-frame 3D pose estimation, 3D hand tracking, and hand pose estimation during object interaction. We analyze the performance of different CNN structures with regard to hand shape, joint visibility, viewpoint, and articulation distributions. Our findings include: (1) isolated 3D hand pose estimation achieves low mean errors (10 mm) in the viewpoint range of [70, 120] degrees, but it is far from solved for extreme viewpoints; (2) 3D volumetric representations outperform 2D CNNs, better capturing the spatial structure of the depth data; (3) discriminative methods still generalize poorly to unseen hand shapes; (4) while joint occlusions pose a challenge for most methods, explicit modeling of structure constraints can significantly narrow the gap between errors on visible and occluded joints.
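As an illustration of finding (2), here is a hedged sketch of turning a cropped hand depth map into a 3D occupancy grid, the kind of volumetric input a 3D CNN consumes. The grid size and depth range below are placeholder assumptions, and the challenge entries differ in their exact voxelization schemes.

```python
import numpy as np

def depth_to_voxel_grid(depth, grid=32, depth_range=(0.0, 300.0)):
    """Convert a cropped hand depth map (H x W, depth in mm) into a binary
    occupancy grid of shape (grid, grid, grid).

    Each valid depth pixel back-projects to one occupied voxel, so a 3D CNN
    can exploit the spatial structure of the depth data directly instead of
    treating the depth map as a flat 2D image.
    """
    H, W = depth.shape
    vox = np.zeros((grid, grid, grid), dtype=np.float32)

    ys, xs = np.nonzero(depth > 0)          # valid (non-background) pixels
    zs = depth[ys, xs]

    # Normalize image coordinates and depth to [0, 1), then quantize.
    xi = np.clip((xs / W * grid).astype(int), 0, grid - 1)
    yi = np.clip((ys / H * grid).astype(int), 0, grid - 1)
    zi = np.clip(((zs - depth_range[0]) / (depth_range[1] - depth_range[0])
                  * grid).astype(int), 0, grid - 1)

    vox[xi, yi, zi] = 1.0
    return vox
```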
We present two techniques to improve landmark localization in images from partially annotated datasets. Our primary goal is to leverage the common situation where precise landmark locations are provided only for a small subset of the data, while class labels for classification or regression tasks related to the landmarks are more abundantly available. First, we propose the framework of sequential multitasking and explore it here through an architecture for landmark localization in which training with class labels acts as an auxiliary signal to guide landmark localization on unlabeled data. A key aspect of our approach is that errors can be backpropagated through the complete landmark localization model. Second, we propose and explore an unsupervised learning technique for landmark localization based on having the model predict equivariant landmarks with respect to transformations applied to the image. We show that these techniques improve landmark prediction considerably and can learn effective detectors even when only a small fraction of the dataset has landmark labels. We present results on two toy datasets and four real datasets, with hands and faces, and report new state-of-the-art results on two datasets in the wild; e.g., with only 5% of labeled images we outperform the previous state of the art trained on the AFLW dataset.

Architecture details of the Seq-MT model used for the Shapes and Blocks datasets:
  Shapes Dataset (Model HP: λ = 0, α = 0, γ = 0, β = 1, ADAM)
    Landmark Localization Network: Input = 60 × 60 × 1; 6× Conv 7 × 7 × 16, ReLU, stride 1, SAME; Conv 1 × 1 × 16, ReLU, stride 1, SAME; Conv 1 × 1 × 2, ReLU, stride 1, SAME; soft-argmax (num channels = 2)
    Classification Network: FC #units = 40, ReLU; FC #units = 2, Linear; softmax (dim = 2)
  Blocks Dataset (Model HP: λ = 1, α = 1, β = 1, ADAM)
    Landmark Localization Network: Input = 60 × 60 × 1; 6× Conv 9 × 9 × 8, ReLU, stride 1, SAME; Conv 1 × 1 × 8, ReLU, stride 1, SAME; Conv 1 × 1 × 5, ReLU, stride 1, SAME; soft-argmax (num channels = 5)
    Classification Network: FC #units = 256, ReLU, dropout-prob = .25; FC #units = 256, ReLU, dropout-prob = .25; FC #units = 15, Linear; softmax (dim = 15)

Table S12: Architecture details of Seq-MT model used for Hands and Multi-PIE datasets.
  Hands Dataset (Model HP: λ = 0.5, α = 0.3, γ = 10^-5, β = 0.001, ADAM)
  Multi-PIE Dataset (Model HP: λ = 2, α = 0.3, γ = 10^-5, β = 0.001, ADAM)
  Preprocessing: scale and translation of [-10%, 10%] of the face bounding box and rotation of [-20, 20] degrees, applied randomly at every epoch.
  Landmark Localization Network (both datasets): Input = 64 × 64 × 1
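Below is a minimal PyTorch-style sketch of the two ingredients described in this abstract, under interface assumptions that are not specified there: a differentiable soft-argmax layer (which is what allows classification errors to backpropagate through the complete landmark localization network) and an equivariance term comparing landmarks predicted on an image with landmarks predicted on a transformed copy, mapped back through the known transformation. The helpers `transform` and `inverse_transform_coords` are hypothetical.

```python
import torch
import torch.nn.functional as F

def soft_argmax_2d(heatmaps):
    """Differentiable landmark coordinates from (B, K, H, W) heatmaps.
    Returns (B, K, 2) (x, y) in normalized [0, 1] coordinates."""
    B, K, H, W = heatmaps.shape
    probs = F.softmax(heatmaps.view(B, K, -1), dim=-1).view(B, K, H, W)
    xs = torch.linspace(0, 1, W, device=heatmaps.device)
    ys = torch.linspace(0, 1, H, device=heatmaps.device)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)   # expectation over columns
    y = (probs.sum(dim=3) * ys).sum(dim=-1)   # expectation over rows
    return torch.stack([x, y], dim=-1)

def equivariance_loss(model, images, transform, inverse_transform_coords):
    """Unsupervised term: landmarks predicted on a transformed image, mapped
    back through the known inverse coordinate transform, should match the
    landmarks predicted on the original image. No landmark labels needed."""
    lm = soft_argmax_2d(model(images))
    lm_t = soft_argmax_2d(model(transform(images)))
    return F.mse_loss(inverse_transform_coords(lm_t), lm)
```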