Recent work learns contextual representations of source code by reconstructing tokens from their context. For downstream semantic understanding tasks like code clone detection, these representations should ideally capture program functionality. However, we show that the popular reconstruction-based RoBERTa model is sensitive to source code edits, even when the edits preserve semantics. We propose ContraCode: a contrastive pre-training task that learns code functionality, not form. ContraCode pre-trains a neural network to identify functionally similar variants of a program among many non-equivalent distractors. We scalably generate these variants using an automated source-to-source compiler as a form of data augmentation. Contrastive pre-training outperforms RoBERTa on an adversarial code clone detection benchmark by 39% AUROC. Surprisingly, improved adversarial robustness translates to better accuracy over natural code; ContraCode improves summarization and TypeScript type inference accuracy by 2 to 13 percentage points over competitive baselines. All source is available at https://github.com/parasj/contracode.
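To make the contrastive pre-training objective concrete, the sketch below shows an InfoNCE-style loss over embeddings of two semantics-preserving variants of each program, where the other programs in the batch serve as non-equivalent distractors. This is an illustrative sketch under assumed names (`encoder`, `variant_a`, `variant_b`, `temperature`), not ContraCode's actual implementation; see the linked repository for that.

```python
# Minimal sketch of a contrastive (InfoNCE-style) pre-training step over
# compiler-generated program variants. All names and hyperparameters here
# are hypothetical stand-ins.
import torch
import torch.nn.functional as F

def contrastive_loss(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Each program's variant A must match its own variant B among the
    other programs in the batch, which act as negative distractors."""
    z_a = F.normalize(z_a, dim=-1)            # (batch, dim) embeddings of variant A
    z_b = F.normalize(z_b, dim=-1)            # (batch, dim) embeddings of variant B
    logits = z_a @ z_b.t() / temperature      # pairwise cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)  # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Usage (hypothetical): `encoder` maps tokenized programs to embeddings, and
# variant_a / variant_b are two source-to-source transformations of the same
# batch of programs.
# loss = contrastive_loss(encoder(variant_a), encoder(variant_b))
# loss.backward()
```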
Face detection and recognition are studied extensively for their many applications in security, biometrics, healthcare, and marketing. As a step toward an accurate solution to the problem at hand, this paper proposes a face detection and recognition pipeline, Face Detection and Recognition EmbedNet (FDREnet). The proposed FDREnet performs face detection with histograms of oriented gradients and uses a Siamese technique with contrastive loss to train a deep learning architecture (EmbedNet). The approach allows EmbedNet to learn to distinguish facial features in addition to recognizing them. This flexibility in learning, afforded by the contrastive loss, accounts for better accuracy than training with traditional deep learning losses. Embeddings produced by the trained FDREnet achieve accuracies of 98.03%, 99.57%, and 99.39% on the face94, face95, and face96 datasets, respectively, with SVM classification, and 97.83%, 99.57%, and 99.39% on the same datasets with KNN classification.
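As an illustration of the Siamese training objective described above, the following is a minimal sketch of a margin-based contrastive loss for pairs of face embeddings. The function and variable names and the margin value are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of a margin-based contrastive loss for training a Siamese
# embedding network such as EmbedNet. Names and margin are illustrative.
import torch
import torch.nn.functional as F

def siamese_contrastive_loss(emb_1, emb_2, same_identity, margin: float = 1.0):
    """Pull embeddings of the same face together; push embeddings of
    different faces at least `margin` apart."""
    dist = F.pairwise_distance(emb_1, emb_2)                   # Euclidean distance per pair
    pos = same_identity * dist.pow(2)                          # same person: shrink distance
    neg = (1.0 - same_identity) * F.relu(margin - dist).pow(2) # different person: enforce margin
    return 0.5 * (pos + neg).mean()

# Usage (hypothetical): emb_1, emb_2 = embed_net(face_1), embed_net(face_2),
# where same_identity is 1.0 for matching identities and 0.0 otherwise.
```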