Identifying similar or identical code fragments becomes much more challenging in code theft cases where plagiarizers can use various automated code transformation techniques to hide stolen code from being detected. Previous works in this field are largely limited in that (1) most of them cannot handle advanced obfuscation techniques; (2) the methods based on source code analysis are less practical since the source code of suspicious programs is typically not available until strong evidences are collected; and (3) those depending on the features of specific operating systems or programming languages have limited applicability.Based on an observation that some critical runtime values are hard to be replaced or eliminated by semanticspreserving transformation techniques, we introduce a novel approach to dynamic characterization of executable programs. Leveraging such invariant values, our technique is resilient to various control and data obfuscation techniques. We show how the values can be extracted and refined to expose the critical values and how we can apply this runtime property to help solve problems in software plagiarism detection. We have implemented a prototype with a dynamic taint analyzer atop a generic processor emulator. Our experimental results show that the value-based method successfully discriminates 34 plagiarisms obfuscated by SandMark, plagiarisms heavily obfuscated by KlassMaster, programs obfuscated by Thicket, and executables obfuscated by Loco/Diablo.
Cancer prognosis analysis is of essential interest in clinical practice. In order to explore the prognostic power of computational histopathology and genomics, this paper constructs a multi-modality prognostic model for survival prediction. We collected 346 patients diagnosed with hepatocellular carcinoma (HCC) from The Cancer Genome Atlas (TCGA), each patient has 1–3 whole slide images (WSIs) and an mRNA expression file. WSIs were processed by a multi-instance deep learning model to obtain the patient-level survival risk scores; mRNA expression data were processed by weighted gene co-expression network analysis (WGCNA), and the top hub genes of each module were extracted as risk factors. Information from two modalities was integrated by Cox proportional hazard model to predict patient outcomes. The overall survival predictions of the multi-modality model (Concordance index (C-index): 0.746, 95% confidence interval (CI): ±0.077) outperformed these based on histopathology risk score or hub genes, respectively. Furthermore, in the prediction of 1-year and 3-year survival, the area under curve of the model achieved 0.816 and 0.810. In conclusion, this paper provides an effective workflow for multi-modality prognosis of HCC, the integration of histopathology and genomic information has the potential to assist clinical prognosis management.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.