Mobile applications frequently access sensitive personal information to meet user or business requirements. Because such information is sensitive in general, regulators increasingly require mobile-app developers to publish privacy policies that describe what information is collected. Furthermore, regulators have fined companies when these policies are inconsistent with the actual data practices of mobile apps. To help mobile-app developers check their privacy policies against their apps' code for consistency, we propose a semi-automated framework that consists of a policy-terminology-to-API-method map, which links policy phrases to the API methods that produce sensitive information, and an information flow analysis that detects misalignments. We present an implementation of our framework based on a privacy-policy-phrase ontology and a collection of mappings from API methods to policy phrases. Our empirical evaluation on 477 top Android apps discovered 341 potential privacy policy violations.
Code summarization generates a brief natural language description of a given source code snippet, while code retrieval fetches relevant source code for a given natural language query. Since both tasks aim to model the association between natural language and programming language, recent studies have combined the two tasks to improve their performance. However, researchers have not yet been able to effectively leverage the intrinsic connection between the two tasks, because they train them separately or in a pipeline, which means their performance cannot be well balanced. In this paper, we propose a novel end-to-end model for the two tasks by introducing an additional code generation task. More specifically, we explicitly exploit the probabilistic correlation between code summarization and code generation with dual learning, and use the two encoders for code summarization and code generation to train the code retrieval task via multi-task learning. We have carried out extensive experiments on an existing dataset of SQL and Python, and the results show that our model significantly improves the results of the code retrieval task over state-of-the-art models, and achieves competitive performance in terms of BLEU score for the code summarization task.
Stack Overflow (SO) is the most popular online Q&A site for developers to share their expertise in solving programming issues. Given multiple answers to a question, developers may take the accepted answer, the answer from a person with high reputation, or the one most frequently suggested. However, researchers recently observed that SO contains exploitable security vulnerabilities in the suggested code of popular answers, which found their way into security-sensitive, high-profile applications that millions of users install every day. This observation inspires us to explore the following questions: How much can we trust the security implementation suggestions on SO? If suggested answers are vulnerable, can developers rely on the community's dynamics to infer the vulnerability and identify a secure counterpart?

To answer these highly important questions, we conducted a comprehensive study of security-related SO posts by contrasting secure and insecure advice with the community-given content evaluation. Thereby, we investigated whether SO's gamification approach to incentivizing users is effective in improving the security properties of distributed code examples. Moreover, we traced the distribution of duplicated samples over given answers to test whether the community behavior facilitates or prevents propagation of secure and insecure code suggestions within SO.

We compiled 953 different groups of similar security-related code examples and labeled their security, identifying 785 secure answer posts and 644 insecure answer posts. Compared with secure suggestions, insecure ones had higher view counts (36,508 vs. 18,713), received a higher score (14 vs. 5), and had significantly more duplicates (3.8 vs. 3.0) on average. 34% of the posts provided by highly reputable, so-called trusted users were insecure.

Our findings show that, based on the distribution of secure and insecure code on SO, users who are laymen in security rely on additional advice and guidance. However, the community-given feedback does not allow differentiating secure from insecure choices. The reputation mechanism fails to indicate trustworthy users with respect to security questions, ultimately leaving other users wandering around alone in a software security minefield.

Index Terms: Stack Overflow, crowdsourced knowledge, social dynamics, security implementation

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.interfaces.RSAPrivateKey;
import java.security.interfaces.RSAPublicKey;

// RSA key pair generation as shown in the listing: a 1024-bit modulus, below the 2048-bit minimum in current guidance.
KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
kpg.initialize(1024);
KeyPair kp = kpg.generateKeyPair();
RSAPublicKey pub = (RSAPublicKey) kp.getPublic();
RSAPrivateKey priv = (RSAPrivateKey) kp.getPrivate();
```

Hash: In the context of password-based key derivation, digital signatures, and authentication/authorization, developers may explicitly invoke broken hash functions. Listing 4 shows an example using MD5.
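The broken pattern the paragraph above describes, an unsalted, single-round MD5 used where a password-based key derivation function belongs, is shown here beside the corrected form. This is an added illustration, not Listing 4 from the paper or any code from SO; it uses only the Python standard library, the sample password and the small list of common passwords are constructed for the demonstration, and the iteration count follows the OWASP Password Storage Cheat Sheet recommendation for PBKDF2-HMAC-SHA256. The lookup function shows why the weak form fails: without a salt, every user with the same password gets the same digest, so a precomputed table of common passwords reverses it in one dictionary access, which is the check a defender runs against a leaked credential store.

```python
import hashlib
import hmac
import os
from typing import Optional

# Constructed sample inputs for the demonstration (not from any real system).
SAMPLE_PASSWORD = "password"
# Stands in for the far larger precomputed tables that already exist for
# unsalted fast hashes; a breach-check compares stored digests against them.
COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein"]

# OWASP-recommended work factor for PBKDF2-HMAC-SHA256 (2023 cheat sheet).
PBKDF2_ITERATIONS = 600_000
SALT_LEN = 16


def weak_md5_store(password: str) -> str:
    """The broken pattern: unsalted, single-round MD5 of the password.

    Deterministic and fast, so identical passwords collide across users and
    a dictionary of common passwords maps straight back to the plaintext.
    """
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def lookup_unsalted_md5(digest: str) -> Optional[str]:
    """Defender's breach check: does this stored digest sit in the table of
    common passwords? For unsalted MD5 that is a single dictionary access."""
    table = {weak_md5_store(p): p for p in COMMON_PASSWORDS}
    return table.get(digest)


def pbkdf2_store(password: str, salt: Optional[bytes] = None) -> str:
    """The corrected pattern: PBKDF2-HMAC-SHA256 with a fresh random salt and
    a high iteration count, serialised so the parameters travel with the hash.

    The same call derives an encryption key from a password; the salt must
    then be stored alongside the ciphertext rather than the verifier string.
    """
    if salt is None:
        salt = os.urandom(SALT_LEN)  # per-record salt defeats shared tables
    dk = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return f"pbkdf2_sha256${PBKDF2_ITERATIONS}${salt.hex()}${dk.hex()}"


def pbkdf2_verify(password: str, stored: str) -> bool:
    """Recompute with the stored salt and iteration count; compare in
    constant time so the check itself does not leak digest bytes."""
    try:
        algo, iters, salt_hex, dk_hex = stored.split("$")
    except ValueError:
        return False
    if algo != "pbkdf2_sha256":
        return False
    dk = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iters)
    )
    return hmac.compare_digest(dk, bytes.fromhex(dk_hex))


if __name__ == "__main__":
    weak = weak_md5_store(SAMPLE_PASSWORD)
    print("unsalted MD5:", weak, "-> recovered:", lookup_unsalted_md5(weak))
    strong = pbkdf2_store(SAMPLE_PASSWORD)
    print("PBKDF2 record:", strong)
    print("verify correct password:", pbkdf2_verify(SAMPLE_PASSWORD, strong))
    print("verify wrong password:", pbkdf2_verify("Password", strong))
```

Two consecutive calls to `pbkdf2_store` for the same password produce different records because each draws its own salt, which is exactly the property the MD5 form lacks; the iteration count is stored in the record so it can be raised later without invalidating existing entries.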