Natural language descriptions of user interface (UI) elements, such as alternative text, are crucial for accessibility and for language-based interaction in general. Yet these descriptions are often missing in mobile UIs. We propose widget captioning, a novel task for automatically generating language descriptions for UI elements from multimodal input that includes both the image and the structural representations of user interfaces. We collected a large-scale dataset for widget captioning through crowdsourcing. Our dataset contains 162,859 language phrases created by human workers for annotating 61,285 UI elements across 21,750 unique UI screens. We thoroughly analyze the dataset, and train and evaluate a set of deep model configurations to investigate how each feature modality, as well as the choice of learning strategies, impacts the quality of predicted captions. The task formulation, the dataset, and our benchmark models provide a solid basis for this novel multimodal captioning task that connects language and user interfaces.

* Equal contribution
† Participated in the project during an internship at Google Research.
Figure 1: M3 Gesture Menu prototyped on Android with two levels of 2 × 3 tiles. (a) M3 is initially hidden, with a designated area (e.g. the white circle) presented as its activation button. (b) Pressing and holding the button for 0.5 seconds pops up the first-level menu. (c) Sliding the finger to an item (e.g. Settings) reveals its submenu, which replaces the higher-level content in the same space. (d) Further sliding to a terminal node (e.g. Wifi) activates its command. (e) Alternatively, experienced users may directly draw a gesture from the activation button approximately to Settings, then to Wifi, to trigger the same command. (f) Different gestures trigger different commands; e.g. the figure illustrates all the gestures in the Settings submenu.