In recent years, automatic generation of image descriptions (captions), that is, image captioning, has attracted a great deal of attention. In this paper, we particularly consider generating Japanese captions for images. Since most available caption datasets have been constructed for English language, there are few datasets for Japanese. To tackle this problem, we construct a large-scale Japanese image caption dataset based on images from MS-COCO, which is called STAIR Captions. STAIR Captions consists of 820,310 Japanese captions for 164,062 images. In the experiment, we show that a neural network trained using STAIR Captions can generate more natural and better Japanese captions, compared to those generated using English-Japanese machine translation after generating English captions.
A new large-scale video dataset for human action recognition, called STAIR Actions is introduced. STAIR Actions contains 100 categories of action labels representing fine-grained everyday home actions so that it can be applied to research in various home tasks such as nursing, caring, and security. In STAIR Actions, each video has a single action label. Moreover, for each action category, there are around 1,000 videos that were obtained from YouTube or produced by crowdsource workers. The duration of each video is mostly five to six seconds. The total number of videos is 102,462. We explain how we constructed STAIR Actions and show the characteristics of STAIR Actions compared to existing datasets for human action recognition. Experiments with three major models for action recognition show that STAIR Actions can train large models and achieve good performance. STAIR Actions can be downloaded from http://actions.stair.center.
Predicting conversion rates (CVRs) in display advertising (e.g., predicting the proportion of users who purchase an item (i.e., a conversion) after its corresponding ad is clicked) is important when measuring the effects of ads shown to users and to understanding the interests of the users. There is generally a time delay (i.e., so-called delayed feedback) between the ad click and conversion. Owing to the delayed feedback, samples that are converted after an observation period may be treated as negative. To overcome this drawback, CVR prediction assuming that the time delay follows an exponential distribution has been proposed. In practice, however, there is no guarantee that the delay is generated from the exponential distribution, and the best distribution with which to represent the delay depends on the data. In this paper, we propose a nonparametric delayed feedback model for CVR prediction that represents the distribution of the time delay without assuming a parametric distribution, such as an exponential or Weibull distribution. Because the distribution of the time delay is modeled depending on the content of an ad and the features of a user, various shapes of the distribution can be represented potentially. In experiments, we show that the proposed model can capture the distribution for the time delay on a synthetic dataset, even when the distribution is complicated. Moreover, on a real dataset, we show that the proposed model outperforms the existing method that assumes an exponential distribution for the time delay in terms of conversion rate prediction.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.