To implement fine-grained context recognition that is accurate and affordable for general households, we present a novel technique that integrates multiple image-based cognitive APIs with light-weight machine learning. Our key idea is to regard every image as a document by exploiting the "tags" derived by multiple APIs. The aim of this paper is to compare the performance of API-based models and to improve recognition accuracy while preserving affordability for general households. To further improve recognition accuracy, we present a novel method based on multiple cognitive APIs and four modules: fork integration, majority voting, score voting, and range voting.

Our interest is in using only a small amount of image data and applying cognitive services to implement affordable context sensing that can adapt to the custom contexts of each individual house. A cognitive service is a cloud service with cognitive computing functions that provide the capability to understand multimedia data (e.g., vision (object recognition) [19], speech recognition [20], and natural language processing [21]), based on sophisticated machine-learning algorithms powered by big data and large-scale computing. Such services are offered by cloud companies, but the data processing algorithms behind them are not public. They are also widely used in various fields of research, such as modern knowledge management solutions [22] and criminal detection and recognition [23]. A cognitive API is an application program interface accessed via the HTTP/REST protocol [24], with which developers can easily integrate powerful recognition features into their own applications. An image-based cognitive API receives an image from an external application, extracts specific information from the image, and returns that information in JavaScript Object Notation (JSON) [25] format; the processing is performed on the cloud server rather than on the local machine.
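As a minimal sketch of the "image as a document" idea and of majority voting, the following Python fragment parses two JSON responses, merges their tags into one word list, encodes it as a bag-of-words vector, and votes over per-API predictions. The JSON schema, the sample payloads, the labels `context_a` and `context_b`, and all function names are assumptions made for illustration only; each real cognitive API defines its own response format, and this is not the implementation evaluated in this paper.

```python
import json
from collections import Counter

# Hypothetical JSON payloads; real cognitive APIs each use their own schema.
RESPONSE_A = '{"tags": [{"name": "living room"}, {"name": "indoors"}]}'
RESPONSE_B = '{"tags": [{"name": "classroom"}, {"name": "indoors"}]}'


def extract_tags(response_text):
    """Return the lowercase tag words found in one (assumed) JSON response."""
    payload = json.loads(response_text)
    words = []
    for tag in payload.get("tags", []):
        words.extend(tag["name"].lower().split())
    return words


def image_to_document(responses):
    """Concatenate the tags of several APIs into one 'document' (a word list)."""
    document = []
    for text in responses:
        document.extend(extract_tags(text))
    return document


def bag_of_words(document, vocabulary):
    """Encode a document as word counts over a fixed vocabulary."""
    counts = Counter(document)
    return [counts[word] for word in vocabulary]


def majority_vote(labels):
    """Pick the most frequent label among the per-API model predictions."""
    return Counter(labels).most_common(1)[0][0]


document = image_to_document([RESPONSE_A, RESPONSE_B])
vocabulary = sorted(set(document))
vector = bag_of_words(document, vocabulary)

# Hypothetical predictions from three models, one per API.
winner = majority_vote(["context_a", "context_a", "context_b"])
```

The resulting `vector` could be passed to any light-weight classifier; a simple count-based encoding is used here only because it needs no external library.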
The information usually contains a set of words called "tags", representing objects and concepts that the API has recognized in the given image. Examples of tags returned by an API are: [Living, room, indoors, classroom, basement, supporting structure]. The information of interest and the way of recognizing the image vary among individual cognitive services. Related work uses image tagging technology with deep learning, as in [26][27][28], but its implementation is more complex. In our future real-world implementation, to protect the security and privacy of users, images will not be saved after they are sent to the APIs.

The main contribution of this paper is a novel method that is not only affordable but also achieves higher accuracy in recognizing fine-grained home contexts. For this purpose, we are currently investigating techniques that integrate inexpensive camera devices, multiple image-based cognitive APIs, and light-weight machine learning. In our previous work, we encoded the tags of a single API as document vectors and then used them to construct a machine-learning model [29]. However, we found that the accuracy significantly decreased for contexts with multiple people (e.g., "...