We present a novel framework for querying a heterogeneous multi-modal database containing images, textual tags, and GPS coordinates. We construct a bi-layer graph from localized image parts and their associated GPS locations and textual tags. In the first layer, a separate graph for each modality groups similar data points using a spectral clustering algorithm. The second layer links clusters of different modalities, integrating the relationships between them. The resulting network model supports flexible multi-modal queries on the database.
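As a rough illustration of the second layer, the following minimal sketch assumes that every database item has already been assigned one cluster label per modality (in the framework above, by first-layer spectral clustering; here the labels are hard-coded toy values). The co-occurrence edge weighting, the item layout, and the names `ITEMS`, `build_second_layer`, and `query` are assumptions made for illustration only and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy data: one cluster label per modality for each item.
# Real labels would come from first-layer clustering; these are made up.
ITEMS = [
    {"image": "img_c0", "tag": "tag_c0", "gps": "gps_c0"},
    {"image": "img_c0", "tag": "tag_c0", "gps": "gps_c1"},
    {"image": "img_c1", "tag": "tag_c1", "gps": "gps_c1"},
    {"image": "img_c1", "tag": "tag_c0", "gps": "gps_c1"},
]


def build_second_layer(items):
    """Link clusters of different modalities.

    Assumption: the edge weight is the number of items in which two
    clusters co-occur. The paper may weight these edges differently.
    """
    edges = defaultdict(int)
    for item in items:
        nodes = sorted(item.items())  # list of (modality, cluster) pairs
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                # Store both directions so lookups work from either side.
                edges[(a, b)] += 1
                edges[(b, a)] += 1
    return edges


def query(edges, modality, cluster, target_modality):
    """Rank clusters of target_modality by edge weight to the query cluster."""
    source = (modality, cluster)
    scores = {
        dst[1]: weight
        for (src, dst), weight in edges.items()
        if src == source and dst[0] == target_modality
    }
    # Highest weight first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


edges = build_second_layer(ITEMS)
ranked = query(edges, "tag", "tag_c0", "gps")
print(ranked)
```

Under these assumptions, a query such as "which GPS clusters are most associated with this tag cluster" reduces to a lookup over the cross-modal edges, which is one way to read the flexible multi-modal queries described above.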