Effective Multi-Modal Retrieval Based on Stacked Auto-Encoders
Multi-modal retrieval is emerging as a new search paradigm that enables seamless information retrieval across different types of media. For example, a user can snap a photo of a movie poster to search for relevant reviews and trailers. To support such queries, a set of mapping functions is learned that projects the high-dimensional features extracted from each media type into a common low-dimensional space, where metric distance measures can be applied across media types. In this paper, the authors propose an effective mapping mechanism based on deep learning (i.e., stacked auto-encoders) for multi-modal retrieval.
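
The mapping idea described above can be illustrated with a minimal sketch. It is not the paper's stacked auto-encoder: each modality gets a single sigmoid layer with random placeholder weights, standing in for mapping functions that would normally be learned. The names `project` and `search`, the feature dimensions, and the random data are all assumptions made for illustration. The sketch only shows how, once both modalities live in one latent space, an image query can be matched against text items with an ordinary Euclidean distance.

```python
import numpy as np


def project(features, weights, bias):
    """Map features of one modality into the shared latent space.

    A single sigmoid layer is used here as a stand-in for a learned mapping.
    """
    return 1.0 / (1.0 + np.exp(-(features @ weights + bias)))


def search(query_latent, database_latent, k=3):
    """Return indices of the k database items nearest to the query (Euclidean)."""
    distances = np.linalg.norm(database_latent - query_latent, axis=1)
    return np.argsort(distances)[:k]


rng = np.random.default_rng(0)

# Hypothetical dimensions: image features, text features, shared latent space.
image_dim, text_dim, latent_dim = 512, 1000, 32

# Placeholder parameters (assumption): random values instead of trained weights.
W_img = rng.normal(scale=0.05, size=(image_dim, latent_dim))
b_img = np.zeros(latent_dim)
W_txt = rng.normal(scale=0.05, size=(text_dim, latent_dim))
b_txt = np.zeros(latent_dim)

# A toy text database and one image query, both made of random features.
text_db = rng.random((100, text_dim))
query_image = rng.random(image_dim)

# Project both modalities into the common space, then search across them.
text_latent = project(text_db, W_txt, b_txt)
query_latent = project(query_image, W_img, b_img)
top_matches = search(query_latent, text_latent, k=5)
print(top_matches)
```

With untrained weights the returned neighbours carry no semantic meaning. The point is only that, after projection, items from different media types become directly comparable under a single metric.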