Bridging Vision and Commonsense for Multimodal Situation Recognition in Pervasive Systems
Pervasive services may have to rely on multimodal classification to implement situation recognition. However, the effectiveness of current multimodal classifiers is often unsatisfactory. In this paper, the authors describe a novel approach to multimodal classification that integrates a vision sensor with a commonsense knowledge base. Specifically, the approach extracts the individual objects perceived by a camera and classifies each of them with nonparametric algorithms; it then classifies the overall scene, with high effectiveness, by consulting the commonsense knowledge base. Such classification results can then be fused, again on a commonsense basis, with those of other sensors, both to improve classification accuracy and to deal with missing labels.
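The pipeline the abstract outlines (detected objects, commonsense scene scoring, late fusion with another sensor) might be sketched as follows. This is only a conceptual illustration, not the authors' implementation: the object-to-scene association table `ASSOC` is a hypothetical stand-in for a real commonsense knowledge base, and the fusion weight `alpha` is an assumed parameter.

```python
# Conceptual sketch: scene scores from detected objects via a toy
# commonsense association table, then weighted late fusion with a
# second sensor's scene estimate.

# Hypothetical object -> scene association weights (a stand-in for a
# commonsense knowledge base).
ASSOC = {
    "stove":  {"kitchen": 0.9, "office": 0.0},
    "kettle": {"kitchen": 0.7, "office": 0.1},
    "laptop": {"kitchen": 0.1, "office": 0.9},
}

def scene_scores(objects):
    """Aggregate per-object scene associations into normalized scores."""
    totals = {}
    for obj in objects:
        for scene, w in ASSOC.get(obj, {}).items():
            totals[scene] = totals.get(scene, 0.0) + w
    norm = sum(totals.values()) or 1.0
    return {s: v / norm for s, v in totals.items()}

def fuse(vision, other, alpha=0.6):
    """Weighted late fusion; if one modality is missing (empty dict),
    fall back to the other, which handles missing labels gracefully."""
    if not other:
        return vision
    if not vision:
        return other
    scenes = set(vision) | set(other)
    fused = {s: alpha * vision.get(s, 0.0) + (1 - alpha) * other.get(s, 0.0)
             for s in scenes}
    norm = sum(fused.values()) or 1.0
    return {s: v / norm for s, v in fused.items()}

# Example: camera sees a stove and a kettle; an audio sensor is undecided.
v = scene_scores(["stove", "kettle"])
fused = fuse(v, {"kitchen": 0.5, "office": 0.5})
print(max(fused.items(), key=lambda kv: kv[1])[0])  # prints "kitchen"
```

The design choice illustrated is late (decision-level) fusion: each modality produces its own scene distribution, and a missing modality simply drops out of the combination instead of invalidating the whole classification.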