Large Scale Learning and Recognition of Faces in Web Videos
The phenomenal growth of video on the web, together with the increasing sparseness of the meta information associated with it, forces one to look to the video content itself for signals that support search, information retrieval, and browsing-based corpus exploration. A large share of users' searching and browsing patterns center on the people present in a video. Recognizing those people at scale remains hard for two reasons: labeled data is not available for such a large set of people, and the target corpus varies widely in pose, illumination, expression, age, occlusion, and quality. The authors propose a system that can learn and recognize faces by combining signals from large-scale, weakly labeled text, image, and video corpora.