Date Added: Jun 2009
The scalability, as well as the effectiveness, of the different Content-Based Image Retrieval (CBIR) approaches proposed in literature, is today an important research issue. Given the wealth of images on the Web, CBIR systems must in fact leap towards Web-scale datasets. In this paper, the authors report on the experience in building a test collection of 100 million images, with the corresponding descriptive features, to be used in experimenting new scalable techniques for similarity searching, and comparing their results. In the context of the SAPIR (Search on Audio-visual content using Peer-to-peer Information Retrieval) European project, they had to experiment the distributed similarity searching technology on a realistic data set.