Digging Deeper Into Deep Web Databases by Breaking Through the Top-K Barrier
A large number of web databases are only accessible through proprietary form-like interfaces which require users to query the system by entering desired values for a few attributes. A key restriction enforced by such an interface is the top-k output constraint - i.e., when there are a large number of matching tuples, only a few (top-k) of them are preferentially selected and returned by the website, often according to a proprietary ranking function. Since most web database owners set k to be a small value, the top-k output constraint prevents many interesting third-party (e.g., mashup) services from being developed over real-world web databases. In this paper, the authors consider the novel problem of "Digging deeper" into such web databases.