A Partition Algorithm for Matching String Patterns in Large Text Databases
The size of the Web and the user bases of search systems continue to grow exponentially. Consequently, providing sub-second query response times and high query throughput becomes quite challenging for large-scale information retrieval systems. An important subtask of the pattern discovery process is pattern matching, where the pattern sought is already known and the authors want to determine how often and where it occurs in a sequence. If the text character aligned with the end of the pattern is a mismatch, they continue by examining the text characters that follow the current alignment. Distributing the different aspects of search (e.g., crawling, indexing, and query processing) is essential to achieving scalability in large-scale information retrieval systems.
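
The matching step described above can be illustrated with a minimal sketch. The abstract does not say how far the pattern is moved after a mismatch, so the shift rule below is an assumption: it borrows the bad-character rule of Sunday's Quick Search, which looks at the text character immediately after the current alignment. The function name `find_occurrences` is hypothetical and is not taken from the paper.

```python
def find_occurrences(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text.

    Illustrative sketch only: the shift rule is Sunday's Quick Search,
    assumed here because the source does not specify one.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # For each pattern character, the distance from its rightmost
    # occurrence to the position one past the end of the pattern.
    shift = {ch: m - i for i, ch in enumerate(pattern)}

    positions = []
    start = 0
    while start <= n - m:
        # Test the text character aligned with the end of the pattern first;
        # compare the full window only if that character matches.
        if text[start + m - 1] == pattern[-1] and text[start:start + m] == pattern:
            positions.append(start)

        # Examine the text character just after the current alignment
        # to decide how far the pattern can be moved.
        nxt = start + m
        if nxt >= n:
            break
        # A character absent from the pattern allows a shift of m + 1.
        start += shift.get(text[nxt], m + 1)
    return positions


hits = find_occurrences("abracadabra", "abra")
print(len(hits), hits)  # how often and where the pattern occurs
```

The length of the returned list answers "how often" and its contents answer "where". Overlapping occurrences are reported as well, since the shift never skips a position at which the pattern could still match.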