International Journal of Computer Applications
The World Wide Web (WWW) and information systems have made significant advances over the last two decades, as shown by their dominance in various business and scientific applications. Researchers estimate that more than 85% of all business information exists in the form of unstructured and semi-structured documents, typically formatted for human viewing rather than for machine processing. Extracting information from these documents is a challenging task. Extracting grammar rules from these documents is an interesting idea.