An Architecture for Mining WEB Content Hosted on Clustered Backend Servers
Web content mining is the mining of content such as text, images, and data that is delivered through web resources of various types, including HTML, DHTML, XML, PHP, JSP, ASP, and DLL. Sometimes the entire web content is horizontally partitioned, each partition is placed on a different web server, and the servers that together hold the full content are clustered. This is done to handle heavy traffic. At times, web content is also replicated onto different web servers that are strategically positioned at high-traffic locations. Mining knowledge from content that is distributed or replicated in this way is a challenge.
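
To make the difficulty concrete, the following is a minimal sketch rather than the architecture proposed in this paper. It assumes each clustered server can be read as a list of document texts, and it uses term counting as a stand-in for a real mining task. The names `term_counts`, `mine_cluster`, and the server labels are hypothetical. Partitioned content is handled by merging per-server results, and replicated content is handled by skipping any document whose content hash has already been seen.

```python
import hashlib
from collections import Counter


def term_counts(text):
    """Count lowercase, whitespace-separated terms in one document."""
    return Counter(text.lower().split())


def mine_cluster(servers):
    """Merge term counts over all servers, counting each distinct document once.

    `servers` is a hypothetical mapping from a server name to the list of
    document texts hosted on that server (one horizontal partition or replica).
    """
    seen = set()       # content hashes of documents already mined
    total = Counter()  # merged result across the whole cluster
    for docs in servers.values():
        for text in docs:
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if digest in seen:
                continue  # replica of a document mined on another server
            seen.add(digest)
            total += term_counts(text)
    return total


# Illustrative data only: two partitions and one mirror holding a replica.
servers = {
    "node-a": ["web mining finds patterns", "clusters handle heavy traffic"],
    "node-b": ["partitions split the content"],
    "mirror": ["web mining finds patterns"],
}
counts = mine_cluster(servers)
```

Without the hash check, terms from the replicated document would be counted twice, which is one way replication can distort mined results. Exact hashing only catches byte-identical replicas; near-duplicates would need a different technique.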