An Overview of Preprocessing of Web Log Files for Web Usage Mining
As Internet usage grows in popularity and the number of users rises steadily, the World Wide Web has become a huge repository of data and an important platform for disseminating information. Users' accesses to Web sites are recorded in Web server logs. However, the raw data in these log files do not present an accurate picture of how users access a Web site. Preprocessing the Web log data is therefore an essential prerequisite before the data can be used for knowledge discovery or mining tasks. The preprocessed data are then suitable for the discovery and analysis of useful information, a process referred to as Web mining.