International Journal of Innovative Research in Science, Engineering and Technology (IJIRSET)
Document filtering is one of the most challenging tasks on the web, and producing prominent search results by filtering documents is a major issue. Semantic similarity measurement and large-scale document clustering are especially difficult because web data contains considerable noise, such as redundant content, outliers, and missing values, which makes data preprocessing essential. Search results produced by social search engines (web search) give more visibility to the content users create. This paper focuses on semantic similarity measures and on the F-measure for large document clustering.
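To make the two quantities named above concrete, the sketch below shows a document similarity score and a clustering F-measure. It is an illustration under stated assumptions, not the method of this paper. Cosine similarity over raw term-frequency vectors is used as a simple lexical stand-in for a semantic similarity measure. The F-measure follows the common clustering definition: for each true class, take the best F score over all clusters, then weight that score by the class size. The function names `cosine_similarity` and `clustering_f_measure` are hypothetical.

```python
from collections import Counter
import math


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity between raw term-frequency vectors.

    Assumption: a lexical stand-in for a semantic similarity measure;
    tokenization is a plain lowercase whitespace split.
    """
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())
    # Dot product over the terms shared by both documents.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no similarity to anything
    return dot / (norm_a * norm_b)


def clustering_f_measure(labels_true, labels_pred) -> float:
    """Overall F-measure of a clustering against reference classes.

    For each class c: F(c) = max over clusters k of
    2 * P * R / (P + R), where P = n_ck / n_k and R = n_ck / n_c.
    The result is sum over classes of (n_c / n) * F(c).
    """
    n = len(labels_true)
    classes = Counter(labels_true)    # n_c: documents per true class
    clusters = Counter(labels_pred)   # n_k: documents per cluster
    joint = Counter(zip(labels_true, labels_pred))  # n_ck: overlap counts
    total = 0.0
    for c, n_c in classes.items():
        best = 0.0
        for k, n_k in clusters.items():
            n_ck = joint[(c, k)]
            if n_ck == 0:
                continue
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best
    return total
```

A perfect clustering such as `clustering_f_measure(["a", "a", "b", "b"], [0, 0, 1, 1])` scores 1.0. Merging one "b" document into the "a" cluster, as in `clustering_f_measure(["a", "a", "b", "b"], [0, 0, 0, 1])`, lowers the score to 11/15, or about 0.733. In practice, TF-IDF weighting or embedding vectors would usually replace raw counts in the similarity step.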