Efficient Document Clustering with Semantic Analysis Using Parallel Spectral Method
Text documents are a common form of unstructured data. Clustering techniques group documents according to their similarity. The parallel spectral clustering algorithm is used to cluster large data collections, and an approximation of the dense similarity matrix is applied to reduce memory usage. In this work, the parallel spectral clustering algorithm is adapted to perform document clustering in a distributed environment. Term weights are used in the parallel spectral clustering schemes, and the system extends them with a semantic analysis model: relationships between terms are identified using an ontology repository, and the resulting semantic weights are used for document representation. The proposed system also enhances the parallel spectral clustering algorithm to reduce communication overhead and optimizes its synchronization tasks.
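The semantic weighting and similarity-matrix approximation described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the proposed system: the toy ontology `TOY_ONTOLOGY`, the helper names `semantic_weights`, `cosine`, and `sparse_similarity`, and the choice of keeping only the `t` most similar neighbors per document as one possible way to approximate the dense similarity matrix. The eigendecomposition and the distributed execution are omitted.

```python
import math
from collections import Counter

# Hypothetical toy ontology: term -> {related term: relatedness in (0, 1]}.
TOY_ONTOLOGY = {
    "car": {"automobile": 1.0, "vehicle": 0.5},
    "automobile": {"car": 1.0, "vehicle": 0.5},
}

def semantic_weights(tokens, ontology):
    """Term frequency, plus weight propagated to ontology-related terms."""
    tf = Counter(tokens)
    weights = {term: float(count) for term, count in tf.items()}
    for term, count in tf.items():
        for other, score in ontology.get(term, {}).items():
            weights[other] = weights.get(other, 0.0) + score * count
    return weights

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def sparse_similarity(vectors, t):
    """Keep only the t most similar neighbors per document, then symmetrize."""
    n = len(vectors)
    rows = []
    for i in range(n):
        sims = [(cosine(vectors[i], vectors[j]), j) for j in range(n) if j != i]
        sims.sort(reverse=True)
        # Zero similarities are dropped so the matrix stays sparse.
        rows.append({j: s for s, j in sims[:t] if s > 0.0})
    for i in range(n):
        for j, s in list(rows[i].items()):
            rows[j].setdefault(i, s)  # make the matrix symmetric
    return rows

docs = [
    ["car", "engine"],
    ["automobile", "engine"],
    ["stock", "market"],
    ["market", "shares"],
]
plain = cosine(Counter(docs[0]), Counter(docs[1]))       # term frequency only
vectors = [semantic_weights(d, TOY_ONTOLOGY) for d in docs]
semantic = cosine(vectors[0], vectors[1])                # with ontology weights
S = sparse_similarity(vectors, t=1)
print(plain, semantic, S)
```

In this toy data the first two documents share only the term "engine", so their plain term-frequency similarity is lower than their semantic similarity, where "car" and "automobile" are linked through the ontology. Storing at most `t` neighbors per row keeps memory roughly proportional to `n * t` instead of `n * n`, which is the motivation for approximating the dense matrix before the spectral step.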