International Journal of Computer Science and Network
Discovering the appropriate number of clusters into which documents should be partitioned is crucial in document clustering. In this paper, the authors propose a novel approach, namely DPMAP (Dirichlet Process Model Attribute Partition), to discover the latent cluster structure based on the DPM model without requiring the number of clusters as input. Document words are classified into two classes: discriminative words and non-discriminative words. A variational inference algorithm is used to infer the structure of the document collection and the partition of document words at the same time.
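
To illustrate how a Dirichlet process prior avoids fixing the number of clusters in advance, the following is a minimal sketch of the Chinese restaurant process, a standard way of sampling partitions from such a prior. It is not the DPMAP method and performs no variational inference or word partitioning. The function name `crp_assignments` and the parameters `alpha` and `seed` are assumptions introduced only for this illustration.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    alpha (> 0) is the concentration parameter: larger values make new
    clusters more likely. The number of clusters is never given as input;
    it emerges from the sampling. Illustrative only, not the DPMAP method.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items currently in cluster k
    labels = []  # labels[i] = cluster index assigned to item i
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_items=20, alpha=1.0)
    print("labels:", labels)
    print("number of clusters:", len(counts))
```

In a full model, these prior weights would be combined with a likelihood of each document's words under each cluster; the sketch shows only the prior over partitions.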