A Methodology for the Usage of Side Data in Content Mining
In many text mining applications, side-information is available alongside the text documents. Such side-information may be of different kinds, for example document provenance information, the links in the document, user-access behavior from web logs, or other non-textual attributes that are embedded in the text document. Such attributes may contain a large amount of information useful for clustering. However, the relative importance of this side-information may be difficult to estimate, particularly when some of the information is noisy.