A New Approach to Content-Based File Type Detection

Date Added: Nov 2009
Format: PDF

As computer use grows and network security becomes more important, reliably identifying the type of a file has become a difficult task. This paper presents a new content-based method for file type detection and file type clustering, built on Principal Component Analysis (PCA) and unsupervised neural networks, which are used for automatic feature extraction from a file's contents. The authors argue that a content-based approach is needed because of the growing number of file formats transmitted between internal and external networks, and because operating systems depend on correct file type classification to function properly. The paper also covers three introductory algorithms for content-based file type detection: Byte Frequency Analysis, Byte Frequency Cross-Correlation, and File Header/Trailer analysis. It is organized into sections that describe the existing file type detection methods, the proposed method, and the results. According to the paper, the proposed method is fast and accurate enough for real-time applications.
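To make the idea of Byte Frequency Analysis concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a file is summarized as a 256-entry vector of relative byte frequencies and that an unknown sample is assigned to the closest stored fingerprint by Euclidean distance. The function names (byte_frequency, distance, nearest_type) and the toy fingerprints are hypothetical and chosen only for illustration; the paper's own method applies PCA and unsupervised neural networks to extract features, which this sketch does not attempt.

```python
from collections import Counter


def byte_frequency(data):
    """Return a 256-entry list of relative byte frequencies for `data`."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)  # iterating over bytes yields ints in 0..255
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


def distance(a, b):
    """Euclidean distance between two equal-length frequency vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nearest_type(sample, fingerprints):
    """Return the name of the fingerprint closest to the sample's histogram.

    `fingerprints` maps a type name to a 256-entry frequency vector
    (an assumed, simplified stand-in for a trained model).
    """
    freq = byte_frequency(sample)
    return min(fingerprints, key=lambda name: distance(freq, fingerprints[name]))


# Toy fingerprints built from tiny in-memory samples (illustrative only).
fingerprints = {
    "text": byte_frequency(b"hello hello hello"),
    "zeros": byte_frequency(bytes(16)),
}
print(nearest_type(b"hello world", fingerprints))  # prints: text
```

In practice the fingerprints would be averaged over many files of each type, and the 256-dimensional vectors could be reduced with PCA before classification, in the spirit of the approach the paper describes.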