INFOREC Publishing House
This paper extends the "Distributed parallel architecture for storing and processing large datasets" paper presented at the conference in Cambridge. The original version covered the benefits of using a distributed parallel architecture to store and process large datasets. This paper analyzes the problem of storing, processing, and retrieving meaningful insight from petabytes of data. It surveys current distributed and parallel data processing technologies and, based on them, proposes an architecture that can be used to solve the analyzed problem.