Date Added: Mar 2013
The Apache Hadoop Distributed File System (HDFS) allows companies to store and manage petabytes of data collected from disparate sources, and does so far more efficiently than relational database management systems. Many companies have begun using this open-source technology to aggregate and analyze large volumes of structured and unstructured data captured from websites, social media networks, emails, audio and video files, sensors, and machines. This paper examines Hadoop clusters and how to secure them using Kerberos. The authors further consider strengthening security with role-based access control, and review the built-in protections and weaknesses of these systems.