Association for Computing Machinery
Scientific facilities such as the Advanced Light Source (ALS) and Joint Genome Institute and projects such as the materials project have an increasing need to capture, store, and analyze dynamic semi-structured data and metadata. A similar growth of semi-structured data within large Internet service providers has led to the creation of NoSQL data stores for scalable indexing and MapReduce for scalable parallel analysis. MapReduce and NoSQL stores have been applied to scientific data. Hadoop, the most popular open source implementation of MapReduce, has been evaluated, utilized and modified for addressing the needs of different scientific analysis problems.