Getting Code Near the Data: A Study of Generating Customized Data-Intensive Scientific Workflows with a Domain-Specific Language
The amount of data produced by modern biological experiments, such as Nuclear Magnetic Resonance (NMR) analysis, far exceeds the processing capability of a single machine. The present state of the art is to take the data to the code ("Data to code"), the philosophy followed by many current service-oriented workflow systems. However, this is not feasible in some cases, such as NMR data analysis, primarily because of the large scale of the data. The objective of this research is to bring the code to the data ("Code to data"), which is preferable when the data is extremely large. The authors present a domain-specific language (DSL) based approach for developing customized data-intensive scientific workflows that can run on Hadoop clusters.