Today, Infrastructure-as-a-Service (IaaS) cloud providers have incorporated parallel data processing frameworks into their clouds for running Many-Task Computing (MTC) applications. The processing frameworks currently in use were designed for static, homogeneous cluster setups and disregard the particular nature of a cloud. Consequently, the allocated compute resources may be inadequate for large parts of the submitted job and may unnecessarily increase processing time and cost. In this paper, the authors discuss the opportunities and challenges for efficient parallel data processing in clouds and present their research project, Nephele. Nephele is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution.