Anatomy of a Globally Recursive Embedded LINPACK Benchmark
The authors present a complete bottom-up implementation of an embedded LINPACK benchmark on the iPad 2. They use a novel formulation of LU factorization that is recursive and parallel at the global scope. They believe the new algorithm offers an alternative to existing linear algebra parallelization techniques such as master-worker and DAG-based approaches. They describe an assembly API that provides a much higher level of abstraction and enables rapid code development within the confines of a mobile device SDK. They use performance modeling to work around the limitations of the device and the restricted access to it from a development environment that is not geared toward tuning HPC applications.
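
For readers unfamiliar with recursive LU factorization, the general idea can be illustrated with a minimal sequential sketch. This is the textbook block-recursive scheme, not the authors' globally recursive parallel algorithm: it assumes a square matrix that needs no pivoting (for example, a diagonally dominant one), whereas a real LINPACK run uses partial pivoting. All function names here (`recursive_lu`, `solve_unit_lower`, `solve_right_upper`, `matmul`) are hypothetical and chosen for illustration only.

```python
def matmul(X, Y):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def solve_unit_lower(L, B):
    """Solve L @ X = B for X, where L is unit lower triangular."""
    X = []
    for i, row in enumerate(B):
        xi = list(row)
        for j in range(i):
            xi = [x - L[i][j] * xj for x, xj in zip(xi, X[j])]
        X.append(xi)
    return X


def solve_right_upper(U, B):
    """Solve X @ U = B for X, where U is upper triangular."""
    X = []
    for b in B:
        x = []
        for j in range(len(U)):
            s = b[j] - sum(x[i] * U[i][j] for i in range(j))
            x.append(s / U[j][j])
        X.append(x)
    return X


def recursive_lu(A):
    """Return (L, U) with A = L @ U. Assumes no pivoting is required."""
    n = len(A)
    if n == 1:
        return [[1.0]], [[float(A[0][0])]]
    k = n // 2
    # Split A into four blocks around the midpoint.
    A11 = [row[:k] for row in A[:k]]
    A12 = [row[k:] for row in A[:k]]
    A21 = [row[:k] for row in A[k:]]
    A22 = [row[k:] for row in A[k:]]
    # Factor the leading block, then derive the off-diagonal blocks.
    L11, U11 = recursive_lu(A11)
    U12 = solve_unit_lower(L11, A12)
    L21 = solve_right_upper(U11, A21)
    # Schur complement: update the trailing block, then recurse on it.
    update = matmul(L21, U12)
    S = [[a - u for a, u in zip(ra, ru)] for ra, ru in zip(A22, update)]
    L22, U22 = recursive_lu(S)
    # Assemble the full triangular factors from the blocks.
    L = [r + [0.0] * (n - k) for r in L11] + [r1 + r2 for r1, r2 in zip(L21, L22)]
    U = [r1 + r2 for r1, r2 in zip(U11, U12)] + [[0.0] * k + r for r in U22]
    return L, U


# Example on a small diagonally dominant matrix (safe without pivoting).
A = [[4.0, 1.0, 2.0],
     [1.0, 5.0, 1.0],
     [2.0, 1.0, 6.0]]
L, U = recursive_lu(A)
```

Multiplying `L` by `U` with `matmul` should reproduce `A` up to rounding error. In this sketch the two recursive calls depend on each other, so the parallelism would have to come from the triangular solves and the Schur complement update.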