Historically, processor performance has increased at a much faster rate than that of main memory, and upcoming network-on-chip (NoC) based many-core architectures are further tightening the memory bottleneck. 3D integration based on through-silicon via (TSV) technology may provide a solution, as it enables stacking of multiple memory layers with orders-of-magnitude increases in memory interface bandwidth, speed, and energy efficiency. To fully exploit this potential, the architectural interface to vertically stacked memory must be streamlined. In this paper, the authors present an efficient and flexible distributed memory interface for 3D-stacked DRAM. Their interface ensures ultra-low-latency access to the memory modules on top of each processing element (vertically local memory neighborhoods).