Barrier synchronization is a key programming primitive for shared-memory embedded MPSoCs. As the core count increases, software barrier implementations can no longer provide the needed performance and scalability, which makes hardware acceleration critical. In this paper, the authors describe an interconnect extension implemented with standard cells and a mainstream industrial toolflow. They show that the area overhead is marginal relative to the performance improvement of the resulting hardware-accelerated barriers. They integrate their HWbarrier into the OpenMP programming model and compare its synchronization efficiency with that of traditional software implementations.
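
To make the software baseline concrete, the following is a minimal sketch of one common textbook scheme, a centralized sense-reversing barrier. It is not taken from the paper, which does not specify here which software barriers it compares against. All names (`sw_barrier_t`, `sw_barrier_wait`, `sw_barrier_demo`) are hypothetical, and the sketch assumes a host with C11 atomics and POSIX threads rather than an embedded MPSoC runtime. Every thread decrements one shared counter and then spins on one shared flag, so all cores contend for the same memory locations; this is the kind of traffic that tends to limit scalability as the core count grows.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Centralized sense-reversing barrier: an illustrative software baseline
 * (hypothetical names, not the paper's implementation). */
typedef struct {
    atomic_int  count;     /* threads still expected in the current episode */
    atomic_bool sense;     /* global sense, flipped by the last arriver */
    int         nthreads;  /* number of participating threads */
} sw_barrier_t;

static void sw_barrier_init(sw_barrier_t *b, int nthreads) {
    atomic_init(&b->count, nthreads);
    atomic_init(&b->sense, false);
    b->nthreads = nthreads;
}

/* local_sense is per-thread state and must start as false. */
static void sw_barrier_wait(sw_barrier_t *b, bool *local_sense) {
    *local_sense = !*local_sense;
    if (atomic_fetch_sub(&b->count, 1) == 1) {
        /* Last arriver: reset the counter first, then release the others. */
        atomic_store(&b->count, b->nthreads);
        atomic_store(&b->sense, *local_sense);
    } else {
        /* Everyone else spins on the single shared flag. */
        while (atomic_load(&b->sense) != *local_sense)
            sched_yield();
    }
}

typedef struct {
    sw_barrier_t *barrier;
    atomic_int   *arrived;  /* total arrivals over all rounds */
    atomic_int   *errors;   /* number of observed ordering violations */
    int           nthreads;
    int           rounds;
} worker_arg_t;

static void *worker(void *p) {
    worker_arg_t *a = p;
    bool local_sense = false;
    for (int r = 0; r < a->rounds; r++) {
        atomic_fetch_add(a->arrived, 1);
        sw_barrier_wait(a->barrier, &local_sense);
        /* Past the barrier, every thread must have arrived for round r. */
        if (atomic_load(a->arrived) < (r + 1) * a->nthreads)
            atomic_fetch_add(a->errors, 1);
    }
    return NULL;
}

/* Runs nthreads workers through `rounds` barrier episodes and returns the
 * number of ordering violations seen (0 if the barrier held every time). */
int sw_barrier_demo(int nthreads, int rounds) {
    sw_barrier_t barrier;
    atomic_int arrived, errors;
    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    if (tids == NULL)
        return -1;
    sw_barrier_init(&barrier, nthreads);
    atomic_init(&arrived, 0);
    atomic_init(&errors, 0);
    worker_arg_t arg = { &barrier, &arrived, &errors, nthreads, rounds };
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[i], NULL, worker, &arg) != 0)
            abort();  /* a missing participant would deadlock the barrier */
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    return atomic_load(&errors);
}
```

Calling `sw_barrier_demo(4, 1000)` runs four threads through 1000 barrier episodes and returns the number of violations it observed. In an OpenMP program the same role is played by `#pragma omp barrier`, whose runtime implementation is what a hardware barrier would replace.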