The impending end of Moore's Law (the doubling of transistor density in integrated circuits roughly every two years) is forcing major philosophical shifts in how computers are architected. Raising clock speeds to increase performance hit diminishing returns about 15 years ago, as thermal limits prevented CPUs from sustaining speeds meaningfully above 4 GHz. Since then, performance gains have come from higher core counts and incremental microarchitectural improvements, though that strategy, likewise, is losing efficacy.

In search of higher performance, adoption of GPUs for general-purpose compute tasks (otherwise known as GPGPU) has increased significantly, particularly for artificial intelligence (AI) and machine learning workloads. GPUs are only one piece, however: other compute accelerators, such as smart NICs and application-oriented FPGAs, along with differing tiers of memory, including storage-class memory (SCM), are necessary to drive performance improvements in a post-Moore's Law environment.


How those accelerators connect to systems is a significant concern. While POWER9 uses the relatively speedy PCI Express 4.0 standard and DDR4, these are comparatively die-hungry connections. The Open Coherent Accelerator Processor Interface (OpenCAPI) and Open Memory Interface (OMI) provide a technology-agnostic, low-latency means of connecting accelerators and memory to a CPU.

As part of IBM's open sourcing of the POWER ISA this week, reference implementations of the (nominally) platform-agnostic OpenCAPI and OMI were also published. These provide an asymmetric, low-latency, high-bandwidth interface for connecting accelerators and different types of RAM, allowing accelerator technology to be developed independently, on a faster cadence than the years-long cycle between successive generations of an ISA.

Earlier this month, Microsemi announced the SMC 1000 8x25G, an eight-lane OMI-attached memory controller supporting speeds up to DDR4-3200 on an 84-pin differential DIMM (DDIMM). Compared to directly attached memory, OMI incurs only a 5-10 ns load-to-use latency penalty, while requiring one-sixth the die area of a directly connected DDR interface. "The result is a significant reduction in the required number of host CPU or SoC pins per DDR4 memory channel, allowing for more memory channels and increasing the memory bandwidth available," Microsemi claimed in a press release.
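The pin-count argument is easy to sanity-check with back-of-envelope arithmetic. The sketch below compares the peak bandwidth of one standard 64-bit DDR4-3200 channel against the raw per-direction bandwidth of the SMC 1000's eight 25 Gbit/s serial lanes; the DDR4 bus width and DIMM pin counts are standard JEDEC figures, not from the article, and the calculation ignores serial-link encoding and protocol overhead.

```python
# Back-of-envelope comparison of a parallel DDR4 channel vs. a serial OMI link.
# Lane counts and speeds are from the SMC 1000 8x25G announcement; the 64-bit
# bus and ~288-pin DIMM interface are standard DDR4 figures. Illustrative only:
# real links lose some raw bandwidth to encoding and protocol overhead.

def ddr4_bandwidth_gbs(mt_per_s: int, bus_bits: int = 64) -> float:
    """Peak bandwidth of one DDR4 channel in GB/s (decimal)."""
    return mt_per_s * (bus_bits / 8) / 1000

def omi_bandwidth_gbs(lanes: int, gbit_per_lane: float) -> float:
    """Raw per-direction bandwidth of a serial link in GB/s (decimal)."""
    return lanes * gbit_per_lane / 8

ddr4 = ddr4_bandwidth_gbs(3200)       # DDR4-3200, 64-bit channel
omi = omi_bandwidth_gbs(8, 25.0)      # SMC 1000 8x25G: 8 lanes at 25 Gbit/s

print(f"DDR4-3200 channel: {ddr4:.1f} GB/s")  # 25.6 GB/s
print(f"OMI 8x25G link:    {omi:.1f} GB/s")   # 25.0 GB/s raw
```

The two come out roughly equal, which is the point: the serial link delivers comparable bandwidth over eight differential lanes instead of a wide parallel bus, freeing CPU pins for additional memory channels.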

"This happens to be a DDR4 product, but it can be switched out with anything, PRAM, GDDR, you name it," Mendy Furmanek, president of the OpenPOWER Foundation, told TechRepublic. "We've talked to a number of memory companies that are quite enticed, and especially in picking up our RTL that can get them started."

Likewise, the interface's benefits extend to other types of accelerators, making it easier for developers to build their own custom solutions.

“[Companies] want to see top-to-bottom open models so that they can—as they are building custom applications—accelerate those all the way down to the silicon. Being able to… work at every layer in the stack, is something that allows them to just create much better solutions for themselves,” Jim Zemlin, Linux Foundation executive director, told TechRepublic. “Customers may do prototyping, or work with this open code, but then go to someone that can provide commercial support and bring an implementation home… that has proven to be the modern way that end users want to buy and build technology.”

“This is the fulfillment of my personal dream, to have projects in every single layer of the stack, but I think it’s just where the industry is going,” Zemlin added.

For more, see “Raptor’s Talos II Lite brings POWER9 to the desktop without breaking the bank” and “AMD’s 16-core Ryzen 9 3950X hits 4.7 GHz in turbo at only 105W TDP.”

Disclosure: James Sanders is an associate member of the OpenPOWER Foundation.

Image: Thomas-Soellner, Getty Images/iStockphoto