4 steps to implementing high-performance computing for big data processing

If your company needs high-performance computing for its big data, an in-house operation might work best. Here’s what you need to know, including how high-performance computing and Hadoop differ.

Written By

Mary Shacklett

Feb 20, 2018

In the big data world, not every company needs high-performance computing (HPC), but nearly all that work with big data have adopted Hadoop-style analytics computing.

HPC and Hadoop can be hard to tell apart because it is possible to run Hadoop analytics jobs on HPC gear, although not vice versa. Both use parallel processing of data, but in a Hadoop analytics environment, data is stored on commodity hardware and distributed across multiple nodes of that hardware. In HPC, where data files are much larger, data storage is centralized. Because of the sheer size of the files it processes, HPC also requires more expensive networking, such as InfiniBand, to deliver high throughput and low latency.

The message for company CIOs is clear: if you can avoid HPC and just use Hadoop for your analytics, do it. It is cheaper, easier for your staff to run, and might even be able to run in the cloud, where someone else (like a third-party vendor) can run it.

Unfortunately, being an all-Hadoop shop is not possible for the many life sciences, weather, pharmaceutical, mining, medical, government, and academic companies and institutions that require HPC for processing. Because file sizes are large and processing needs are extreme, standard network communications and cloud connections aren’t viable alternatives, either.

In short, HPC is a perfect example of a big data platform that is best run in-house in a data center. The challenge then becomes: how do you (and your staff) ensure that the very expensive hardware you invest in is in the best shape to do the job you need it to do?

“This is a challenge that many companies that must use HPC for their big data processing face,” said Alex Lesser, chief strategy officer at PSSC Labs, a big data Hadoop and HPC platform provider. “Most of these companies have a history of supporting a traditional IT infrastructure. They are comfortable getting out of this mindset to tackle a Hadoop analytics computing environment themselves because it uses commodity hardware they are already familiar with, but when it comes to HPC, the response is often ‘let the vendor take care of it.’”

If a move to HPC seems right for your company, here are four steps to take:

1. Confirm that you have high-level support for HPC

Upper management and the board don’t have to be HPC gurus, but their understanding and support should never be presumed. Both groups should understand enough about HPC and what it can do for your company to be unequivocally in support of the sizable hardware, software, and training investments you are likely to make. This means that they must be educated on two fronts: 1) what HPC is, and why it differs from plain old analytics and needs special hardware and software; and 2) why the company must use HPC rather than plain old analytics to meet its business objectives. Both of these educational efforts should be undertaken by the CIO or the CDO.

“The most aggressive companies in HPC adoption are those that believe they are really technology companies, and that it is technology that will position them ahead of the field,” said Lesser, who points to Amazon, where the AWS cloud services business, originally a spinoff from Amazon’s retail operation, is now a gargantuan profit center in its own right.

2. Consider a preconfigured hardware platform that you can customize

Companies like PSSC Labs offer prepackaged and preconfigured HPC hardware. “We have a basic package that is set up with HPC best practices, and then work with the client to customize this base package for the client’s computing needs,” said Lesser, who notes that almost every site has some customization it must do.

3. Understand the payback

Like any IT investment, HPC must be cost-justifiable for the company, and you should be able to develop a return on investment (ROI) that pencils out in management’s and the board’s minds. “A good example is plane design,” said Lesser. “The HPC investment was sizable, but when the company saw that it could run its design simulations with HPC and obtain five 9s accuracy and that it no longer had to rent physical wind tunnels, it recouped its HPC investment quickly.”
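To make the payback argument concrete, here is a minimal sketch of a simple payback-period calculation. The function name and every figure in it are hypothetical placeholders, not numbers from Lesser or any real deployment; substitute your own estimates of the upfront investment, the monthly savings (for example, wind tunnel rentals you no longer pay for), and the monthly cost of running the system.

```python
def payback_period_months(upfront_cost, monthly_savings, monthly_operating_cost):
    """Return the months needed to recoup an upfront investment.

    All inputs are illustrative estimates in the same currency.
    Returns None when savings never exceed operating costs.
    """
    net_monthly_benefit = monthly_savings - monthly_operating_cost
    if net_monthly_benefit <= 0:
        return None  # the investment never pays for itself
    return upfront_cost / net_monthly_benefit


# Hypothetical figures: a 1.2M purchase, 150K/month in avoided costs,
# and 50K/month to power, house, and staff the system.
months = payback_period_months(1_200_000, 150_000, 50_000)
print(months)  # 12.0
```

A simple payback period ignores the time value of money and hardware depreciation, so treat it as a first-pass figure for the board rather than a full ROI model.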

4. Train your staff

HPC is not a simple transition for your IT staff, but if you are going to run an on-premises operation, you should be positioning your team to be self-sufficient with it.

Initially, you might have to hire outside consultants to get started, but the goal of the consulting assignment should always be twofold: 1) to get the HPC apps going, and 2) to transfer knowledge to staff so they can take over operations. You should not settle for less.

At its core, the HPC team will require a data scientist who is capable of developing the highly complex algorithms that HPC needs to answer your company’s questions. It will also require a strong system programmer who is versed in C++ or Fortran and able to work in a parallel processing environment, as well as a network communications specialist.

“The bottom line is, if your company is going to run a job only once or twice every two weeks, you should go to the cloud to host your HPC,” said Lesser. “But if you’re using HPC resources and running jobs multiple times per day, like a pharmaceutical or genetics company might, you’re wasting money running in the cloud and should highly consider running your own in-house operation.”
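Lesser’s rule of thumb can be sketched as a break-even comparison between paying per job in the cloud and carrying the fixed cost of an in-house system. The cost model, function names, and all figures below are assumptions made for illustration only; real cloud and in-house pricing is more complicated than a flat per-job rate.

```python
import math


def breakeven_jobs_per_month(inhouse_fixed_monthly, inhouse_cost_per_job,
                             cloud_cost_per_job):
    """Smallest monthly job count at which in-house costs no more than cloud.

    Returns None if the cloud is never more expensive per job.
    """
    saving_per_job = cloud_cost_per_job - inhouse_cost_per_job
    if saving_per_job <= 0:
        return None
    return math.ceil(inhouse_fixed_monthly / saving_per_job)


def cheaper_option(jobs_per_month, inhouse_fixed_monthly,
                   inhouse_cost_per_job, cloud_cost_per_job):
    """Return 'cloud' or 'in-house', whichever costs less this month."""
    cloud_total = jobs_per_month * cloud_cost_per_job
    inhouse_total = inhouse_fixed_monthly + jobs_per_month * inhouse_cost_per_job
    return "cloud" if cloud_total < inhouse_total else "in-house"


# Hypothetical costs: 40K/month fixed in-house, 100 per in-house job,
# 900 per cloud job.
print(breakeven_jobs_per_month(40_000, 100, 900))  # 50
print(cheaper_option(2, 40_000, 100, 900))         # cloud
print(cheaper_option(90, 40_000, 100, 900))        # in-house
```

Under these made-up numbers, a shop running a couple of jobs a month stays in the cloud, while one running about three jobs a day is well past the break-even point.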

Mary Shacklett

Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President of Product Research and Software Development for Summit Information Systems, a computer software company; and Vice President of Strategic Planning and Technology at FSI International, a multinational manufacturing company in the semiconductor industry. Mary is a keynote speaker and has more than 1,000 articles, research studies, and technology publications in print.