
Over the past two decades, high-performance computing (HPC) has become widely used throughout drug discovery and development. Significant advances in HPC processing power in the early 2000s made it viable for tasks like reverse-engineering models of human disease progression and drug response, and for processing vast amounts of data from sources like the Human Genome Project. An article published in Expert Opinion on Drug Discovery claimed, “we can safely state that it would not be possible to process all the information available in the chemical databases of potential drug compounds, either accurately or quickly without HPC, since the required computational power is huge.”

One aspect of this shift that often goes overlooked is the new demands that HPC workloads place on IT and network operations (NetOps) teams at the organizations that use them. High-performance computing requires an extremely high-throughput network with ultra-low latency to move large files quickly between HPC nodes. IT and NetOps teams at pharmaceutical and biotechnology research companies need to monitor their networks in exacting detail to ensure they can meet these demands. This creates challenges above and beyond those of the average enterprise network, challenges that these teams are likely unprepared for.

There are three pressing IT issues that HPC workloads create: maintaining ultra-low latency on networks at speeds above 10 Gbps, detecting microbursts, and measuring network traffic at millisecond and nanosecond granularity. Let’s take a closer look.

High-speed packet capture hardware

HPC requires network speeds of 40 or 100 Gbps, but most network monitoring tools are not built for this use case. IT and NetOps teams typically use solutions like network TAPs, packet brokers, packet capture, and network analysis software to monitor and manage the network’s performance. But these physical or virtual network monitoring products must be upgraded to keep pace with increasing network speeds; a general-purpose CPU architecture cannot capture packets at speeds above 10 Gbps without hardware assistance. A subpar monitoring solution will either slow down the network overall, degrading the performance of the HPC workloads, or miss packets, creating blind spots and hampering IT’s ability to troubleshoot. Neither situation is viable when supporting advanced biotechnology modeling.

To resolve this, IT must choose monitoring hardware specifically built for high-speed networks that can capture and process packets at the required speeds without dropping any. If components of the core network have been upgraded to 100 Gbps speeds, some or all of the network monitoring infrastructure will likely need to be upgraded as well.
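
Even before investing in dedicated capture hardware, a quick sanity check is to watch kernel-level interface counters and see whether the existing capture path is keeping up. Below is a minimal, Linux-specific sketch (it reads /proc/net/dev; the interface name eth0 and the one-second interval are assumptions) that samples receive throughput and kernel-reported drops:

```python
# Minimal, Linux-specific sketch: sample /proc/net/dev twice to estimate
# receive throughput and kernel-reported packet drops on an interface.
# The interface name "eth0" and the 1 s interval are assumptions.
import time

def read_counters(iface):
    with open("/proc/net/dev") as f:
        for line in f:
            if line.strip().startswith(iface + ":"):
                fields = line.split(":", 1)[1].split()
                # Receive columns: bytes packets errs drop fifo frame ...
                return int(fields[0]), int(fields[3])
    raise ValueError(f"interface {iface!r} not found")

def sample(iface="eth0", interval=1.0):
    rx_bytes_0, drops_0 = read_counters(iface)
    time.sleep(interval)
    rx_bytes_1, drops_1 = read_counters(iface)
    gbps = (rx_bytes_1 - rx_bytes_0) * 8 / interval / 1e9
    print(f"{iface}: {gbps:.2f} Gbps received, "
          f"{drops_1 - drops_0} packets dropped in {interval:.0f} s")

sample("eth0")
```

Drop counters climbing alongside high throughput are a strong hint that the capture path, not the application, is the bottleneck.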

Tracking microbursts

“Bursts” of traffic are an issue on all networks. Performance suffers when traffic spikes above a network’s allowable load for a short period. On HPC networks with extremely high performance requirements, “microbursts” can occur, in which traffic spikes for just a few milliseconds. This is often enough to affect HPC performance, but the spikes are so brief that they are difficult to detect without extremely fine-grained monitoring capabilities. Granular network monitoring and the ability to analyze microbursts will help NetOps teams track down the source of these persistent issues.
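
To make this concrete, here is a minimal sketch of millisecond-level microburst detection, assuming per-packet (timestamp, bytes) records are already available from a capture device; the 1 ms bucket size, the 10 Gbps threshold, and the synthetic traffic are all illustrative assumptions:

```python
# Minimal sketch: bin per-packet (timestamp, bytes) records into 1 ms
# buckets and flag any bucket whose implied rate exceeds a threshold.
# The bucket size, the 10 Gbps threshold, and the traffic below are
# illustrative assumptions.
from collections import defaultdict

BUCKET_SECONDS = 0.001   # 1 ms granularity
THRESHOLD_BPS = 10e9     # flag anything above 10 Gbps within a bucket

def find_microbursts(records):
    """records: iterable of (timestamp_seconds, packet_bytes) pairs."""
    buckets = defaultdict(int)
    for ts, nbytes in records:
        buckets[int(ts / BUCKET_SECONDS)] += nbytes
    bursts = []
    for bucket, total_bytes in sorted(buckets.items()):
        rate_bps = total_bytes * 8 / BUCKET_SECONDS
        if rate_bps > THRESHOLD_BPS:
            bursts.append((bucket * BUCKET_SECONDS, rate_bps))
    return bursts

# Synthetic traffic: light background load plus a 3 ms spike at t=0.5 s
# (~12 Gbps while it lasts, yet only ~0.04 Gbps averaged over the second).
records = [(t / 100, 1500) for t in range(100)]
records += [(0.5 + i * 1e-6, 1500) for i in range(3000)]

for start, rate in find_microbursts(records):
    print(f"microburst at t={start:.3f} s: {rate / 1e9:.1f} Gbps")
```

A per-second counter would average this spike down to well under 1 Gbps and report nothing wrong; only the millisecond bins expose it.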

Granular measurements, multiple capture points

HPC workloads require extremely low network latency, usually less than a millisecond. Accordingly, monitoring tools must measure latency at an even more granular level (for example, if the HPC workloads cannot tolerate more than 2 milliseconds of latency, then the monitoring tools must measure it in 1-millisecond intervals). But, again, tools not built to monitor HPC workloads often won’t be sufficient.
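
As a toy illustration of why measurement granularity matters, the sketch below buckets latency samples into 0.1-millisecond bins against an assumed 2-millisecond budget (both numbers, and the sample values, are illustrative); averaged over a coarse interval, the same samples would look perfectly healthy:

```python
# Toy illustration: bucket latency samples into 0.1 ms bins so the few
# over-budget outliers stay visible instead of vanishing into an average.
# The 2 ms budget and the sample latencies are illustrative.
from collections import Counter

BUDGET_MS = 2.0  # assumed tolerance of the HPC workload

def latency_histogram(latencies_ms):
    bins = Counter(round(l, 1) for l in latencies_ms)  # nearest 0.1 ms
    for b in sorted(bins):
        marker = "  <-- over budget" if b > BUDGET_MS else ""
        print(f"~{b:.1f} ms: {bins[b]} packet(s){marker}")

# Mean latency here is 0.84 ms, comfortably "within budget" on average,
# yet two individual packets blew past the 2 ms tolerance.
samples = [0.4, 0.5, 0.6, 0.5, 0.4, 2.3, 0.5, 0.6, 2.1, 0.5]
latency_histogram(samples)
```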

Meeting this high bar often requires IT to gather metrics at multiple points throughout the network. Some network monitoring solutions stream packet data to a central application that does all the processing, while others process the data at the point where it is captured and then compile the results. The second option is usually necessary for HPC workloads because streaming the data to a central point before processing adds a slight delay. This might be acceptable on the average enterprise network, but it will cause issues in HPC environments with stringent latency requirements. Timestamping each packet at the point it is received, with very high precision, can also help mitigate this.
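
Here is a minimal sketch of that “process locally, ship summaries” pattern, assuming each capture point records a high-precision arrival timestamp keyed by a packet identifier; the capture points, packet IDs, and timestamps are all illustrative:

```python
# Minimal sketch of the "process locally, ship summaries" pattern: each
# capture point records packet_id -> high-precision arrival timestamp;
# per-hop latency is computed by matching IDs between adjacent points,
# and only a compact summary is sent to the central collector.
# The capture points, packet IDs, and timestamps are illustrative.

def hop_latencies(upstream, downstream):
    """Both arguments: dict of packet_id -> timestamp in seconds."""
    return {pid: downstream[pid] - upstream[pid]
            for pid in upstream.keys() & downstream.keys()}

def summarize(latencies):
    """The compact record shipped centrally instead of raw packet data."""
    values = sorted(latencies.values())
    return {
        "count": len(values),
        "p50_us": round(values[len(values) // 2] * 1e6),
        "max_us": round(values[-1] * 1e6),
    }

# Arrival timestamps (seconds) for the same packets at two capture points.
ingress = {"pkt1": 10.000000, "pkt2": 10.000050, "pkt3": 10.000110}
egress  = {"pkt1": 10.000180, "pkt2": 10.000240, "pkt3": 10.001900}

print(summarize(hop_latencies(ingress, egress)))
# -> {'count': 3, 'p50_us': 190, 'max_us': 1790}
```

The design point is that only the compact summary crosses the network; raw packet data never leaves the capture point, so the monitoring itself adds minimal load and delay.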

Considering these challenges, network monitoring and observability solutions must be specifically built to accommodate HPC workloads’ demands. An IT department cannot support HPC workloads with a monitoring setup intended for a normal enterprise networking use case. Pharmaceutical and biotech companies struggling with HPC workloads should check if their IT monitoring capabilities have been left behind.

Kedar Hiremath is a senior solutions marketing manager at cPacket Networks. He has been in the technology space for over seven years, leading go-to-market strategies, content, and product launches, most recently at IBM. He holds a master’s degree in computer science from Santa Clara University.