Technical Systems Integrators,
Keywords: HPC, PCIe, Network Computing, AI
Summary: AI workflow use cases are pushing the limits of current network fabric technology, in both bandwidth and latency, across their infrastructure deployments. Several leading technology companies, including HP and Intel, have suggested that the network is the performance bottleneck and that a new network fabric for the compute environment can help overcome performance barriers in these implementations. Intel has further discussed Rack Scale Design (RSD) (Intel, 2018, p. 1) as a way to align simulation and compute environments more closely with the workloads they run. RSD does this by disaggregating compute, memory, storage, accelerators (GPUs, Graphics Processing Units; FPGAs, Field Programmable Gate Arrays), and networking, then re-aggregating them into more efficient stacks. A new, patented network fabric based on PCIe (Peripheral Component Interconnect Express, an industry standard) has recently been used to implement this methodology. It addresses the bandwidth and latency issues by extending PCIe throughout the datacenter or AI cluster. This capability delivers disruptive performance gains for high performance computing and is being considered for deployment in High Performance Computing centers throughout the DoD and industry. Applied to AI use cases, this technology will reduce latency by at least 10x, support legacy environments at lower cost, and deliver the near-theoretical bandwidth that new AI/ML/DL (artificial intelligence, machine learning, deep learning) use cases require across a wide range of data sets. This abstract outlines some of the gains we have observed while implementing this technology and shows how it can be applied to AI use cases. We expect this technology to substantially change architecture implementations, decreasing costs and increasing performance for these environments across a wide range of use cases in the TechConnect World community.
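To make "near theoretical bandwidth" concrete, the sketch below computes the theoretical per-direction bandwidth of a PCIe link from its generation and lane count. The abstract does not state which PCIe generation or link width the fabric uses, so the values here are illustrative assumptions. The function names `pcie_theoretical_gbytes_per_s` and `fraction_of_theoretical` are hypothetical helpers. The figure accounts only for line encoding (8b/10b for Gen1/Gen2, 128b/130b for Gen3 through Gen5) and ignores packet and protocol overhead, so achievable payload throughput is somewhat lower.

```python
# Raw signaling rate (GT/s per lane) and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}

VALID_LANE_COUNTS = (1, 2, 4, 8, 16, 32)


def pcie_theoretical_gbytes_per_s(generation: int, lanes: int) -> float:
    """Theoretical per-direction link bandwidth in GB/s (hypothetical helper).

    Accounts only for line encoding; TLP/DLLP headers, flow control and
    other protocol overhead are ignored, so real throughput is lower.
    """
    if generation not in PCIE_GENERATIONS:
        raise ValueError(f"unsupported PCIe generation: {generation}")
    if lanes not in VALID_LANE_COUNTS:
        raise ValueError(f"unsupported lane count: {lanes}")
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    # Each transfer carries one bit per lane; divide by 8 to convert bits to bytes.
    return rate_gt_s * lanes * efficiency / 8


def fraction_of_theoretical(measured_gb_s: float, generation: int, lanes: int) -> float:
    """Ratio of a measured throughput to the theoretical link bandwidth (hypothetical helper)."""
    return measured_gb_s / pcie_theoretical_gbytes_per_s(generation, lanes)


if __name__ == "__main__":
    for gen, width in [(3, 16), (4, 16)]:
        bw = pcie_theoretical_gbytes_per_s(gen, width)
        print(f"PCIe Gen{gen} x{width}: {bw:.2f} GB/s per direction")
```

Running the script prints about 15.75 GB/s for Gen3 x16 and 31.51 GB/s for Gen4 x16. A measured fabric throughput could then be passed to `fraction_of_theoretical` to express how close a deployment comes to the link's ceiling.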