Z. Sadman, A. Qasem
Texas State University,
United States
Keywords: large language models (LLMs), high-performance computing (HPC), compiler optimization, dynamic runtime optimization
Summary:
Recently, Large Language Models (LLMs) have become highly effective tools, capable of handling a wide range of tasks in content creation, language translation, business analytics, and software engineering. Models such as Code Llama and Codex exhibit a strong understanding of code-oriented tasks, including code generation, translation, and auto-completion of unfinished code. However, their potential applications in HPC performance modeling have yet to be realized. While LLMs hold great promise, given their ability to understand and generate code and to analyze intricate patterns, they rely on the availability of large volumes of training data, on the order of 10¹⁰ tokens. Creating robust and dynamic datasets that accurately represent real-world workloads and their performance behavior in dynamic execution environments poses significant challenges, primarily due to the enormous cost in computation time on production HPC systems. Further, HPC performance modeling requires learning from a vast number of runtime parameters, resulting in very high-dimensional datasets in which the number of features can exceed the number of observations. Consequently, the training time itself can be prohibitive, requiring tens of thousands of petaflop-days.

This study reviews the state of the art in LLMs for HPC performance modeling, discusses the challenges in adopting such strategies in the HPC domain, and outlines future directions. We propose a new HPC-LLM workflow that leverages Code Llama and enables the language model to learn from fine-grained runtime performance data. The process begins with kernel extraction, identifying the program regions most responsible for runtime, which are then fed into the LLVM compilation framework to generate unoptimized Intermediate Representation (IR), a platform-independent representation of the code. The unoptimized IR is annotated with hardware performance counter data to create tokens that capture both program attributes and runtime performance behavior. The compiler then analyzes this augmented IR and applies advanced optimization techniques to produce an optimized IR, improving code performance and hardware utilization while minimizing compilation time. To address the challenge of long training times, we propose an attention-based transfer learning method for faster training. By integrating LLMs into the compiler optimization process, our approach aims to surpass traditional methods, improving efficiency and enabling the creation of dynamic, robust datasets through optimized workflows and LLM-driven techniques, ultimately addressing the limitations of HPC performance modeling.
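As a rough illustration of the IR-annotation step described above, the sketch below emits unoptimized LLVM IR for an extracted kernel with clang and prepends hardware performance counter readings as comment tokens. The file names, counter names, counter values, and annotation format are illustrative assumptions, not the exact encoding used in the proposed workflow.

```python
# Minimal sketch (assumptions: kernel.c is an extracted kernel, clang is on PATH,
# and counter values have already been collected on the target system).
import subprocess

def emit_unoptimized_ir(kernel_src: str, out_ir: str) -> None:
    # -O0 keeps the IR unoptimized; -S -emit-llvm produces textual LLVM IR.
    subprocess.run(
        ["clang", "-O0", "-S", "-emit-llvm", kernel_src, "-o", out_ir],
        check=True,
    )

def annotate_ir(ir_path: str, counters: dict[str, float]) -> str:
    # Prepend performance counter readings as ';' comment tokens so the language
    # model sees runtime behavior alongside the program's IR.
    header = "\n".join(f"; perf.{name} = {value}" for name, value in counters.items())
    with open(ir_path) as f:
        return header + "\n" + f.read()

if __name__ == "__main__":
    emit_unoptimized_ir("kernel.c", "kernel.ll")
    annotated = annotate_ir(
        "kernel.ll",
        {"cycles": 1.2e9, "cache_misses": 3.4e6, "flops": 5.6e8},  # example values only
    )
    print(annotated[:400])
```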
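Similarly, one way to realize the attention-based transfer learning idea is to load a pretrained Code Llama checkpoint and fine-tune only its self-attention projections while keeping the remaining weights frozen, which shrinks the number of trainable parameters and hence training time. The checkpoint name and layer-name filters below are assumptions for illustration, not a prescribed recipe.

```python
# Minimal sketch, assuming the Hugging Face transformers library and a
# Code Llama checkpoint (illustrative; not the paper's exact training setup).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

# Freeze everything, then unfreeze only the self-attention projection weights.
for name, param in model.named_parameters():
    param.requires_grad = any(
        key in name for key in ("q_proj", "k_proj", "v_proj", "o_proj")
    )

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total}")
```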