M. Ali, A. Qasem
Texas State University,
United States
Keywords: high-performance computing, generative AI, diffusion models, energy dataset, power modeling, power consumption prediction, optimization tasks
Summary:
Recently, machine learning-driven models have gained popularity for addressing challenges in power measurement and estimation. These models offer innovative approaches for designing energy-friendly, efficient applications. However, their effectiveness is often limited by small and non-representative training datasets. Developing such datasets is not only computationally intensive but also time-consuming, particularly for emerging or rare hardware platforms. This research introduces a diffusion-based approach that alleviates dataset constraints by synthesizing energy data from the available limited datasets. The proposed method uses a Denoising Diffusion Model (DDM) to transform a limited software energy dataset into a larger, more representative one, enabling power modeling and related optimization tasks. We validate the effectiveness of our approach through its application to energy-efficient graph processing. The performance of irregular graph algorithms depends on code attributes, input graph shape, and hardware characteristics, which is challenging for an ML model because it must integrate data from multiple heterogeneous domains. To overcome this issue, we use a representation learning method that combines program features and attributes into a unified space of energy profile embeddings derived solely from runtime events. These embeddings are constructed through experimentation, Principal Component Analysis, and regression techniques, and comprise eight features based on 17 runtime events. The final dataset includes average power consumption, execution time, and peak power. The DDM processes the data in two steps: forward diffusion and reverse diffusion. Forward diffusion starts with an initial sample from the original distribution and progressively adds Gaussian noise, transforming it into latent variables.
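The forward-diffusion step described above admits a standard closed form: after t steps of Gaussian noising with variance schedule beta_t, the noised sample can be drawn directly as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, where alpha_bar_t is the cumulative product of (1 - beta). The following NumPy sketch illustrates this; the linear schedule, step count, and toy feature values are illustrative assumptions, not details from the paper:

```python
import numpy as np

def forward_diffusion(x0, t, betas, rng):
    """Noise a clean sample x0 to timestep t in one shot (closed form):
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    where alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)       # Gaussian noise
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)         # linear schedule (assumed)
x0 = np.array([0.5, -1.2, 0.3])               # toy "energy feature" vector
xt, eps = forward_diffusion(x0, t=999, betas=betas, rng=rng)
# At the final step alpha_bar is tiny, so x_t is nearly pure Gaussian noise.
```

The reverse process would train a network to predict `eps` from `(xt, t)` and then denoise step by step; that training loop is omitted here.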
During reverse diffusion, the model denoises the latent variables to generate new samples that follow the original distribution. Gaussian diffusion is employed for numerical features, while multinomial diffusion is used for categorical features. We evaluate the proposed DDM approach against existing generative models, including Generative Adversarial Networks (GANs). Data generated by the DDM closely aligns with the original data distribution, achieving higher Kolmogorov-Smirnov (KS) scores. Furthermore, the principal components (PCs) of the DDM data match the frequency density values of the original data, indicating that the underlying structure of the original data is effectively preserved. In predictive modeling tasks, including classification and regression of GPU power consumption, models trained with DDM-generated data achieve superior performance compared to the baseline and to models trained on GAN-generated data. In the regression task, Random Forest and Gradient Boosting regressors trained on DDM data achieve the highest accuracy. For classification, models trained on DDM data generalize better, with accuracy in power consumption category prediction improving from 81% to 92%. This study highlights the advantages of diffusion-based generative models in overcoming energy dataset constraints. The generated dataset can be used to develop power prediction models, optimize energy efficiency, design optimization recommendation systems, and address other energy-related optimization tasks.
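The distribution comparison described above can be sketched with a two-sample KS test. The raw KS statistic is the maximum gap between the two empirical CDFs, so smaller means closer; a "KS score" reported as higher-is-better is typically its complement. The feature, distributions, and sample sizes below are illustrative assumptions standing in for a real-vs-synthetic column:

```python
import numpy as np
from scipy.stats import ks_2samp

# Hypothetical stand-ins for one numerical feature (e.g. average power, W):
# one sample from the real dataset, one from a generative model.
rng = np.random.default_rng(42)
real = rng.normal(loc=120.0, scale=15.0, size=500)
synthetic = rng.normal(loc=121.0, scale=15.5, size=500)

# ks_2samp returns the KS statistic (max CDF gap) and a p-value.
stat, pvalue = ks_2samp(real, synthetic)
ks_score = 1.0 - stat   # complement: higher means closer distributions
print(f"KS statistic = {stat:.3f}, KS score = {ks_score:.3f}")
```

Running this per feature over DDM- and GAN-generated columns would reproduce the kind of distribution-fidelity comparison the summary reports.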