Heterogeneous AI Inference Platform for Energy-Efficient Large Language Models

S. Kim, S. Yi, Y. Jeong, C. Lee, C.S. Kim
Simplus Inc.,
United States

Keywords: AI accelerator, heterogeneous CIM, high performance AI, LLM

Summary:

Large Language Models (LLMs) such as GPT-3 have transformed industries by enabling AI to generate human-like text, answer questions, and provide insights at unprecedented levels. Running these powerful models, however, is extremely expensive and energy-intensive: a single LLM query uses 60 times more electricity than a typical web search, creating challenges for data centers and AI providers. The GPUs that currently handle the heavy computations required for LLMs consume large amounts of power, generate heat, and are costly to manufacture. These issues make it hard for companies to scale AI services effectively.

Simplus offers a breakthrough solution: a heterogeneous AI inference platform that combines analog cores and logic-based processing cores on a single chip. This design significantly reduces energy use while maintaining performance. The analog cores use ultra-high-density memory to handle the repetitive, fixed-weight operations that account for roughly 97% of LLM computation (a back-of-envelope estimate of this split follows this summary), while the logic cores tackle the more flexible tasks. By leveraging this combination, the platform is expected to achieve 50 times lower power consumption than GPUs, making it both energy-efficient and cost-effective.

Traditional Compute-In-Memory (CIM) systems rely on SRAM, which is fast but expensive and limited in capacity. Simplus’s platform overcomes these limitations by using high-density memory. Although high-density memory has not been widely used in AI because of precision challenges, our technology solves this with advanced logic circuits and a patent-pending shift-and-add approach that ensures high-precision calculations (the generic bit-serial form of the technique is sketched below). This combination makes our platform uniquely suited to large-scale AI workloads without the energy and cost drawbacks of traditional hardware.

Simplus’s platform is ideal for industries that depend on AI inference services, such as Big Tech companies, data centers, and cloud providers, all of which face mounting energy costs as they scale AI services. By reducing power consumption and eliminating the need for external memory for most computations, our platform enables sustainable AI growth while cutting operational expenses. It also opens opportunities for affordable cloud-based AI services, giving end users access to advanced AI capabilities at costs up to 50% lower than those of current providers. Our initial focus is on AI inference server rentals for Big Tech companies, which can use our energy-efficient servers to handle LLM workloads more affordably and sustainably. In parallel, we plan to offer direct cloud-based AI services to a broader range of users, making cutting-edge AI accessible to more industries and businesses. To accelerate commercialization, Simplus will partner with cloud providers, semiconductor manufacturers, and hardware integrators.

To ensure success, we are validating the technology with a detailed simulation of the platform’s architecture. Using a MATLAB-based simulator, we optimize chip specifications, such as memory density and precision, and test data-flow algorithms between the analog and logic cores (a toy analogue of such a design-space sweep is shown below). Once validated, we will build a prototype using commercial off-the-shelf components such as FPGAs and high-density memory modules.

Simplus’s heterogeneous AI inference platform is a game-changing innovation: it combines energy efficiency, scalability, and cost-effectiveness, paving the way for sustainable, large-scale AI systems and for future multimodal AI applications that can transform industries.
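As a rough illustration of the compute split cited above, the Python sketch below estimates what fraction of per-token multiply-accumulate operations (MACs) in a transformer layer come from fixed-weight matrix multiplies, the work the summary maps onto the analog cores. The dimensions are publicly reported GPT-3-scale values, not Simplus specifications, and the operation counts are deliberately simplified.

```python
# Back-of-envelope estimate (assumed GPT-3-scale dimensions, not Simplus data)
# of the share of per-token MACs that are fixed-weight matrix multiplies.
d_model, d_ff, seq_len = 12288, 4 * 12288, 2048

weight_macs = 4 * d_model**2 + 2 * d_model * d_ff  # QKV + output proj, MLP up + down
attn_macs   = 2 * seq_len * d_model                # QK^T and attention*V, per token
other_ops   = 10 * d_model                         # softmax, layernorm, residuals (rough)

total = weight_macs + attn_macs + other_ops
print(f"fixed-weight share of MACs: {weight_macs / total:.1%}")  # -> ~97%
```

With these assumed dimensions the fixed-weight share comes out near 97%, consistent with the figure quoted in the summary.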
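The patent-pending shift-and-add circuit itself is not public; the sketch below shows only the generic bit-serial idea the name suggests. An 8-bit weight matrix is decomposed into 1-bit planes, each plane is evaluated as the kind of low-precision matrix-vector product an analog array can perform in one pass, and digital logic reconstructs the full-precision result by shifting and accumulating the partial sums. The function name and the offset-binary encoding are illustrative assumptions, not the Simplus design.

```python
import numpy as np

def shift_and_add_matvec(weights_int8, x, bits=8):
    """Rebuild a full-precision matrix-vector product from 1-bit passes.

    Illustrative only: each bit plane is a {0,1} matrix that a low-precision
    analog CIM array could evaluate in one pass; digital logic then scales
    each partial sum by 2**b and accumulates (the "shift-and-add").
    """
    # Offset-binary view so every bit plane is non-negative (an assumption,
    # chosen because CIM arrays typically store unsigned conductances).
    w = weights_int8.astype(np.int64) + 128        # values now in [0, 255]
    acc = np.zeros(w.shape[0], dtype=np.int64)
    for b in range(bits):
        bit_plane = (w >> b) & 1                   # one 1-bit analog pass
        acc += (bit_plane @ x) << b                # digital shift-and-add
    # Remove the +128 offset: it contributed 128 * sum(x) to every output.
    return acc - 128 * int(x.sum())

rng = np.random.default_rng(0)
W = rng.integers(-128, 128, size=(4, 16), dtype=np.int8)
x = rng.integers(0, 16, size=16)                   # small non-negative activations
assert np.array_equal(shift_and_add_matvec(W, x), W.astype(np.int64) @ x)
```

The point of the decomposition is that the analog array never needs more than 1-bit precision per pass; full precision is recovered entirely in the digital accumulate.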
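Finally, the MATLAB simulator is proprietary, but the kind of design-space sweep it performs can be illustrated with a toy Python analogue: sweep a weight-precision parameter through a uniform quantizer and measure the resulting matrix-multiply error. The matrix sizes, quantizer, and error metric here are hypothetical stand-ins, not the simulator's actual models.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((256, 256))    # stand-in weight matrix
x = rng.standard_normal(256)           # stand-in activation vector
ref = W @ x                            # full-precision reference result

for bits in range(2, 9):
    # Uniform quantizer over the weight range: one of the "precision"
    # knobs a chip-specification sweep would tune.
    lo, hi = W.min(), W.max()
    step = (hi - lo) / (2**bits - 1)
    Wq = np.round((W - lo) / step) * step + lo
    rel_err = np.linalg.norm(Wq @ x - ref) / np.linalg.norm(ref)
    print(f"{bits}-bit weights -> relative error {rel_err:.3e}")
```

A sweep like this makes the precision/accuracy trade-off explicit, which is the decision the real simulator must support when fixing memory density and bit-width before the FPGA prototype is built.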