Research Webzine of the KAIST College of Engineering since 2014
Fall 2025 Vol. 25
Generative AI today relies overwhelmingly on GPUs, yet the industry is shifting toward more diverse hardware. This shift is driven by NPUs such as Google TPU, Amazon Inferentia, and startup-built NPUs, as well as PIM accelerators from Samsung and SK hynix. KAIST has developed hardware-software co-design technologies that integrate these heterogeneous GPU-NPU-PIM chips into next-generation AI clouds.
A conceptual illustration of next-generation generative AI cloud infrastructure, where diverse accelerators including GPU, NPU, and PIM accelerators operate together as a unified system. KAIST’s research aims to enable this heterogeneous integration to overcome the limitations of today’s GPU-centric AI computing.
Generative AI inference today is powered largely by NVIDIA GPUs, but the landscape is shifting rapidly toward more diverse accelerators. Neural Processing Units (NPUs), introduced by global hyperscalers through efforts such as Google’s TPU and Amazon’s Inferentia, represent one major direction. In parallel, AI chip startups including FuriosaAI, Rebellions, and HyperAccel are developing NPU architectures tailored for large-scale AI workloads. A second movement is the rise of Processing-in-Memory (PIM) technologies, with Samsung and SK hynix advancing memory-centric approaches that reduce data movement and improve energy efficiency. Together, these trends signal a transition from a GPU-dominated ecosystem to heterogeneous AI hardware that spans GPUs, NPUs, and PIM.
This diversification creates both opportunities and fundamental systems challenges. Each class of accelerator offers distinct strengths, and no single device can efficiently support the full range of generative AI inference, whose operational patterns vary across the model’s algorithmic components as well as across application behaviors, including interactive and agent-driven workloads. Leveraging the complementary strengths of GPUs, NPUs, and PIM is therefore essential for improving performance, energy efficiency, and overall cost-effectiveness at cloud scale. Achieving this requires more than isolated hardware advances; it demands hardware-software co-design that lets heterogeneous accelerators operate as a unified computing substrate rather than as disconnected components.
KAIST’s research addresses this challenge on two fronts: hardware-level integration strategies, and system software that integrates and orchestrates heterogeneous accelerators. On the hardware side, KAIST has developed NeuPIMs, an NPU-PIM integrated accelerator architecture in which NPU-like compute engines and PIM channels collaborate to process different parts of large language model (LLM) workloads (Figure 1). NeuPIMs introduces mechanisms for distributing compute-bound and memory-bound operations across devices and for coordinating their execution, which allows NPU engines and PIM units to operate concurrently. This architecture is a significant step toward showing how compute-centric and memory-centric accelerators can be integrated into a single system to overcome bottlenecks inherent in GPU-only designs.
Figure 1 Overview of the NeuPIMs system and accelerator architecture.
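To make the idea of "compute-bound versus memory-bound" concrete, here is a minimal roofline-style sketch, assuming a simple rule: an operator whose arithmetic intensity (FLOPs per byte moved) exceeds a chosen ridge point goes to the NPU, and the rest go to PIM. This is not NeuPIMs’ actual scheduling mechanism; the names `Op`, `decode_step_ops`, `place_ops`, the simplified cost formulas, the 2-byte element size, and the ridge point of 10 FLOPs/byte are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """One operator in an LLM decoding step (hypothetical cost descriptor)."""
    name: str
    flops: float        # floating-point operations performed
    bytes_moved: float  # bytes read from / written to memory

    @property
    def intensity(self) -> float:
        # Arithmetic intensity in FLOPs per byte (the roofline x-axis)
        return self.flops / self.bytes_moved


def decode_step_ops(batch: int, d_model: int, seq_len: int, elem_bytes: int = 2) -> list:
    """Approximate costs for two representative operators of one decode step."""
    # Projection GEMM: (batch x d_model) @ (d_model x d_model).
    # The weight matrix is read once and reused by every request in the batch.
    proj = Op(
        "projection_gemm",
        flops=2 * batch * d_model * d_model,
        bytes_moved=elem_bytes * (d_model * d_model + 2 * batch * d_model),
    )
    # Attention scores: each request multiplies its query against its *own*
    # K cache (seq_len x d_model), so nothing is reused across the batch.
    attn = Op(
        "attention_scores",
        flops=2 * batch * seq_len * d_model,
        bytes_moved=elem_bytes * batch * (seq_len * d_model + d_model + seq_len),
    )
    return [proj, attn]


def place_ops(ops: list, ridge_point: float) -> dict:
    """Hypothetical rule: compute-bound ops to the NPU, memory-bound ops to PIM."""
    return {op.name: ("NPU" if op.intensity >= ridge_point else "PIM") for op in ops}


ops = decode_step_ops(batch=32, d_model=4096, seq_len=2048)
for op in ops:
    print(f"{op.name}: {op.intensity:.1f} FLOPs/byte")
print(place_ops(ops, ridge_point=10.0))
```

With these illustrative sizes the projection GEMM lands around 31 FLOPs/byte while attention stays near 1 FLOP/byte. Note that attention’s intensity does not grow with batch size, because every request brings its own KV cache; this is the kind of memory-bound work that motivates pairing compute engines with PIM.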
Complementing this hardware contribution, KAIST has developed LLMServingSim, a system-level simulation infrastructure that models how GPU, NPU, and PIM devices behave when deployed together inside real serving clusters. The simulator converts model execution graphs into detailed operator schedules, models request-level behaviors such as batching and KV cache management, and provides performance estimates without requiring physical hardware (Figure 2). This enables researchers and industry developers to analyze the impact of emerging accelerator designs before fabrication, guiding chip architecture choices, runtime scheduling policies, and cluster-level provisioning.
Figure 2 Overview of the LLMServingSim simulation framework.
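The request-level behaviors mentioned above, batching and KV cache management, can be illustrated with a toy simulator. The sketch below is not LLMServingSim’s code or API: `Request`, `simulate`, and `step_cost` are hypothetical names, prefill is ignored, each admitted request reserves KV cache space for its final length up front, and the per-iteration latency comes from a made-up linear cost model rather than from operator-level hardware modeling.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Request:
    rid: int
    prompt_len: int   # prompt tokens, assumed already in the KV cache (prefill ignored)
    output_len: int   # tokens to generate
    generated: int = 0

    @property
    def kv_tokens(self) -> int:
        # One KV cache entry per prompt token plus one per generated token
        return self.prompt_len + self.generated


def simulate(requests: List[Request], kv_capacity: int, max_batch: int,
             step_cost: Callable[[int, int], float]) -> Dict[int, float]:
    """Toy iteration-level batching with worst-case KV cache reservation.

    Returns each request's completion time under the supplied cost model.
    """
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    for r in requests:
        if r.output_len < 1 or r.prompt_len + r.output_len > kv_capacity:
            raise ValueError(f"request {r.rid} cannot be served")
    waiting, running = deque(requests), []
    clock, finish = 0.0, {}
    while waiting or running:
        # Admit requests in FIFO order while batch slots and KV capacity remain.
        reserved = sum(r.prompt_len + r.output_len for r in running)
        while waiting and len(running) < max_batch:
            need = waiting[0].prompt_len + waiting[0].output_len
            if reserved + need > kv_capacity:
                break
            reserved += need
            running.append(waiting.popleft())
        # One decode iteration: every running request emits one token.
        clock += step_cost(len(running), sum(r.kv_tokens for r in running))
        for r in running:
            r.generated += 1
            if r.generated == r.output_len:
                finish[r.rid] = clock
        # Finished requests leave the batch and free their KV reservation.
        running = [r for r in running if r.generated < r.output_len]
    return finish


def linear_cost(batch: int, kv_tokens: int) -> float:
    # Hypothetical cost model: fixed overhead plus a term proportional to KV reads
    return 1.0 + 0.001 * kv_tokens


reqs = [Request(0, prompt_len=100, output_len=2), Request(1, prompt_len=100, output_len=3)]
print(simulate(reqs, kv_capacity=1000, max_batch=2, step_cost=linear_cost))
```

Here both requests share the batch, and they finish at roughly 2.4 and 3.5 time units. Rerunning with fresh requests and `kv_capacity=150` forces the second request to wait until the first frees its reservation, pushing its completion to roughly 5.5. A full simulator would replace `linear_cost` with per-device operator timing, which is where GPU, NPU, and PIM models plug in.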