
Breaking the Memory Wall In AI

In the long arc of technology innovation, artificial intelligence (AI) looms large. In its quest to mimic human behavior, the technology touches energy, agriculture, manufacturing, logistics, healthcare, construction, transportation and nearly every other industry imaginable, a defining role that promises to fast-track the fourth Industrial Revolution. And if the industry oracles have it right, AI growth will be nothing short of explosive.

The Internet of Things (IoT) and 4G/5G, both key AI enablers, are expected to account for more than 75 percent of device connections by 2025. To process the colossal amount of data from over 30 billion connected devices worldwide that is central to the promise of AI, the industry now needs to break through the limits of a key technology: memory.

Memory As A Critical AI Bottleneck

The challenge for memory begins with performance. Historically, gains in computer performance have outpaced improvements in memory speed by 100 times every decade, and over the past 20 years that gap has continued to grow. As a result, memory has become a bottleneck for computer and AI performance.
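A quick back-of-the-envelope calculation shows how fast such a gap compounds. The sketch below takes the figure above, a 100-fold gap per decade, and assumes for simplicity that it compounds evenly year over year; the function names are hypothetical.

```python
# Back-of-the-envelope model of the compute/memory performance gap.
# Assumption: the ~100x-per-decade gap compounds at a constant annual rate.

GAP_PER_DECADE = 100.0  # figure quoted in the text


def annual_gap_growth(gap_per_decade: float) -> float:
    """Per-year growth factor of the gap implied by a per-decade factor."""
    return gap_per_decade ** (1 / 10)


def cumulative_gap(gap_per_decade: float, years: float) -> float:
    """Total gap after `years`, if the per-decade factor holds steady."""
    return gap_per_decade ** (years / 10)


if __name__ == "__main__":
    per_year = annual_gap_growth(GAP_PER_DECADE)
    print(f"Gap widens by ~{(per_year - 1) * 100:.0f}% per year")
    print(f"Gap after 20 years: ~{cumulative_gap(GAP_PER_DECADE, 20):,.0f}x")
```

At a constant rate, the gap widens by roughly 58 percent a year and reaches about 10,000x after 20 years. A steady rate is a simplification, but it shows why even modest yearly differences turn memory into the limiting factor.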

The industry has responded by developing several ways to implement memory systems on AI chips, each specialized for different performance needs and each with its own trade-offs. Among the front runners:

  • On-chip memory delivers the highest bandwidth and power efficiency but is limited in capacity.
  • HBM (High Bandwidth Memory) offers both very high memory bandwidth and density.
  • GDDR offers good trade-offs between bandwidth, power efficiency, cost and reliability.
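One standard way to reason about these trade-offs is the roofline model (our choice of illustration, not something this article itself uses): a chip's attainable throughput is capped by the lower of its peak compute rate and its memory bandwidth multiplied by the workload's arithmetic intensity (FLOPs per byte moved). The sketch below plugs purely hypothetical bandwidth figures into the three options listed above; none of the numbers describe a real product.

```python
from dataclasses import dataclass


@dataclass
class MemoryOption:
    name: str
    bandwidth_gbs: float  # hypothetical sustained bandwidth in GB/s


def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound: min(compute peak, bandwidth x arithmetic intensity).

    `intensity` is FLOPs performed per byte moved from memory.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


# All numbers below are hypothetical, chosen only to illustrate the trade-off.
PEAK_GFLOPS = 100_000.0
OPTIONS = [
    MemoryOption("on-chip SRAM", 20_000.0),
    MemoryOption("HBM", 2_000.0),
    MemoryOption("GDDR", 500.0),
]

if __name__ == "__main__":
    intensity = 10.0  # assumed FLOPs per byte for the workload
    for opt in OPTIONS:
        perf = attainable_gflops(PEAK_GFLOPS, opt.bandwidth_gbs, intensity)
        print(f"{opt.name:>13}: {perf:,.0f} GFLOP/s ({perf / PEAK_GFLOPS:.0%} of peak)")
```

With these made-up numbers and an assumed intensity of 10 FLOPs per byte, only the on-chip option keeps the compute units fully fed; the other two cap the chip at 20 and 5 percent of peak. On-chip memory's limited capacity is what keeps it from being the whole answer.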

Since 2012, the compute used to train the largest AI models has grown 300,000 times, doubling every 3.5 months. That is a blistering pace compared with the 18-month doubling cycle of Moore's law, and it outpaced Moore's law by 25,000 times. The staggering recent improvements have been attributed to parallel computing capacity and new application-specific silicon like Google's Tensor Processing Unit (TPU).
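The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch assumes exact, constant doubling periods of 3.5 months for AI and 18 months for Moore's law, as quoted above; the helper names are hypothetical.

```python
import math

GROWTH = 300_000.0          # growth since 2012, from the text
AI_DOUBLING_MONTHS = 3.5    # from the text
MOORE_DOUBLING_MONTHS = 18  # from the text


def doublings(growth: float) -> float:
    """Number of doublings needed to reach a given growth factor."""
    return math.log2(growth)


def elapsed_months(growth: float, doubling_months: float) -> float:
    """Months needed to reach `growth` at a fixed doubling period."""
    return doublings(growth) * doubling_months


def growth_over(months: float, doubling_months: float) -> float:
    """Growth factor after `months` at a fixed doubling period."""
    return 2 ** (months / doubling_months)


if __name__ == "__main__":
    months = elapsed_months(GROWTH, AI_DOUBLING_MONTHS)
    moore = growth_over(months, MOORE_DOUBLING_MONTHS)
    print(f"{doublings(GROWTH):.1f} doublings over ~{months:.0f} months")
    print(f"Moore's law over the same span: ~{moore:.1f}x")
    print(f"AI growth vs. Moore's law: ~{GROWTH / moore:,.0f}x")
```

Reaching 300,000x takes about 18 doublings, or roughly 64 months at a 3.5-month cadence. Over that same span an 18-month cadence yields only about 12x, so the ratio lands near 26,000x, in line with the 25,000x figure quoted above.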

Specialized silicon architectures and parallel engines are one key to sustaining future gains in compute performance as Moore's law slows and power scaling ends. By rethinking how processors are designed for specific markets, chip makers can build dedicated hardware that operates with 100 to 1,000 times greater energy efficiency than general-purpose processors, overcoming another big limiter to scaling compute performance: power.

For its part, the memory industry can improve performance in three ways: signaling at higher data rates, using stacked architectures such as HBM for greater energy efficiency and performance, and bringing compute closer to the data.

Memory Scaling For AI

A key challenge is scaling memory for AI applications. Consumer demand is growing for better voice, gesture and facial recognition experiences, and for more immersive virtual reality and augmented reality interactions. Delivering these experiences requires more high-performance computing (HPC) capability: to make big data analytics possible, to run the analytics themselves, and to run the machine learning processes that generate meaningful insights using AI and machine intelligence.

Emerging machine learning applications include classification and security, medicine, advanced driver assistance, human-aided design, real-time analytics and industrial automation. With 75 billion IoT-connected devices expected to be generating data by 2025, there will be no shortage of data to analyze. For example, the wings alone on a new Airbus A380-1000 feature over 10,000 sensors. Mountains of this data are stored in massive data centers on magnetic hard drives, then transferred to DRAM before moving to SRAM within the CPU for the handoff to the compute hardware for analysis.
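The data path described above (hard drive to DRAM to on-chip SRAM to compute) behaves like a chain whose slowest link sets the pace. The sketch below uses illustrative bandwidths that are assumptions, not measurements of any particular system, and compares hops run back to back with hops fully overlapped.

```python
from typing import List, Tuple

# (hop, assumed sustained bandwidth in GB/s); illustrative values only
Hop = Tuple[str, float]
HOPS: List[Hop] = [
    ("hard drive -> DRAM", 0.25),
    ("DRAM -> on-chip SRAM", 25.0),
    ("SRAM -> compute units", 1_000.0),
]


def sequential_seconds(size_gb: float, hops: List[Hop]) -> float:
    """Each hop finishes before the next starts, so transfer times add up."""
    return sum(size_gb / bw for _, bw in hops)


def pipelined_seconds(size_gb: float, hops: List[Hop]) -> float:
    """Hops overlap perfectly (streaming), so the slowest hop sets the pace."""
    return size_gb / min(bw for _, bw in hops)


def bottleneck(hops: List[Hop]) -> str:
    """Name of the hop with the lowest bandwidth."""
    return min(hops, key=lambda hop: hop[1])[0]


if __name__ == "__main__":
    size_gb = 100.0  # hypothetical dataset size
    print(f"sequential: {sequential_seconds(size_gb, HOPS):.1f} s")
    print(f"pipelined:  {pipelined_seconds(size_gb, HOPS):.1f} s")
    print(f"bottleneck: {bottleneck(HOPS)}")
```

With these assumed numbers the storage hop dominates either way: speeding up an already fast hop barely changes the total, while any hop that falls behind the data flow becomes the new limit.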

With stored data growing at an exponential clip, the question is how to make sure the rest of the memory hierarchy can handle the flood. AMD's answer is a chiplet architecture featuring eight smaller chips around the edge that drive the compute and a large chip in the center that doubles the I/O interface and memory capability to double chip bandwidth.

AMD has also moved from a legacy GDDR5 memory configuration to HBM, which places stacked memory in the same package as the GPU. Compared with GDDR5, HBM delivers much higher total bandwidth and far greater memory density while reducing power consumption, so AI applications are processed more efficiently.
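HBM's bandwidth advantage comes from interface width rather than raw pin speed: peak bandwidth is roughly bus width times per-pin data rate, divided by eight bits per byte. The sketch below applies that formula to two commonly quoted AMD configurations, a 512-bit GDDR5 bus at 5 Gb/s per pin (Radeon R9 290X) and four HBM stacks of 1,024 bits each at 1 Gb/s per pin (Radeon R9 Fury X). Treat these as illustrative inputs rather than a benchmark.

```python
def peak_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gb/s) / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


# Commonly quoted configurations, used here only as illustrative inputs.
GDDR5_BUS_BITS, GDDR5_GBPS = 512, 5.0                    # e.g., Radeon R9 290X
HBM_STACKS, HBM_BITS_PER_STACK, HBM_GBPS = 4, 1024, 1.0  # e.g., Radeon R9 Fury X

gddr5 = peak_bandwidth_gbs(GDDR5_BUS_BITS, GDDR5_GBPS)
hbm = peak_bandwidth_gbs(HBM_STACKS * HBM_BITS_PER_STACK, HBM_GBPS)

if __name__ == "__main__":
    print(f"GDDR5: {gddr5:.0f} GB/s")  # 320 GB/s
    print(f"HBM:   {hbm:.0f} GB/s")    # 512 GB/s
```

In this comparison HBM runs each pin five times slower yet delivers 60 percent more bandwidth, because its interface is eight times wider.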

Over the next decade, look for more performance improvements from multi-chip architectures, innovations in memory technology and integration, aggressive 3D stacking and streamlined system-level interconnects. The industry will also continue to drive gains in device performance, compute density and power efficiency through technology scaling.