The topic of video memory (VRAM) capacity in mid-range graphics processing units (GPUs) is currently sparking intense debate. Many users and enthusiasts are asking: why do manufacturers, NVIDIA in particular, limit the amount of memory on their mass-market cards? Beyond the obvious pursuit of profit, the fundamental architecture of the GPU itself plays a key role.


The Interplay of GPU Architecture and Memory Controllers

Let's start with the basics. Flagship GPUs, such as the NVIDIA GB202 (which forms the basis of the RTX 5090), are complex dies that include a vast number of computational clusters, CUDA cores, and, critically, GDDR7 memory controllers.

The core of the problem is that in the GPU architecture the number of memory controllers scales with the number of computational blocks. When a GPU is scaled down, with simplified versions cut from larger and more powerful dies (to create mid-range and lower-end cards), the number of computational cores inevitably decreases, and along with them the number of associated memory controllers. Fewer controllers mean a narrower memory bus, and consequently a smaller video memory capacity that can be attached to it.
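To make that relationship concrete, here is a minimal sketch (plain Python, not any vendor tool) of how the number of surviving memory controllers translates into bus width and attachable capacity. The 32-bit-per-controller channel and the one-or-two-chips-per-channel layouts are assumptions for illustration.

```python
# Minimal sketch: controller count -> bus width and VRAM options.
# Assumptions: each controller drives one 32-bit channel, with either one
# chip per channel or two (a "clamshell" layout with chips on both PCB sides).

def memory_options(controllers: int, chip_gb: int) -> dict:
    bus_width_bits = controllers * 32
    return {
        "bus_width_bits": bus_width_bits,
        "one_chip_per_channel_gb": controllers * chip_gb,
        "two_chips_per_channel_gb": controllers * chip_gb * 2,
    }

# The full GB202 keeps 16 controllers (512-bit, 32 GB with 2 GB chips);
# a heavily cut-down die keeps only 4 (128-bit, 8 GB).
print(memory_options(controllers=16, chip_gb=2))
print(memory_options(controllers=4, chip_gb=2))
```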

This approach makes it possible to create more energy-efficient solutions that are well suited not only for gaming systems but also for office PCs, where graphics performance is not critical, while stability and a reasonable price are.


The RTX 5060 Example: When 12GB is a Problem

Consider the GB206 chip, which forms the core of the RTX 5060 and its Ti version. This die features only three computational clusters and, correspondingly, just four GDDR7 memory controllers, resulting in a 128-bit bus.

At present, the GDDR7 chips widely available in mass production hold 2GB each. With a 128-bit bus and 2GB chips, two main configurations are possible:

  • One chip per channel: a total of 8GB.
  • Two chips per channel (clamshell layout): a total of 16GB.

The question arises: why isn't there a 12GB version, which would seem to be the "sweet spot" for the mid-range segment? To reach 12GB with current 2GB chips, one memory controller would have to be disabled, cutting the bus to 96 bits. Such a narrow bus is considered unacceptable for modern games and applications, as it significantly limits memory bandwidth and, with it, the card's performance.
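A small sketch of the arithmetic behind this trade-off. The 32-bit-per-controller channels, 2GB chips, and the roughly 28 Gbps GDDR7 data rate are assumptions for illustration, not official specifications.

```python
# GB206-style arithmetic: capacity options and bandwidth for a given
# number of active 32-bit memory controllers.

def vram_and_bandwidth(controllers: int, chip_gb: int = 2, gbps: float = 28.0):
    bus_bits = controllers * 32
    # one chip per 32-bit channel, or two in a clamshell layout
    capacities_gb = (controllers * chip_gb, controllers * chip_gb * 2)
    bandwidth_gb_s = bus_bits / 8 * gbps  # total bytes per second across the bus
    return bus_bits, capacities_gb, bandwidth_gb_s

print(vram_and_bandwidth(4))  # (128, (8, 16), 448.0)  -> the shipping configurations
print(vram_and_bandwidth(3))  # (96, (6, 12), 336.0)   -> 12 GB, but 25% less bandwidth
```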

Another way to achieve 12GB would be to wait for the emergence of higher-capacity video memory chips. Indeed, some top-tier laptops with the RTX 5090 already use 3GB GDDR7 chips. If such chips were to become widely available, 12GB or even 24GB versions of the 5060 could be produced without altering the 128-bit bus. However, since 3GB chips are currently scarce and the RTX 5060 is aimed at the mass market, NVIDIA decided to release it with 8GB of memory, reserving larger VRAM capacities for future "Super" versions.


The RTX 5070 Example: Why Exactly 12GB?

Now let's move on to the RTX 5070, which uses the GB205 chip. This GPU features five computational clusters and six memory controllers, providing a 192-bit bus.

With a 192-bit bus and current 2GB GDDR7 chips, only two memory sizes are possible:

  • 12GB
  • 24GB

NVIDIA released the RTX 5070 with 12GB. Assigning a flagship-class 24GB to a card positioned below the very top of the lineup would make little sense from a product positioning standpoint. Nevertheless, there is room for a future "Super" version here as well: switching to 3GB chips would raise the video memory to a more fitting 18GB.
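The same back-of-the-envelope arithmetic for the GB205, again under the assumption of six 32-bit controllers with one or two chips per channel.

```python
# GB205-style options: six 32-bit controllers (192-bit bus),
# with one chip per channel or two in a clamshell layout.

def capacities(controllers: int, chip_gb: int) -> tuple[int, int]:
    return controllers * chip_gb, controllers * chip_gb * 2

print(capacities(6, 2))  # (12, 24): 12 GB ships, 24 GB would be flagship territory
print(capacities(6, 3))  # (18, 36): 18 GB looks like a natural "Super" refresh
```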


Conclusion

Thus, the limited video memory capacity of mid-range graphics cards is not solely the result of a desire to cut costs. It is a constraint rooted in the specifics of GPU architecture: the direct link between the number of computational blocks and memory controllers, as well as the current availability and economic feasibility of memory chips of various capacities. Manufacturers are forced to balance performance, production cost, and market positioning.