Sunday, 26 January 2014

NVIDIA Maxwell SteamOS Machine with up to 16 Denver CPU Cores and 1 Million Draw Calls

To date, NVIDIA's upcoming Maxwell GPU architecture has been rumored to come with up to 8 custom-designed Denver 64-bit ARM CPU cores. Well, a friendly mole from their cloud gaming division has let me know that they are mulling the option of equipping the highest-end Maxwell GPU with 16 Denver cores.

Is NVIDIA insane, you ask? Well, it looks like they have gone a bit crazy with core counts lately, the Tegra K1 mobile SoC being a great example. The Maxwell architecture is presumably so energy efficient that it allows NVIDIA to build such a chip with a TDP of around 225W. No wonder the Maxwell GTX 750 is rumored to need very little power. It's not yet clear whether the chip will be manufactured on TSMC's 20nm line or the 16nm FinFET one. The latter would push the chip to 2015.

NVIDIA has been able to design the Denver architecture in such a way that it can be manufactured efficiently on the same die and the same high-density process as their high-end GPUs. Presumably the trick is that Denver actually very closely resembles a GPU architecture, but has a very powerful instruction set translation unit.

As rumored, that translation unit was first developed for NVIDIA's x86 project years ago, after they licensed Transmeta technology. But then NVIDIA signed a lucrative agreement with Intel, and part of it was that they wouldn't develop hardware or software for x86. Since NVIDIA is an economical company, they kept on developing their CPU architecture and steered the project towards the 64-bit ARM architecture. Actually, their instruction set translation unit is very flexible and programmable, and can be repurposed relatively easily for other instruction sets. This is something that NVIDIA learned by developing Icera soft-modem technology for Tegra, and the experience of developing the CUDA platform also helped.

Now back to the integration of Maxwell and Denver. Not only can they be manufactured together on the same high-end GPU process, Denver can also be tightly integrated into the Maxwell die. Imagine each Maxwell compute cluster having one Denver CPU core available. NVIDIA is even so proud of their achievement that they may call the combination of a Maxwell compute cluster and a Denver core the Tesla Compute Core (TCC), and the whole chip could be named the Tesla Processing Unit (TPU), since it can run an operating system like Linux on its own. Of course, the chip could also be named after some other famous scientist, or a superhero.

Did I mention Linux? Isn't SteamOS based on a Linux distro? It's not hard to guess what's coming now. Yeah, NVIDIA is working on SteamOS consoles, with the most powerful of them packing the Maxwell beast with 16 Denver cores, the ultimate NVIDIA console SoC. The color scheme of NVIDIA's consoles is pretty much guaranteed to be black and green. The name of the new consoles is not yet decided. I don't think they will be called Shield, or will they?

Not only that, NVIDIA intends their SteamOS consoles to be reference platforms for other manufacturers who want to get into the console business the cheap way, just like the NVIDIA Tegra Note 7 and the NVIDIA Shield.

So just what is the Maxwell beast with 16 Denver cores capable of? My source gave me one number: 1 million draw calls in DirectX 11 and OpenGL 4.4. Just for reference, AMD claims that their upcoming low-level API Mantle will be able to issue up to 150,000 draw calls. Presumably NVIDIA's new hardware beast will be able to obliterate AMD's Mantle API, and do so with no code changes required from game developers, as it will all be done in hardware.

Are you asking yourself what game developer would need so many draw calls? This is the maximum number of draw calls that the 16 Denver cores enable, but the cores can be used for much more. NVIDIA is working on integrating the Denver CPU cores into their GameWorks code library, which game developers can integrate freely into their games. They are porting the library to OpenGL and SteamOS.

So what can NVIDIA's new GameWorks library with Denver support do for game developers? For instance, realistic physical simulation of the whole game world or parts of it at the flick of a switch, including fluids, gases and particles. Advanced ray tracing algorithms. Advanced AI. Advanced data compression algorithms. Advanced adaptive LOD generation and tessellation. Advanced global illumination. Advanced streaming of assets for open world games. Sound processing on the level of AMD's TrueAudio, but programmable. The list goes on and on.

Maxwell with Denver is also a high-performance computing beast, and not only because of Maxwell's double-precision floating-point improvements. It can run Linux on its own, without Intel or AMD CPUs. So data centers and supercomputers like Titan, which already includes NVIDIA Tesla GPUs, could run on NVIDIA's hardware alone without relying on competitors' hardware. No wonder that NVIDIA would like to sell these new chips as Tesla Processing Units (TPUs).

It will be very interesting to follow what NVIDIA will unveil this year.


  1. This comment has been removed by the author.

  2. I'd like to know who "my source" is. Theoretically the chip may be capable of 1 million draw calls, but if the chip is running under DirectX then it will still be limited to the maximum of 9k draw calls that AMD claims a good developer on a good day would be able to achieve. Regardless of how powerful one side of the see-saw is, you need perfect unity between the GPU and CPU in the form of an API. That is exactly why AMD developed their own: to overcome this limit of DirectX and have their own perfect API to talk between their GPUs and the CPU.

    Also, it's worth pointing out that the 100,000 draw calls are not the maximum limit. From what I remember of the presentation I watched a month or two ago, AMD plans to increase this number by orders of magnitude, up to and over 1 million draw calls in the coming years, more likely through hardware improvements than changes to the API itself. So unless Nvidia has quickly taken their NVAPI and made radical changes to it to meet the level of Mantle, it's just a theory.

    1. Source is your fucking mom! Nobody gives a fuck, go copy paste your crap elsewhere.

    2. lol it's quite sad if you believe this crap, even if this chip was real the 1 million draw calls is completely made up.
