I spoke recently with Ben de Waal, NVIDIA’s Vice President of GPU software, who revealed that NVIDIA plans to produce multithreaded ForceWare graphics drivers for its GeForce graphics products. Multithreading in the video driver should allow performance increases when running 3D games and applications on dual-core CPUs and multiprocessor PCs. De Waal estimated that systems with dual-core processors could see performance boosts of somewhere between 5 and 30% with these drivers.
Most imminent is ForceWare release 75, which will bring a number of improvements for SLI performance and 64-bit Windows, among other things. Release 75 will not be multithreaded, however. The next major iteration of the driver, release 80, is slated to bring support for multiple threads. We may not see this version for a few months; NVIDIA hasn’t given an exact timetable for the completion of release 80.
Out of curiosity, I asked de Waal why NVIDIA’s drivers don’t already take advantage of a second CPU. After all, the driver is a separate task from the application calling it, and Hyper-Threaded and SMP systems are rather common. He explained that drivers in Windows normally run synchronously with the applications making API calls, meaning the driver must return an answer before the API call is complete. On top of that, Windows drivers run in kernel mode, so the OS isn’t particularly amenable to multithreaded drivers. NVIDIA has apparently been working on multithreaded drivers for some time now, and the company has found a way to fudge around the OS limitations.
De Waal cited several opportunities for driver performance gains with multithreading. Among them: vertex processing. He noted that NVIDIA’s drivers currently do load balancing for vertex processing, offloading some work to the CPU when the GPU is busy. This sort of vertex processing load could be spun off into a separate thread and processed in parallel.
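To make the idea concrete, here is a rough sketch of spinning part of a vertex workload off into a second thread. Nothing here comes from NVIDIA: the function names, the `cpu_share` split, and the simple scale-and-offset transform are all hypothetical stand-ins, and the "GPU" portion is just ordinary code running on the calling thread. Python threads also won't show a real speedup on this kind of work; the point is only the structure.

```python
from concurrent.futures import ThreadPoolExecutor


def transform_batch(vertices, scale, offset):
    """Apply a simple scale-and-offset transform to each (x, y, z) vertex."""
    return [
        (x * scale + offset, y * scale + offset, z * scale + offset)
        for (x, y, z) in vertices
    ]


def process_vertices(vertices, scale=2.0, offset=1.0, cpu_share=0.25):
    """Split a vertex batch and process the two parts concurrently.

    cpu_share is a hypothetical tuning knob: the fraction of vertices
    handed to the helper thread instead of the main path.
    """
    split = int(len(vertices) * (1 - cpu_share))
    main_part, helper_part = vertices[:split], vertices[split:]

    with ThreadPoolExecutor(max_workers=1) as pool:
        # The helper thread works on its share in the background...
        helper_future = pool.submit(transform_batch, helper_part, scale, offset)
        # ...while the calling thread handles the rest (stand-in for the GPU).
        main_result = transform_batch(main_part, scale, offset)
        # Joining the results preserves the original vertex order.
        return main_result + helper_future.result()
```

Because the two halves of the batch don't depend on each other, the only synchronization point is the final join, which is what makes this kind of work a natural candidate for a separate thread.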
Some of the driver’s other functions don’t lend themselves so readily to parallel threading, so NVIDIA will use a combination of fully parallel threads and linear pipelining. We’ve seen the benefits of linear pipelining in our LAME audio encoding tests; this technique uses a simple buffering scheme to split work between two threads without creating the synchronization headaches of more parallel threading techniques.
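For readers unfamiliar with the technique, linear pipelining can be sketched as two threads joined by a small bounded buffer. This is a generic illustration, not NVIDIA's or LAME's code; `run_pipeline`, `stage_one`, `stage_two`, and `buffer_size` are names I've made up for the example.

```python
import queue
import threading


def run_pipeline(items, stage_one, stage_two, buffer_size=4):
    """Run two stages on separate threads, linked by a bounded buffer."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel that marks the end of the stream
    results = []

    def first_stage():
        for item in items:
            # put() blocks when the buffer is full, so the first stage
            # can never run more than buffer_size items ahead.
            buffer.put(stage_one(item))
        buffer.put(done)

    def second_stage():
        while True:
            item = buffer.get()  # blocks until the first stage delivers
            if item is done:
                break
            results.append(stage_two(item))

    threads = [
        threading.Thread(target=first_stage),
        threading.Thread(target=second_stage),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A call like `run_pipeline(range(5), lambda x: x + 1, lambda x: x * 2)` pushes each item through both stages in order. The buffer is the only shared state, and the queue handles its locking internally, so neither stage needs explicit locks of its own.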
Despite the apparent gains offered by multithreading, de Waal expressed some skepticism about the prospects for thread-level parallelism on CPUs. Among other things, he was concerned that multithreaded games could blunt the impact of multithreaded graphics drivers.