Nvidia has long promoted its PhysX game physics middleware as an example of a computing problem that benefits greatly from GPU acceleration, and a number of games over the past couple of years have featured PhysX with GPU acceleration. Those games have often included extra physics effects that, when enabled without the benefit of GPU acceleration, slow frame rates to a crawl. With the help of an Nvidia GPU, though, those effects can usually be produced at fluid frame rates.
We have noted in the past that some games implement PhysX using only a single thread, leaving additional cores and hardware threads on today’s fastest CPUs sitting idle. That’s true despite the fact that physics solvers are inherently parallel and are highly multithreaded by nature when executing on a GPU.
Now, David Kanter at RealWorld Technologies has added a new twist to the story by analyzing the execution of several PhysX games using Intel’s VTune profiling tool. Kanter discovered that when GPU acceleration is disabled and PhysX calculations are being handled by the CPU, the vast majority of the code being executed uses x87 floating-point math instructions rather than SSE. Here’s Kanter’s summation of the problem with that fact:
x87 has been deprecated for many years now, with Intel and AMD recommending the much faster SSE instructions for the last 5 years. On modern CPUs, code using SSE instructions can easily run 1.5-2X faster than similar code using x87. By using x87, PhysX diminishes the performance of CPUs, calling into question the real benefits of PhysX on a GPU.
Kanter notes that there’s no technical reason not to use SSE on the PC—no need for additional mathematical precision, no justifiable requirement for x87 backward compatibility among remotely modern CPUs, no apparent technical barrier whatsoever. In fact, as he points out, Nvidia has PhysX layers that run on game consoles using the PowerPC’s AltiVec instructions, which are very similar to SSE. Kanter even expects using SSE would ease development: "In the case of PhysX on the CPU, there are no significant extra costs (and frankly supporting SSE is easier than x87 anyway)."
So even single-threaded PhysX code could be roughly twice as fast as it is with very little extra effort.
Between the lack of multithreading and the predominance of x87 instructions, the PC version of Nvidia’s PhysX middleware would seem to be, at best, extremely poorly optimized, and at worst, made slow through willful neglect. Nvidia, of course, is free to engage in such neglect, but there are consequences to be paid for doing so. Here’s how Kanter sums it up:
The bottom line is that Nvidia is free to hobble PhysX on the CPU by using single threaded x87 code if they wish. That choice, however, does not benefit developers or consumers though, and casts substantial doubts on the purported performance advantages of running PhysX on a GPU, rather than a CPU.
Indeed. The PhysX logo is intended as a selling point for games taking full advantage of Nvidia hardware, but it now may take on a stronger meaning: intentionally slow on everything else.