I'm not speaking in terms of just general folding either. Each folding client supports different things because of different coding, like you said. For instance, Fermi's desktop variants have low double-precision performance because they are game oriented. That means Milkyway@home yields faster PPD on ATI cards like a 5870, and my Fermi doesn't even have an API for it yet. They said it's simply because Fermi is too slow for them to care. On the other hand, you have Stanford's F@H, which utilizes the Fermi chips to their full extent and doesn't utilize ATI. See what I'm saying? I understand what you're saying about GPGPU, honestly. My main point, though, is that standard consumers like you and I won't be looking at SPU performance for the sole purpose of GPGPU. It's all about gaming performance on desktop variants, and therefore companies like Nvidia and AMD put their raw GPGPU performance into their HPC variants. In other words, to them, people who buy a desktop video card for GPGPU performance are a minority, and Nvidia openly admitted that in one of their online live chats on CUDA.
As for the memory requirement, that's why HPC cards have large amounts of RAM.