You never write a program to use the cache; the hardware handles that by itself. Therefore, most of those points are moot.
As for the increased latency, I could see that on the Velocity Engine, since it was added at the end of the 970's production stage. However, increased latency to RAM makes no sense: there is no L3 cache, which should shorten the time spent looking through cache, and the frontside bus and the memory itself are so much faster.
As with any CPU, specific optimizations are needed if you want to get the most out of every clock cycle. However, with updated compilers, most programmers should only have to run their code through a new compiler with the correct flags set to get roughly 90% of the optimizations for the G5. The remaining 10% would take hours of work and, in most cases, would not be worth it.
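To make the recompile point concrete, here is a rough sketch of the kind of code I mean. The function name `scale_add` and the file name are made up for illustration, and the build line in the comment assumes a GCC toolchain that targets the 970; the exact flags depend on your compiler and version, so treat them as an example rather than a recipe.

```c
#include <stddef.h>

/*
 * Plain, portable C with nothing G5-specific in the source.
 * Any tuning comes from the compiler flags, for example (GCC, assumed):
 *
 *   gcc -std=c99 -O3 -mcpu=970 -mtune=970 -maltivec -c scale_add.c
 *
 * Whether the loop actually gets vectorized depends on the compiler version.
 */
void scale_add(float *restrict dst, const float *restrict src, float k, size_t n)
{
    /* dst[i] = dst[i] + k * src[i] for every element */
    for (size_t i = 0; i < n; i++) {
        dst[i] += k * src[i];
    }
}
```

The same source builds unchanged for a G3 or G4; only the flags differ. Hand-tuning a loop like this beyond what the compiler does is the part that eats the hours.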
Even without recompiling, the G5s will still be wicked fast on all of the programs that run on today's G3 and G4 computers.