Quote:
Originally Posted by headkaze
Thanks for the stats, warmi. It's good to know I'm heading in the right direction too.
BTW I assume you're using matrix transforms like those in the link I posted?
I am using custom VFP/NEON asm code, which is actually faster than the GPU itself (about 3-4 times faster than optimized C code), so for me transforming vertices/normals on the CPU is not a problem at all.
The biggest FPS killers are the draw calls and the internal driver vertex processing code, which I can do nothing about. Well, almost nothing: one way to minimize that is to submit your positions/normals/uvs as shorts and have them rescaled back on the GPU. This way a typical vertex structure, which takes 32 bytes (3 floats/position, 3 floats/normal, 2 floats/uvs), can be shortened to 20 bytes (4 shorts/position, 4 shorts/normal, 2 shorts/uvs).
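To make the byte counts concrete, here is a rough sketch of the two layouts in C. The names (FloatVertex, PackedVertex, quantize, pack_vertex) and the range parameters are made up for illustration and are not my actual code. I am also assuming the 4th short in position/normal is just padding so each attribute stays 4-byte aligned, and that the rescale on the GPU is done by folding a range/32767 scale into the modelview or texture matrix.

```c
#include <assert.h>
#include <stdint.h>

/* Full-precision layout: 3 + 3 + 2 floats = 32 bytes. */
typedef struct {
    float pos[3];
    float normal[3];
    float uv[2];
} FloatVertex;

/* Packed layout: 4 + 4 + 2 shorts = 20 bytes.
 * Assumption: the 4th component of pos/normal is unused padding. */
typedef struct {
    int16_t pos[4];
    int16_t normal[4];
    int16_t uv[2];
} PackedVertex;

static_assert(sizeof(FloatVertex) == 32, "float vertex should be 32 bytes");
static_assert(sizeof(PackedVertex) == 20, "packed vertex should be 20 bytes");

/* Map a value in [-range, range] onto [-32767, 32767], clamping and
 * rounding to nearest. The GPU side multiplies by range / 32767 to undo it. */
static int16_t quantize(float v, float range)
{
    float s = v / range * 32767.0f;
    if (s > 32767.0f)  s = 32767.0f;
    if (s < -32767.0f) s = -32767.0f;
    return (int16_t)(s >= 0.0f ? s + 0.5f : s - 0.5f);
}

/* Hypothetical helper: pos_range and uv_range are the largest absolute
 * position and uv values in the mesh; normals are assumed unit length. */
static PackedVertex pack_vertex(const FloatVertex *in,
                                float pos_range, float uv_range)
{
    PackedVertex out;
    for (int i = 0; i < 3; ++i) {
        out.pos[i]    = quantize(in->pos[i], pos_range);
        out.normal[i] = quantize(in->normal[i], 1.0f);
    }
    out.pos[3]    = 0; /* padding */
    out.normal[3] = 0; /* padding */
    out.uv[0] = quantize(in->uv[0], uv_range);
    out.uv[1] = quantize(in->uv[1], uv_range);
    return out;
}
```

You would run pack_vertex once at load time over the whole mesh, so the quantization cost never shows up per frame. The precision you lose depends on pos_range, so big meshes may need to be split or kept as floats.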