Quote:
Originally Posted by warmi
I am using custom VFP/NEON asm code which is actually faster than the GPU itself (about 3-4 times faster than optimized C code), so for me transforming vertices/normals on the CPU is not a problem at all.
The biggest FPS killers are the draw calls and the internal driver vertex processing code, which I can do nothing about. (Well, almost nothing: one way to minimize that is to submit your positions/normals/uvs as shorts and have them rescaled back on the GPU. This way a typical vertex structure which takes 32 bytes (3 floats/position, 3 floats/normal, 2 floats/uvs) can be shortened to 20 bytes (4 shorts/position, 4 shorts/normal, 2 shorts/uvs).)
Interesting. I had read somewhere that multiple draw calls did not seem to affect performance, but this definitely points to the contrary.
I see you've ignored the suggestion about aligning your vertex structure size (the stride between vertices) to a multiple of 8 bytes. I wonder what your results would be like if you somehow managed to cut those 20 bytes down to 16 (or better yet, padded with 4 extra bytes to make it 24).
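
To make the byte counts concrete, here is a rough C sketch of the layouts being discussed. The struct and function names (`VertexF32`, `VertexS16`, `VertexS16Padded`, `quantize`, `dequantize`) are made up for illustration and are not from the quoted code. The `range` parameter is also an assumption: it stands for whatever bound the data is known to fit in, and it must be greater than zero. The real rescale would happen on the GPU, so `dequantize` only shows the arithmetic.

```c
#include <stdint.h>

/* Full-precision layout: 3 + 3 + 2 floats = 32 bytes. */
typedef struct {
    float pos[3];
    float normal[3];
    float uv[2];
} VertexF32;

/* Packed layout from the quoted post: 4 + 4 + 2 shorts = 20 bytes.
   The fourth component of pos and normal is unused filler. */
typedef struct {
    int16_t pos[4];
    int16_t normal[4];
    int16_t uv[2];
} VertexS16;

/* Same data with 4 bytes of padding: 24 bytes, a multiple of 8. */
typedef struct {
    int16_t pos[4];
    int16_t normal[4];
    int16_t uv[2];
    int16_t pad[2]; /* never read, only there to round up the stride */
} VertexS16Padded;

/* Compile-time checks of the sizes quoted in the thread (C11). */
_Static_assert(sizeof(VertexF32) == 32, "float vertex should be 32 bytes");
_Static_assert(sizeof(VertexS16) == 20, "short vertex should be 20 bytes");
_Static_assert(sizeof(VertexS16Padded) == 24, "padded vertex should be 24 bytes");

/* Map a float in [-range, +range] to a signed 16-bit value.
   Values outside the range are clamped. Assumes range > 0. */
static int16_t quantize(float v, float range) {
    float s = v / range * 32767.0f;
    if (s > 32767.0f) s = 32767.0f;
    if (s < -32767.0f) s = -32767.0f;
    /* round half away from zero */
    return (int16_t)(s < 0.0f ? s - 0.5f : s + 0.5f);
}

/* Inverse mapping: the same scale the GPU side would have to apply. */
static float dequantize(int16_t q, float range) {
    return (float)q / 32767.0f * range;
}
```

With a layout like this, the 20-byte version has a stride that is a multiple of 4 but not of 8, which is exactly the case I'm curious about. The padded version costs 4 more bytes per vertex, so whether it wins would have to be measured on the device.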