02-28-2010, 06:55 PM   #1
Registered Member
Join Date: Feb 2010
Posts: 3
Triangles or Triangle Strips?

I have a series of low-poly models (~20 triangles each, 100 of them on screen at a time). They are not interconnected, so drawing them all at once with a single GL_TRIANGLE_STRIP is not an option.

As I see it, I have two options. The first is to put all 100 models' vertex, normal, and color data into one big interleaved array and draw everything with a single glDrawElements() call using GL_TRIANGLES. The second is to build a strip-ordered interleaved array holding the data for one model and draw that array repeatedly in a for loop, calling glDrawElements() 100 times with GL_TRIANGLE_STRIP.

Which one is faster on the iPhone?
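
To make the two options concrete: assuming each model is a single strip, 20 triangles need 22 strip indices, while the same model as a GL_TRIANGLES list needs 60 indices. The sketch below shows how a strip's index list could be expanded into a triangle list so that many models can share one index buffer. The function name strip_to_triangles is hypothetical and not part of OpenGL ES.

```c
#include <assert.h>
#include <stddef.h>

/* Expand a triangle strip's indices into an equivalent GL_TRIANGLES index
 * list. Every second triangle has its first two indices swapped so the
 * winding order stays consistent with how the strip would be rasterized.
 * `out` must have room for 3 * (strip_count - 2) entries.
 * Returns the number of indices written. */
size_t strip_to_triangles(const unsigned short *strip, size_t strip_count,
                          unsigned short *out)
{
    size_t written = 0;

    if (strip_count < 3) {
        return 0; /* not enough indices for a single triangle */
    }
    assert(strip != NULL && out != NULL);

    for (size_t i = 0; i + 2 < strip_count; ++i) {
        if (i % 2 == 0) {
            out[written++] = strip[i];
            out[written++] = strip[i + 1];
        } else {
            out[written++] = strip[i + 1];
            out[written++] = strip[i];
        }
        out[written++] = strip[i + 2];
    }
    return written;
}
```

A 4-index strip (two triangles) expands to 6 list indices; a 22-index strip expands to 60.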

Last edited by demione; 02-28-2010 at 07:28 PM.
Posted by demione

03-01-2010, 05:01 AM   #2
Registered Member
Join Date: Feb 2010
Posts: 98

Does each of the models need to be able to rotate and translate on its own? If so, how do you intend to render them in a batch? The only way I'm aware of is to apply matrix transforms to the vertex data yourself, and that could be slow with that many vertices. If you use glRotate, glTranslate, etc., you obviously can't draw them all in one batch.

Also, are you enabling GL_BLEND? It seems to be a big performance hit; if your models don't need alpha blending, disable it and make sure your model textures are 24-bit.

Are you using PVR texture compression? Are all your textures in one big texture atlas?

GL_TRIANGLE_STRIP is faster than GL_TRIANGLES, but I don't see GL_TRIANGLES being much of a performance hit for what you're doing. The iPhone's main problems are alpha blending and fill rate. Of course, the fewer texture and state changes, the better.

The best way to find out where your bottlenecks are is to profile your code. A simple way to do this is the following:

Code:
// Wall-clock timing with NSDate: coarse, but enough to compare two approaches.
NSDate *start, *stop;
NSTimeInterval duration;

start = [NSDate date];

// Do stuff (the code being measured)

stop = [NSDate date];
duration = [stop timeIntervalSinceDate:start]; // elapsed time in seconds
NSLog(@"Stuff: %f", duration);

There are also some great performance analysers in Xcode (Run->Run with Performance Tool).

Last edited by headkaze; 03-01-2010 at 05:05 AM.
Posted by headkaze

03-01-2010, 07:28 AM   #3
Registered Member
Join Date: Nov 2008
Posts: 195

Quote:
Originally Posted by demione
I have a series of low-poly models (~20 triangles each, 100 of them on screen at a time). They are not interconnected, so drawing them all at once with a single GL_TRIANGLE_STRIP is not an option.

As I see it, I have two options. The first is to put all 100 models' vertex, normal, and color data into one big interleaved array and draw everything with a single glDrawElements() call using GL_TRIANGLES. The second is to build a strip-ordered interleaved array holding the data for one model and draw that array repeatedly in a for loop, calling glDrawElements() 100 times with GL_TRIANGLE_STRIP.

Which one is faster on the iPhone?

Yeah, the only sane way of handling this setup is to transform the vertices on the CPU and submit them in a single interleaved array.
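
A minimal sketch of what CPU-side batching could look like, assuming 16-bit indices and column-major 4x4 matrices (the layout OpenGL uses). The Vec3 and Batch types and the batch_append function are hypothetical names for illustration, not anything posted in this thread. Only positions are handled here; normals would need the rotation part of the matrix without the translation.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Destination buffers for the merged geometry (caller-allocated). */
typedef struct {
    Vec3 *verts;
    size_t vert_count;
    size_t vert_cap;
    unsigned short *indices;
    size_t index_count;
    size_t index_cap;
} Batch;

/* Transform a point by a column-major 4x4 matrix (w assumed to be 1). */
static Vec3 transform_point(const float m[16], Vec3 p)
{
    Vec3 r;
    r.x = m[0] * p.x + m[4] * p.y + m[8]  * p.z + m[12];
    r.y = m[1] * p.x + m[5] * p.y + m[9]  * p.z + m[13];
    r.z = m[2] * p.x + m[6] * p.y + m[10] * p.z + m[14];
    return r;
}

/* Append one transformed copy of a model to the batch. Indices are
 * rebased so they point at the copy's vertices inside the big array. */
void batch_append(Batch *b, const Vec3 *verts, size_t vert_count,
                  const unsigned short *indices, size_t index_count,
                  const float m[16])
{
    assert(b->vert_count + vert_count <= b->vert_cap);
    assert(b->index_count + index_count <= b->index_cap);
    assert(b->vert_count + vert_count <= 65536); /* 16-bit index limit */

    size_t base = b->vert_count;
    for (size_t i = 0; i < vert_count; ++i) {
        b->verts[base + i] = transform_point(m, verts[i]);
    }
    for (size_t i = 0; i < index_count; ++i) {
        b->indices[b->index_count + i] = (unsigned short)(base + indices[i]);
    }
    b->vert_count += vert_count;
    b->index_count += index_count;
}
```

After calling batch_append once per model with that model's matrix, the whole batch could be submitted with one glDrawElements(GL_TRIANGLES, index_count, GL_UNSIGNED_SHORT, indices) call.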
Posted by warmi

03-01-2010, 07:34 AM   #4
Registered Member
Join Date: Feb 2010
Posts: 98

Quote:
Originally Posted by warmi
Yeah, the only sane way of handling this setup is to transform the vertices on the CPU and submit them in a single interleaved array.

Actually, that's an interesting point: is it that important to have the array interleaved? Currently, for my model animation system, I set the vertex pointer for each animation frame rather than copying the data over, which is obviously quicker. This means my vertices, texture coordinates, etc. are all separate pointers to the data. But I did stumble upon an article that said to keep your vertex data in a single struct for better performance. I take it that's what you mean by interleaved? But with the iPhone sharing its RAM and not having to transfer data to any sort of "VRAM", would it really incur a penalty?

EDIT: Found the article, Interleaving Vertex Data, which is where this is stated. Also, for the OP, who may want to look at transforming vertex data using matrices, there is another good article on the same site: Transformations and Matricies.

Last edited by headkaze; 03-01-2010 at 07:53 AM.
Posted by headkaze

03-01-2010, 09:31 AM   #5
Registered Member
Join Date: Nov 2008
Posts: 195

Quote:
Originally Posted by headkaze
Actually, that's an interesting point: is it that important to have the array interleaved? Currently, for my model animation system, I set the vertex pointer for each animation frame rather than copying the data over, which is obviously quicker. This means my vertices, texture coordinates, etc. are all separate pointers to the data. But I did stumble upon an article that said to keep your vertex data in a single struct for better performance. I take it that's what you mean by interleaved? But with the iPhone sharing its RAM and not having to transfer data to any sort of "VRAM", would it really incur a penalty?

EDIT: Found the article, Interleaving Vertex Data, which is where this is stated. Also, for the OP, who may want to look at transforming vertex data using matrices, there is another good article on the same site: Transformations and Matricies.

It's all about spatial locality: if you are accessing positions, there is a good chance you will also access normals (if you are doing CPU transformations), and having the normals reside in some other part of memory means potentially two cache-line misses as opposed to just one.
Besides your own code, the GLES driver on the iPhone walks your entire vertex stream and does some preprocessing of its own, so you do want to use interleaved arrays.

PS. I am not sure I understand your point about animations and having separate vertex streams.
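
For readers unsure what "interleaved" means here, a minimal sketch: one struct per vertex, so position, normal, and UV for the same vertex sit next to each other in memory. The Vertex type and the interleave helper are hypothetical names; the 32-byte size assumes 4-byte floats.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One interleaved vertex: 3 + 3 + 2 floats = 32 bytes with 4-byte floats. */
typedef struct {
    float position[3];
    float normal[3];
    float uv[2];
} Vertex;

/* Copy three separate attribute streams into one interleaved array.
 * positions and normals hold 3 floats per vertex, uvs holds 2. */
void interleave(const float *positions, const float *normals,
                const float *uvs, size_t count, Vertex *out)
{
    assert(positions != NULL && normals != NULL && uvs != NULL && out != NULL);

    for (size_t i = 0; i < count; ++i) {
        memcpy(out[i].position, positions + 3 * i, sizeof out[i].position);
        memcpy(out[i].normal,   normals   + 3 * i, sizeof out[i].normal);
        memcpy(out[i].uv,       uvs       + 2 * i, sizeof out[i].uv);
    }
}
```

With this layout each attribute pointer would be set with a stride of sizeof(Vertex), for example glVertexPointer(3, GL_FLOAT, sizeof(Vertex), out[0].position), and likewise for normals and texture coordinates.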
Posted by warmi

03-01-2010, 09:51 AM   #6
Registered Member
Join Date: Feb 2010
Posts: 3

Quote:
Originally Posted by headkaze
Does each of the models need to be able to rotate and translate on its own? If so, how do you intend to render them in a batch?

No texture, no lighting. I will have to perform rotation and scaling operations on them. I realize I could write my own operations to do this on the CPU as a way to avoid calling glDrawElements 100 times. I'm just trying to get a sense of whether it's worth the effort.
Posted by demione

03-01-2010, 11:28 AM   #7
Registered Member
Join Date: Nov 2008
Posts: 195

Quote:
Originally Posted by demione
No texture, no lighting. I will have to perform rotation and scaling operations on them. I realize I could write my own operations to do this on the CPU as a way to avoid calling glDrawElements 100 times. I'm just trying to get a sense of whether it's worth the effort.

It makes a lot of difference.

I actually did some tests and even have pictures to prove it :-)

Here is a run without batching (you can see it issues 121 DrawElements calls, listed as "renderables"). It runs at around 22 fps in release mode (this screenshot is from my debug session):
http://www.warmi.net/tmp/Screenshot_1a.png

And here is another run with batching, transforming positions/normals on the CPU (you can see it has only 4 DrawElements "renderables"). It runs at 55+ fps (again, the screenshot is from a debug session):
http://www.warmi.net/tmp/Screenshot_1b.png

So the difference is huge.
Posted by warmi

03-01-2010, 11:55 AM   #8
Registered Member
Join Date: Feb 2010
Posts: 98

Quote:
Originally Posted by warmi
And here is another run with batching, transforming positions/normals on the CPU (you can see it has only 4 DrawElements "renderables").

Thanks for the stats, warmi. It's good to know I'm heading in the right direction too.

BTW, I assume you're using matrix transforms like those in the link I posted?

Quote:
PS. I am not sure I understand your point about animations and having separate vertex streams.

The problem is that my models can change texture coordinates so that they can use different images inside a texture atlas. It's too slow for me to loop through the tex coords in every frame of the vertex arrays to set them all, so I keep a separate array for them. Unfortunately, this means I can't interleave the whole vertex.
Posted by headkaze

03-01-2010, 12:08 PM   #9
Registered Member
Join Date: Nov 2008
Posts: 195

Quote:
Originally Posted by headkaze
Thanks for the stats, warmi. It's good to know I'm heading in the right direction too.

BTW, I assume you're using matrix transforms like those in the link I posted?

I am using custom VFP/NEON asm code, which is actually faster than the GPU itself (about 3-4 times faster than optimized C code), so for me transforming vertices/normals on the CPU is not a problem at all.

The biggest FPS killers are the draw calls and the driver's internal vertex-processing code, which I can do nothing about. Well, almost nothing: one way to minimize that cost is to submit your positions/normals/UVs as shorts and have them rescaled on the GPU. That way a typical vertex structure that takes 32 bytes (3 floats/position, 3 floats/normal, 2 floats/UVs) can be shortened to 20 bytes (4 shorts/position, 4 shorts/normal, 2 shorts/UVs).
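
A rough sketch of the short-packing idea, under assumptions that were not posted in this thread: values are mapped linearly from [-range, +range] to signed 16-bit integers, and the inverse scale would presumably be folded into the modelview or texture matrix so the GPU restores the original magnitudes. PackedVertex, quantize, and dequantize are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* 4 + 4 + 2 shorts = 20 bytes per vertex. The fourth component of
 * position and normal is padding. */
typedef struct {
    int16_t position[4];
    int16_t normal[4];
    int16_t uv[2];
} PackedVertex;

/* Map a float in [-range, +range] to a signed 16-bit integer.
 * Values outside the range are clamped. */
int16_t quantize(float value, float range)
{
    assert(range > 0.0f);

    float scaled = value / range * 32767.0f;
    if (scaled > 32767.0f)  scaled = 32767.0f;
    if (scaled < -32767.0f) scaled = -32767.0f;

    /* Round to nearest; the cast truncates toward zero. */
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Inverse mapping, shown on the CPU only to illustrate the precision
 * loss; in the setup described above this scale is applied on the GPU. */
float dequantize(int16_t q, float range)
{
    return (float)q * range / 32767.0f;
}
```

The trade-off is precision: each component keeps only 16 bits, so range should be chosen as tightly as the model's bounds allow.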
Posted by warmi

03-01-2010, 02:27 PM   #10
Registered Member
Join Date: Feb 2010
Posts: 3

Quote:
Originally Posted by warmi
I am using custom VFP/NEON asm code, which is actually faster than the GPU itself (about 3-4 times faster than optimized C code), so for me transforming vertices/normals on the CPU is not a problem at all.

The biggest FPS killers are the draw calls and the driver's internal vertex-processing code, which I can do nothing about. Well, almost nothing: one way to minimize that cost is to submit your positions/normals/UVs as shorts and have them rescaled on the GPU. That way a typical vertex structure that takes 32 bytes (3 floats/position, 3 floats/normal, 2 floats/UVs) can be shortened to 20 bytes (4 shorts/position, 4 shorts/normal, 2 shorts/UVs).

Interesting. I had read somewhere that multiple draw calls did not seem to affect performance, but this definitely points to the contrary.

I see you've ignored the suggestion about aligning your vertex stride to a multiple of 8 bytes. I wonder what your results would be like if you managed to cut those 20 bytes down to 16 (or, better yet, padded with 4 bytes to make it 24).
Posted by demione

03-01-2010, 02:36 PM   #11
Registered Member
Join Date: Nov 2008
Posts: 195

Quote:
Originally Posted by demione
Interesting. I had read somewhere that multiple draw calls did not seem to affect performance, but this definitely points to the contrary.

I see you've ignored the suggestion about aligning your vertex stride to a multiple of 8 bytes. I wonder what your results would be like if you managed to cut those 20 bytes down to 16 (or, better yet, padded with 4 bytes to make it 24).

Why multiples of 8? Multiples of 4 are plenty, since that's what the ARM operates on. If you really want to be cache-friendly, you would have to make your entire vertex struct exactly 32 bytes.

As for multiple draw calls: of course they matter. Even if you are simply calling glDrawArrays in a loop, it still makes a lot of difference, and once you take into account that a typical engine (as opposed to some lean demo code) does a lot more than just call glDrawArrays for each renderable batch, it becomes even more important.
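
The stride arithmetic discussed above (20 bytes padded to 24 for 8-byte alignment, or to 32 for the cache-friendly size mentioned in this post) can be sketched as a round-up-to-multiple helper. The name padded_stride is hypothetical, and whether any particular alignment actually helps on a given device is something to measure rather than assume.

```c
#include <assert.h>
#include <stddef.h>

/* Round `size` up to the next multiple of `alignment`. */
size_t padded_stride(size_t size, size_t alignment)
{
    assert(alignment > 0);
    return (size + alignment - 1) / alignment * alignment;
}
```

For the 20-byte packed vertex, this gives 20 for 4-byte alignment, 24 for 8-byte alignment, and 32 for 32-byte alignment.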
Posted by warmi
All times are GMT -5.