Mem / core clock mystery

Postby jbikker » Fri Dec 23, 2011 4:59 pm

I was measuring some occupancy issues this afternoon, and stumbled upon this weird observation:

For a path tracing algorithm, which I assumed to be heavily mem bound (and all the signs point in this direction), I measured performance in three scenarios:

1. Everything normal: decent clock speeds for mem and core.
2. Core clock as far down as allowed. Mem clock normal.
3. Mem clock as low as possible. Core clock normal.

To my surprise, scenarios 1 and 3 produce virtually identical performance figures. Scenarios 1 and 2, however, are very different. The relation between core clock speed and overall performance is actually almost linear.

Any ideas how this might be possible?
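
One way to put a number on the three scenarios above is to compute how strongly performance follows each clock. The sketch below is only an illustration: the function name `clock_sensitivity` and all figures are hypothetical and are not measurements from this thread. It fits the exponent k in perf ~ clock^k from two runs; k near 1 means performance tracks that clock almost linearly, k near 0 means that clock hardly matters.

```python
import math

def clock_sensitivity(perf_base, perf_scaled, clock_base, clock_scaled):
    """Return the exponent k in perf ~ clock**k between two measurements."""
    return math.log(perf_scaled / perf_base) / math.log(clock_scaled / clock_base)

# Hypothetical numbers (performance in Msamples/s, clocks in MHz).
# Scenario 1 vs 2: core clock halved, memory clock unchanged.
core_k = clock_sensitivity(100.0, 52.0, 700.0, 350.0)
# Scenario 1 vs 3: memory clock halved, core clock unchanged.
mem_k = clock_sensitivity(100.0, 98.0, 1800.0, 900.0)

print(f"core clock sensitivity: {core_k:.2f}")
print(f"mem clock sensitivity:  {mem_k:.2f}")
```

With figures like these, the kernel would look core-clock bound despite the assumption that it is memory bound, which is the pattern described in the post.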

Postby Dade » Fri Dec 23, 2011 5:39 pm

Are we talking about CPUs (i.e. low mem bandwidth, a lot of cache), old-school GPUs (i.e. high mem bandwidth, no cache) or modern GPUs (i.e. some cache)?

Anyway, the core clock may have an influence on the cache speed (in case there is one).

Postby jbikker » Fri Dec 23, 2011 5:47 pm

Oh, sorry, I completely forgot to mention: it's a GPU, Fermi, so it has cache. That could indeed be the problem. But I don't see how a cache could hide the mem underclock almost completely, in the context of path tracing?

Postby jbarcz1 » Sat Dec 24, 2011 5:44 pm

jbikker wrote: Oh, sorry, I completely forgot to mention: it's a GPU, Fermi, so it has cache. That could indeed be the problem. But I don't see how a cache could hide the mem underclock almost completely, in the context of path tracing?

So, assuming your tools aren't lying about the memory clock, you must be getting terrific latency hiding...

I could see this happening if you have really good locality between parents and their children. Each memory access might yank in a few nodes at a time, and several nodes' worth of compute could be enough to keep that thread running while another one is fetching. Plus, the upper levels of the tree are basically free, since they're accessed all the time, so a miss at the bottom can be offset by another thread starting at the top.

How big's the scene? And do you have a sense of what your occupancy levels are?
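
The latency-hiding argument above can be sketched with a toy model. Everything here is an assumption made for illustration: the function `throughput`, the nanosecond figures and the warp count are hypothetical, and the model ignores bandwidth limits and cache effects entirely. Each resident warp alternates between a burst of compute and a memory wait; the multiprocessor is limited either by how fast it can issue compute or by how many warps it has available to overlap the waits.

```python
def throughput(compute_ns, latency_ns, warps):
    """Iterations per ns for one multiprocessor in a toy latency-hiding model."""
    # Upper bound if the ALUs never sit idle.
    compute_limit = 1.0 / compute_ns
    # Upper bound if every warp spends compute_ns + latency_ns per iteration.
    latency_limit = warps / (compute_ns + latency_ns)
    return min(compute_limit, latency_limit)

# Hypothetical figures: 40 ns of compute per iteration, 400 ns memory latency.
base = throughput(compute_ns=40.0, latency_ns=400.0, warps=24)
# Memory underclock modelled as doubled latency: still fully hidden.
slow_mem = throughput(compute_ns=40.0, latency_ns=800.0, warps=24)
# Core underclock modelled as doubled compute time: throughput halves.
slow_core = throughput(compute_ns=80.0, latency_ns=400.0, warps=24)

print(base, slow_mem, slow_core)
```

In this model, as long as enough warps are in flight to cover the wait, extra memory latency costs nothing while core clock changes show up linearly. If occupancy drops (fewer warps), the latency limit takes over and the memory clock starts to matter.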

Postby toxie » Mon Jan 09, 2012 3:47 pm

Do you have a screenshot of the measured scene? Otherwise it's difficult to tell.

