VCM GPU implementation (+ some extras)

Practical and theoretical implementation discussion.
MohamedSakr
Posts: 83
Joined: Thu Apr 24, 2014 2:27 am

VCM GPU implementation (+ some extras)

Postby MohamedSakr » Sun Jun 01, 2014 11:07 am

I'm trying to implement VCM (from the SmallVCM project) on the GPU using CUDA. I've read "Progressive Light Transport Simulation on the GPU: Survey and Improvements".

I have a few questions:
1- In the light sub-path loop, should I sort on every light bounce iteration (to avoid divergence within warps)? I've tested sorting and it takes around 2.6 ms per iteration (1 sort and 1 copy_if). Assuming we have 10 light bounces, this leads to about 26 ms for the whole light loop at full HD resolution (2 million light paths).

2- Same as question 1, but for the camera loop :)

3- In the above paper, the BDPT algorithm connects each vertex in the camera paths loop to randomly chosen vertices from a Light Vertex Cache (LVC), while SmallVCM has this comment:
// For VC, each light sub-path is assigned to a particular eye
// sub-path, as in traditional BPT. It is also possible to
// connect to vertices from any light path, but MIS should
// be revisited.

How should MIS be revisited? Or, in other words, what should I add to the code?

tomasdavid
Posts: 22
Joined: Wed Oct 10, 2012 12:41 pm

Re: VCM GPU implementation (+ some extras)

Postby tomasdavid » Mon Jun 02, 2014 10:09 am

Yo.

ad 1/2) I am not entirely sure what you intend to sort here. I don't think I sort anything in either of the two, but it's been a while since I touched the implementation.

ad 3) The MIS used is derived for connecting to all vertices of a single "companion" path. When you connect to a subset of vertices from a superset of paths, it is possible that the optimal MIS is different. I haven't looked into it too much, but Dietger's thesis/techreport (cannot find either right now, I am sure jacco will be able to point it out) had some interesting pointers regarding the influence on MIS. Bottom line: it is probably not that important.

MohamedSakr
Posts: 83
Joined: Thu Apr 24, 2014 2:27 am

Re: VCM GPU implementation (+ some extras)

Postby MohamedSakr » Mon Jun 02, 2014 11:39 am

about sorting:

in this loop "for(;; ++lightState.mPathLength)" and "for(;; ++cameraState.mPathLength)"

I can sort 2 times per iteration, 1 for BSDF (each BSDF is interacting in a different way, and lots of checks about BSDF.isdelta() or BSDF.isvalid() etc...
and 1 at the end for sample Scattering "so one thread can't wait for others to finish the 10 iterations for example"
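
Roughly, one bounce of that sort + compaction step could look like this with Thrust (just a sketch under my assumptions: PathState, MaterialKey and IsAlive are illustrative names, not SmallVCM code):

#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/sort.h>
#include <thrust/copy.h>

struct PathState { int pixel; int bsdfType; bool alive; /* throughput, ray, ... */ };

struct MaterialKey {  // key extraction for the BSDF sort
    __host__ __device__ int operator()(const PathState& s) const { return s.bsdfType; }
};
struct IsAlive {      // predicate for keeping only non-terminated paths
    __host__ __device__ bool operator()(const PathState& s) const { return s.alive; }
};

void sortAndCompactBounce(thrust::device_vector<PathState>& paths,
                          thrust::device_vector<PathState>& tmp)
{
    // 1) sort by BSDF type so a warp mostly evaluates a single material
    thrust::device_vector<int> keys(paths.size());
    thrust::transform(paths.begin(), paths.end(), keys.begin(), MaterialKey());
    thrust::sort_by_key(keys.begin(), keys.end(), paths.begin());

    // 2) drop terminated paths before the next bounce
    tmp.resize(paths.size());
    auto newEnd = thrust::copy_if(paths.begin(), paths.end(), tmp.begin(), IsAlive());
    tmp.erase(newEnd, tmp.end());
    paths.swap(tmp);
}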

after some "theoritical" measurements by CPU, for a full HD resolution, in the sun scene, 1 thread from CPU @ 4GHz takes around 33 seconds to finish 1 full image iteration

let's assume a code efficiency of 50% "we don't use the full 8 instructions/cycle"
my processor at 4 GHz, 6 cores / 12 threads can do 192 GFLOPS, so 50% is 96 GFLOPS (with all threads)
so the total work per iteration for 1 thread = (96 / 12) * 33 = around 264 GFlop for a full HD image iteration

so if we average that out over 10 ray bounces, 1 bounce iteration (for light + camera) requires about 26.4 GFlop (if sorted); with 50% divergence this may grow to 52.8 GFlop (if non-sorted)

to sort or not to sort!!:
sorting with thrust on my GTX 780 over 2 million elements takes 1.7 ms (about 7.65 GFlop worth of GPU time), and copy_if in thrust (so it drops terminated rays after each iteration) takes 0.7 ms (about 3.15 GFlop)

so my guess is that I will make an adaptive algorithm so that the GPU knows when to sort :D (rough sketch of the idea below)
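
The adaptive decision could be as simple as comparing estimated per-bounce costs; a minimal sketch, where the divergence factor and all timings are assumptions to be measured, not real numbers:

// Sort only while the expected savings from reduced divergence exceed the
// fixed sort + compaction overhead for this bounce.
bool shouldSortThisBounce(int numAliveRays,
                          float bounceCostSortedMs,   // estimated cost of a coherent bounce
                          float divergenceFactor,     // e.g. ~2.0 when left unsorted
                          float sortAndCompactMs)     // measured thrust sort + copy_if time
{
    float unsortedMs = bounceCostSortedMs * divergenceFactor;
    float sortedMs   = bounceCostSortedMs + sortAndCompactMs;
    return numAliveRays > 0 && sortedMs < unsortedMs;
}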

tomasdavid
Posts: 22
Joined: Wed Oct 10, 2012 12:41 pm

Re: VCM GPU implementation (+ some extras)

Postby tomasdavid » Mon Jun 02, 2014 6:59 pm

Ah, for compaction purposes, got it.

I have only fairly simple BRDFs (the most complicated is Ashikhmin-Shirley), so compacting by BRDF didn't pay off for me.
Also, as stated in the paper, on the newer generation (6xx, and the same goes for 7xx) the compaction doesn't buy you as much as it did on 5xx, and it leads to more complicated code.

Don't forget that even without compaction you do in-place ray regeneration (as per Novak's paper, see the sketch below), so the only differences are:
a) code divergence during the regeneration (the non-regenerated paths don't do anything useful)
b) ray divergence, because the freshly regenerated primary rays are not traced together.
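
For reference, a minimal sketch of what in-place regeneration looks like inside the bounce kernel (PathState, generatePrimaryRay and extendPath are placeholders, not the paper's actual code):

struct PathState { bool alive; /* ray, throughput, pixel, ... */ };

__device__ PathState generatePrimaryRay(unsigned int pixel)
{
    PathState s; s.alive = true;  // placeholder: build the camera ray for 'pixel' here
    return s;
}

__device__ void extendPath(PathState& s)
{
    // placeholder: trace + shade one bounce, possibly setting s.alive = false
}

__global__ void bounceKernel(PathState* paths, int numPaths,
                             unsigned int* nextPixel, unsigned int totalPixels)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPaths) return;

    PathState s = paths[i];
    if (!s.alive) {
        // point a): only the dead threads in a warp do this regeneration work,
        // grabbing the next unprocessed pixel and restarting in the same slot
        unsigned int p = atomicAdd(nextPixel, 1u);
        if (p >= totalPixels) { paths[i] = s; return; }  // nothing left to regenerate
        s = generatePrimaryRay(p);
    }
    extendPath(s);  // point b): freshly regenerated primary rays are traced here,
                    // mixed in with paths that are several bounces deep
    paths[i] = s;
}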

Dietger
Posts: 50
Joined: Tue Nov 29, 2011 10:33 am

Re: VCM GPU implementation (+ some extras)

Postby Dietger » Tue Jun 03, 2014 6:58 pm

tomasdavid wrote:ad 3) The MIS used is derived for connecting to all vertices of a single "companion" path. When you connect to a subset of vertices from a superset of paths, it is possible that the optimal MIS is different. I haven't looked into it too much, but Dietger's thesis/techreport (cannot find either right now, I am sure jacco will be able to point it out) had some interesting pointers regarding the influence on MIS. Bottom line: it is probably not that important.


Shameless self-promotion: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.217.7304

ingenious
Posts: 279
Joined: Mon Nov 28, 2011 11:11 pm
Location: London, UK

Re: VCM GPU implementation (+ some extras)

Postby ingenious » Tue Jun 03, 2014 8:29 pm

MohamedSakr wrote:3- In the above paper, the BDPT algorithm connects each vertex in the camera paths loop to randomly chosen vertices from a Light Vertex Cache (LVC), while SmallVCM has this comment:
// For VC, each light sub-path is assigned to a particular eye
// sub-path, as in traditional BPT. It is also possible to
// connect to vertices from any light path, but MIS should
// be revisited.

How should MIS be revisited? Or, in other words, what should I add to the code?


Ordinary BPT connects the vertices of each eye subpath to the vertices of a single light subpath. Therefore, the number of samples each (s,t) technique takes is 1. The numerator in the balance heuristic weight is thus equal to the pdf of the constructed path.

Now, say you have sampled N light subpaths and stored their vertices whose number is V. You then trace an eye subpath through every pixel, connecting each eye vertex to C randomly chosen light vertices.

In order to derive the MIS weight for a connection, we can reinterpret the above process as follows. Conceptually, you're connecting each eye vertex to the vertices of all N light subpaths. Thus, the number of samples that each (s,t) technique takes is not 1 anymore, but is N. However, the probability for each connection is now C/V. Therefore, the numerator in the balance heuristic weight needs to be multiplied by N * C/V.

Finally, let's consider the special case where C is set to the average light subpath length, which will make roughly as many connections as ordinary BPT. The average light subpath length is V/N. In this special case everything cancels out in the above multiplier, so the MIS weight remains unchanged. In practice though, C will almost always be a fractional number, which you can round to the nearest integer. So the "proper" weight will be different, although only very slightly.

In summary, if you pick the number of connections to be equal to the (integer-rounded) average light subpath length, then you can use the traditional MIS weight.
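
In code form, the change amounts to scaling the per-technique sample count in the balance heuristic (a sketch only; the pdf and sample-count arrays are placeholders, not SmallVCM identifiers):

#include <vector>

// Balance heuristic weight for technique k, given the full-path pdfs of all
// techniques that could have produced this path and their sample counts.
double balanceWeight(const std::vector<double>& pdf,
                     const std::vector<double>& nSamples, int k)
{
    double denom = 0.0;
    for (size_t i = 0; i < pdf.size(); ++i)
        denom += nSamples[i] * pdf[i];
    return (nSamples[k] * pdf[k]) / denom;
}

// With an LVC holding V vertices from N light sub-paths, each vertex-connection
// technique uses nSamples = N * (C / double(V)) instead of 1. When C is chosen
// as V / N (the average light sub-path length), the factor is 1 and the
// traditional weight survives unchanged.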

MohamedSakr
Posts: 83
Joined: Thu Apr 24, 2014 2:27 am

Re: VCM GPU implementation (+ some extras)

Postby MohamedSakr » Tue Jun 03, 2014 9:52 pm

@Dietger really nice thesis :D . What I see (I may be mistaken, so correct me) is that you put the whole camera loop inside the light loop. Is this safe to do in SmallVCM, as there is photon merging (PPM)? I see the main benefit here is memory consumption, but why is it GPU friendly? I sense it should give the same performance as the separate light loop and camera loop.

@ingenious thanks for the clarification :) , I think I will leave the MIS weight as it is for now

MohamedSakr
Posts: 83
Joined: Thu Apr 24, 2014 2:27 am

Re: VCM GPU implementation (+ some extras)

Postby MohamedSakr » Sat Sep 06, 2014 8:56 am

I'm trying to implement VCM in PBRT. So far I have successfully done the "vertex to vertex connection", "camera vertex shadow ray", and "camera vertex hitting a light (BG or area)" cases.
What I failed at is "light vertex hitting the camera lens": it gives weird results!! I tried everything I could, but somehow couldn't figure out what causes the problem.

For some reason, the light vertices which are closer to the camera appear brighter!!, and the whole image is not balanced.
Here are some results (note: this is just the BDPT part, the vertex merging part is not in these results):
LVtoCameraLens.jpg: the bad light vertex connection to the camera lens

otherConnections_correct.jpg: the good "vertex to vertex connection" and "camera vertex shadow ray"

room-photon.jpg: reference image rendered with photon mapping and final gather

MohamedSakr
Posts: 83
Joined: Thu Apr 24, 2014 2:27 am

Re: VCM GPU implementation (+ some extras)

Postby MohamedSakr » Sat Sep 06, 2014 1:31 pm

NVM, I solved it :D . The problem was my misunderstanding of the PBRT image film: I was using film->AddSample(), while the correct call is film->Splat() for rays which come from a light and hit the camera lens. The weighting should be fine now and the results are correct.
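
For anyone hitting the same thing, the gist of the fix looks roughly like this (pbrt-v2 style; rasterPos, contrib and nLightPaths are my placeholder names, and the exact Film signatures may differ in your PBRT version):

// Contribution of a light sub-path vertex that connects directly to the camera
// lens (the light-tracing strategy). These contributions land at arbitrary
// raster positions, so they must be splatted, not filtered as pixel samples.
CameraSample cs;
cs.imageX = rasterPos.x;
cs.imageY = rasterPos.y;

// Wrong for light-traced rays: AddSample() reconstruction-filters the value as
// if it were one of this pixel's own samples, which over-brightens vertices
// near the camera (they generate many such samples).
// film->AddSample(cs, contrib);

// Right: splat the contribution; normalize by the number of light sub-paths,
// either here or via the splatScale passed to Film::WriteImage().
film->Splat(cs, contrib / float(nLightPaths));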


Return to “General Development”

Who is online

Users browsing this forum: No registered users and 3 guests