Memless RT

toxie
Posts: 118
Joined: Mon Nov 28, 2011 12:30 pm
Location: germany
Contact:

Memless RT

Post by toxie » Tue Apr 24, 2012 2:39 pm

neat AVX optimizations done to the memless RT stuff and more:
http://voxelium.wordpress.com/2012/04/2 ... tructures/
Better you leave here with your head still full of kitty cats and puppy dogs.

graphicsMan
Posts: 164
Joined: Mon Nov 28, 2011 7:28 pm

Re: Memless RT

Post by graphicsMan » Tue Apr 24, 2012 9:46 pm

Neat. What's cool is that this paradigm is pretty new... maybe within a few years it can be made competitive with more traditional approaches. Thanks for sharing.

davepermen
Posts: 48
Joined: Fri Dec 02, 2011 12:21 pm

Re: Memless RT

Post by davepermen » Wed Apr 25, 2012 6:47 pm

I really like that approach (always like to be 100% dynamic, on the fly). Hope they'll soon find a proper solution to scale with multiple cores; that's quite limiting right now. Then again, core counts seem to be stagnating at the moment. Still, those cores need to be fed :)

once that's found, it'll be great.

ingenious
Posts: 282
Joined: Mon Nov 28, 2011 11:11 pm
Location: London, UK
Contact:

Re: Memless RT

Post by ingenious » Wed Apr 25, 2012 7:39 pm

Don't forget also that this doesn't pay off if you need to trace many rays through the scene. It also requires tracing rays in batches, which can impose restrictions on the system, such as integrator interruption and resuming.
Click here. You'll thank me later.

stefan
Posts: 47
Joined: Wed Dec 21, 2011 8:57 pm

Re: Memless RT

Post by stefan » Wed Apr 25, 2012 9:17 pm

Is it just me, or is this (and REYES) somehow moving towards stochastic rasterization, except in world space as opposed to camera space? Where REYES sorts primitives into screen-space buckets before doing 2D point-in-triangle tests for each pixel, this, if I understand it correctly, sorts primitives into 3D bounding boxes before doing 3D ray/triangle tests. Together with micro-polygon ray tracing, I can see this all moving towards a generalized method in the middle.

Anyhow, I wonder if this would allow for some nice on-demand tessellations.
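The bucketing analogy can be made concrete. Below is a minimal divide-and-conquer traversal sketch in Python: it recursively partitions rays and primitives against a splitting plane together and brute-forces small leaves, never storing an acceleration structure. All names are my own invention, and spheres stand in for triangles to keep the intersection code short; treat it as an illustration of the general idea, not as the paper's algorithm.

```python
import math

def hit_sphere(o, d, c, r):
    # Nearest positive ray/sphere intersection distance, or None.
    oc = [o[i] - c[i] for i in range(3)]
    b = sum(oc[i] * d[i] for i in range(3))
    q = sum(x * x for x in oc) - r * r
    disc = b * b - q
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 1e-6 else None

def ray_hits_box(o, d, lo, hi):
    # Conservative slab test: does the ray pass through the AABB?
    tmin, tmax = 0.0, float("inf")
    for i in range(3):
        if abs(d[i]) < 1e-12:
            if not (lo[i] <= o[i] <= hi[i]):
                return False
        else:
            t0, t1 = (lo[i] - o[i]) / d[i], (hi[i] - o[i]) / d[i]
            tmin = max(tmin, min(t0, t1))
            tmax = min(tmax, max(t0, t1))
    return tmin <= tmax

def dacrt(rays, spheres, r_ids, s_ids, lo, hi, hits, depth=0):
    # Divide-and-conquer: partition rays and primitives together,
    # recurse, and never store an acceleration structure.
    if len(r_ids) <= 4 or len(s_ids) <= 2 or depth > 16:
        for ri in r_ids:                        # leaf: brute force
            o, d = rays[ri]
            for si in s_ids:
                c, r = spheres[si]
                t = hit_sphere(o, d, c, r)
                if t is not None and t < hits[ri][0]:
                    hits[ri] = (t, si)
        return
    # Split the box on its longest axis and filter both sets per child.
    axis = max(range(3), key=lambda i: hi[i] - lo[i])
    mid = 0.5 * (lo[axis] + hi[axis])
    left = (lo, hi[:axis] + [mid] + hi[axis + 1:])
    right = (lo[:axis] + [mid] + lo[axis + 1:], hi)
    for c_lo, c_hi in (left, right):
        c_s = [si for si in s_ids
               if spheres[si][0][axis] - spheres[si][1] <= c_hi[axis]
               and spheres[si][0][axis] + spheres[si][1] >= c_lo[axis]]
        c_r = [ri for ri in r_ids
               if ray_hits_box(rays[ri][0], rays[ri][1], c_lo, c_hi)]
        if c_s and c_r:
            dacrt(rays, spheres, c_r, c_s, c_lo, c_hi, hits, depth + 1)

# A small batch: one ray shooting up, the rest along +x.
rays = ([((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
         ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0))]
        + [((0.0, 0.1 * k, 0.0), (1.0, 0.0, 0.0)) for k in range(1, 5)])
spheres = [((5.0, 0.0, 0.0), 1.0),    # nearest along +x
           ((10.0, 0.0, 0.0), 1.0),   # occluded behind it
           ((0.0, 6.0, 0.0), 2.0)]    # along +y
hits = [(float("inf"), None) for _ in rays]
dacrt(rays, spheres, list(range(len(rays))), list(range(len(spheres))),
      [-20.0, -20.0, -20.0], [20.0, 20.0, 20.0], hits)
print(hits[0], hits[1])  # → (4.0, 0) (4.0, 2)
```

The structural parallel to REYES bucketing is the per-child filter step: primitives are binned into 3D boxes exactly the way REYES bins them into 2D screen tiles, only here the rays are binned along with them.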

toxie
Posts: 118
Joined: Mon Nov 28, 2011 12:30 pm
Location: germany
Contact:

Re: Memless RT

Post by toxie » Thu Apr 26, 2012 9:54 am

unfortunately the original thread about the memless RT idea vanished together with the old ompf, but on-demand tessellation is perfectly possible, IMHO even automatic LOD for incoherent rays, if one is willing to make some compromises...
(and in case somebody missed the older poster/publication on the topic: http://ainc.de/Research/MemlessRT.pdf)

DTRendering
Posts: 3
Joined: Thu Apr 26, 2012 2:54 pm

Re: Memless RT

Post by DTRendering » Thu Apr 26, 2012 3:34 pm

davepermen wrote:I really like that approach (always like to be 100% dynamic, on the fly). Hope they'll soon find a proper solution to scale with multiple cores; that's quite limiting right now. Then again, core counts seem to be stagnating at the moment. Still, those cores need to be fed :)
once that's found, it'll be great.
Well, I am not sure that there is a real scaling problem. You may want to have a look at the original paper (http://dl.acm.org/citation.cfm?id=2019636) to get more details on the algorithm, the results, and how it scales across cores (~3.5x with 4 threads).
This TOG paper will be presented at SIGGRAPH 2012, so stop by if you are around and want to know more.
The EG paper is of some interest, but by swapping rays instead of just swapping indices, the authors end up with poor scaling due to bandwidth constraints.
ingenious wrote:Don't forget also that this doesn't pay off if you need to trace many rays through the scene. It also requires tracing rays in batches, which can impose restrictions on the system, such as integrator interruption and resuming.
Yep, large batches are needed so it depends on your circumstances. I am not sure why "this doesn't pay off if you need to trace many rays through the scene" though. ;-)
(!advertisement! ;-)) You can form your own opinion by using the library available at www.directtrace.org. Some kind of update to the lib is overdue by now, though. If you want a quick overview of the programming paradigm for the lib, there is also the HPG'11 poster available at:
http://www.highperformancegraphics.org/ ... stract.pdf
graphicsMan wrote:maybe within a few years it can be made competitive with more traditional approaches.
It depends on what we mean by competitive. I am pretty sure that obtaining several million purely random rays per second (not ambient occlusion rays, for instance) is quite competitive, and do not forget that a prior construction step is not needed. Maybe I am wrong, but I would expect that in many cases you can get 75% to 95% of the performance of a state-of-the-art ray tracer with such an approach (again, I refer you to the TOG paper results).

Finally, what Toxie said previously is right: things like tessellation are fun and possibly easier with such a paradigm.

ingenious
Posts: 282
Joined: Mon Nov 28, 2011 11:11 pm
Location: London, UK
Contact:

Re: Memless RT

Post by ingenious » Fri Apr 27, 2012 9:30 am

DTRendering wrote:
Don't forget also that this doesn't pay off if you need to trace many rays through the scene. It also requires tracing rays in batches, which can impose restrictions on the system, such as integrator interruption and resuming.
Yep, large batches are needed so it depends on your circumstances. I am not sure why "this doesn't pay off if you need to trace many rays through the scene" though. ;-)
Well, because every batch intersection operation effectively builds a new acceleration structure, which is then thrown away. For multi-bounce global illumination you need to perform many such iterations. Actually, you could cache the built structure and maybe even refine it over iterations. That's an interesting direction to investigate. But with caching, the parallelization issue would likely become worse.

DTRendering
Posts: 3
Joined: Thu Apr 26, 2012 2:54 pm

Re: Memless RT

Post by DTRendering » Fri Apr 27, 2012 10:56 am

ingenious wrote:because every batch intersection operation effectively builds a new acceleration structure
I would say that it actually builds parts of a new acceleration structure, with subtle differences as well (I'll emphasize that in the SIGGRAPH talk). This is quite important, as the percentage of the structure that actually gets built is not so high. So it may take several batches (2? 4? 10? 100?) before precomputing a spatial subdivision data structure becomes a real advantage. I would estimate it at between 4 and 8 batches, but it really depends on the circumstances.
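For intuition only, here is a toy break-even model of that trade-off. Every constant below is invented for illustration (nothing is measured from either paper): a one-off build cost for a precomputed structure versus a higher per-batch cost for re-partitioning on the fly.

```python
# Toy break-even model: when does a precomputed structure beat
# re-partitioning on every batch? All constants below are made up.
BUILD = 1.0         # one-off cost of a full precomputed structure
TRAVERSE = 0.10     # per-batch traversal cost with that structure
PER_BATCH = 0.30    # per-batch cost of divide-and-conquer tracing,
                    # which rebuilds (parts of) the partition each time

def prebuilt_cost(batches):
    # Pay the build once, then a small traversal cost per batch.
    return BUILD + batches * TRAVERSE

def memless_cost(batches):
    # Pay the full partitioning cost on every batch.
    return batches * PER_BATCH

# First batch count at which precomputation strictly pays off.
break_even = next(n for n in range(1, 1000)
                  if prebuilt_cost(n) < memless_cost(n))
print(break_even)  # → 6 with these made-up constants
```

With these particular (made-up) numbers the crossover lands at 6 batches, which happens to sit inside the 4-to-8 range estimated above; the real crossover obviously depends on build cost, batch size, and ray coherence.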
The algorithm itself can also solve problems in other areas. For instance, if you are using ray tracing to do collision detection only, then only one batch is needed.
The EG short paper compares its results with precomputed data structures, but does not discuss the construction times of those data structures either. It would be interesting if the authors could post informal results here.

Finally, you do not throw away your results. At the end of your batch processing, you end up with a list of shuffled triangles, and clearly there is some sort of coherence in the order of appearance :). Anyone interested in writing a paper on that?
Last edited by DTRendering on Fri Apr 27, 2012 2:12 pm, edited 1 time in total.

voxelium
Posts: 15
Joined: Fri Apr 27, 2012 11:10 am
Contact:

Re: Memless RT

Post by voxelium » Fri Apr 27, 2012 11:40 am

DTRendering wrote:Well, I am not sure that there is a real scaling problem. You may want to have a look at the original paper (http://dl.acm.org/citation.cfm?id=2019636) to get more details on the algorithm, the results, and how it scales across cores (~3.5x with 4 threads).
Unfortunately, there is definitely a scaling problem for incoherent rays. The scaling strongly depends on the scene and the CPU architecture. For example, I've achieved almost the same scaling as you did for the Conference scene on a very similar CPU (Bloomfield), but it can be worse on Sandy Bridge or with other scenes. That's why I ran tests using both CPUs.
DTRendering wrote:The EG paper is of some interest, but by swapping rays instead of just swapping indices, the authors end up with poor scaling due to bandwidth constraints.
Swapping indices works well for coherent rays, but performs consistently worse for incoherent rays because of the poor cache utilization.
