Discussion on GPU ray tracing

Practical and theoretical implementation discussion.
shiqiu1105
Posts: 138
Joined: Sun May 27, 2012 4:42 pm

Discussion on GPU ray tracing

Post by shiqiu1105 » Thu Jul 19, 2012 3:47 pm

Hi folks,

I am trying to write a basic ray tracer with CUDA.

What I have implemented so far is simply one sample per pixel, with each sample assigned to a CUDA thread, so each thread traces its own ray.
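For reference, here is a minimal sketch of that one-thread-per-pixel setup, written as plain C++ so the index math can be checked on the CPU. It assumes a pinhole camera at the origin looking down -z and one sample at each pixel centre. In a CUDA kernel, `tid` would come from `blockIdx.x * blockDim.x + threadIdx.x`. The names `primary_ray`, `Vec3` and `Ray` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Minimal 3-component vector for the sketch.
struct Vec3 { float x, y, z; };
struct Ray { Vec3 origin; Vec3 dir; };

// One thread per pixel: thread index -> pixel -> normalized primary ray direction.
// Assumes a pinhole camera at the origin looking down -z (hypothetical setup).
Ray primary_ray(int tid, int width, int height, float fov_y_radians) {
    assert(tid >= 0 && tid < width * height);
    int px = tid % width;  // consecutive threads -> consecutive pixels in a row
    int py = tid / width;
    float aspect = float(width) / float(height);
    float scale = std::tan(0.5f * fov_y_radians);
    // Pixel centre mapped to [-1, 1], then scaled by field of view and aspect ratio.
    float u = (2.0f * (px + 0.5f) / width - 1.0f) * aspect * scale;
    float v = (1.0f - 2.0f * (py + 0.5f) / height) * scale;
    float len = std::sqrt(u * u + v * v + 1.0f);
    return Ray{Vec3{0.0f, 0.0f, 0.0f}, Vec3{u / len, v / len, -1.0f / len}};
}
```

With this layout, the threads of a warp mostly cover horizontally adjacent pixels of one row, which keeps primary rays fairly coherent.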

I am writing to ask about more advanced and efficient ways of doing this.
For example, what's the best strategy for parallelizing all the work? When multiple samples per pixel are used, should I assign all the samples of one pixel to each thread, or should I parallelize only over the samples within a pixel and render the pixels sequentially?

I have also heard that it's better to trace rays in a breadth-first manner. Why? Is there any tutorial on how to do this?

Anyway, I would appreciate any advice and ideas. Thank you~

graphicsMan
Posts: 164
Joined: Mon Nov 28, 2011 7:28 pm

Re: Discussion on GPU ray tracing

Post by graphicsMan » Thu Jul 19, 2012 4:39 pm

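The more coherence, the better. If you trace multiple samples for one pixel within a warp, the primary rays in that warp are nearly identical and tend to take the same path through the acceleration structure, so you'll get better utilization of the hardware.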
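A sketch of that layout, assuming the `spp` samples of each pixel are contiguous in the thread index so that consecutive threads handle samples of the same pixel. `pixel_contiguous` and `pixels_per_warp` are hypothetical helpers, and the warp size of 32 is NVIDIA's. The code simply counts how many distinct pixels one warp touches.

```cpp
#include <cassert>
#include <set>

struct SampleId { int pixel; int sample; };

// Consecutive thread indices cover all spp samples of one pixel before moving
// on to the next pixel. In CUDA, tid = blockIdx.x * blockDim.x + threadIdx.x.
SampleId pixel_contiguous(int tid, int spp) {
    assert(tid >= 0 && spp > 0);
    return SampleId{tid / spp, tid % spp};
}

// Counts the distinct pixels handled by one warp (NVIDIA warps are 32 threads).
int pixels_per_warp(int warp, int spp) {
    const int kWarpSize = 32;
    std::set<int> pixels;
    for (int lane = 0; lane < kWarpSize; ++lane)
        pixels.insert(pixel_contiguous(warp * kWarpSize + lane, spp).pixel);
    return static_cast<int>(pixels.size());
}
```

With 4 samples per pixel, a warp covers 8 pixels instead of 32, so its rays come from a much smaller part of the image.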

This is also the idea behind breadth-first ray tracing: if you trace in batches and "sort" your rays so that they are coherent, you may get better throughput.
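One common way to make a batch coherent is to sort it by a key built from the ray's direction octant and a Morton (Z-order) code of its quantized origin. The sketch below assumes origins have already been normalized to the unit cube and uses 10 bits per axis; the key layout and the helper names are illustrative choices, not something from this thread. On the GPU the keys would typically be computed in one kernel and sorted with a library routine such as `thrust::sort_by_key`; `std::sort` stands in for that here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Ray { float ox, oy, oz; float dx, dy, dz; };

// Spreads the low 10 bits of v so that two zero bits separate each bit.
uint64_t spread_bits(uint32_t v) {
    uint64_t x = v & 0x3FFu;
    x = (x | (x << 16)) & 0x030000FFu;
    x = (x | (x << 8)) & 0x0300F00Fu;
    x = (x | (x << 4)) & 0x030C30C3u;
    x = (x | (x << 2)) & 0x09249249u;
    return x;
}

// Maps a coordinate assumed to lie in [0, 1) to a 10-bit grid cell.
uint32_t quantize(float c) {
    int q = static_cast<int>(c * 1024.0f);
    return static_cast<uint32_t>(std::min(1023, std::max(0, q)));
}

// Key = direction octant (3 bits) above a 30-bit Morton code of the origin, so
// rays pointing the same way and starting close together sort next to each other.
uint64_t coherence_key(const Ray& r) {
    uint64_t octant = (r.dx < 0.0f ? 1u : 0u) | (r.dy < 0.0f ? 2u : 0u) | (r.dz < 0.0f ? 4u : 0u);
    uint64_t morton = spread_bits(quantize(r.ox)) |
                      (spread_bits(quantize(r.oy)) << 1) |
                      (spread_bits(quantize(r.oz)) << 2);
    assert(morton < (uint64_t{1} << 30));
    return (octant << 30) | morton;
}

void sort_rays(std::vector<Ray>& rays) {
    std::sort(rays.begin(), rays.end(), [](const Ray& a, const Ray& b) {
        return coherence_key(a) < coherence_key(b);
    });
}
```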

However, if you only have simple delta reflections and don't attempt global illumination, I think depth-first ray tracing will be simple and effective, and the code will be much easier to write.
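A minimal sketch of depth-first tracing with only delta (perfect mirror) reflections: each thread follows one path to completion, one ray at a time, using a loop rather than recursion. Radiance is a single float for brevity, and `trace` is a hypothetical stand-in for intersecting the scene and shading the hit at a given bounce. `Hit` and `trace_delta_path` are also illustrative names.

```cpp
#include <cassert>

struct Hit {
    bool valid;         // false if the ray left the scene
    float emitted;      // radiance contributed at this hit (scalar for brevity)
    float reflectance;  // weight of the perfect-mirror bounce; 0 means the path ends here
};

// Follows one path depth-first with a loop instead of recursion. `trace` is a
// hypothetical callable standing in for "intersect the current ray and shade the hit".
template <typename TraceFn>
float trace_delta_path(const TraceFn& trace, int max_depth) {
    assert(max_depth > 0);
    float radiance = 0.0f;
    float throughput = 1.0f;  // product of mirror reflectances so far
    for (int depth = 0; depth < max_depth; ++depth) {
        Hit h = trace(depth);
        if (!h.valid) break;
        radiance += throughput * h.emitted;
        if (h.reflectance <= 0.0f) break;
        throughput *= h.reflectance;  // the single reflected ray becomes the next ray
    }
    return radiance;
}
```

For example, a mirror that emits 1.0 and reflects half the light at every bounce gives 1 + 0.5 + 0.25 = 1.75 after three bounces.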

There's also an open question whether, on new hardware, sorting plus breadth-first ray tracing gains enough performance to actually be faster than plain depth-first tracing of incoherent rays. Timo Aila et al. showed that on Kepler (GTX 680) they can trace about 200 Mrays/s of incoherent rays in simple scenes, and roughly double that for coherent rays. If you have only primary rays, simple-ish shadow rays, and delta reflections/refractions, you should see something between those values for a simple scene.
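To put those figures in perspective, a rough frame-rate estimate just divides the ray rate by the number of rays needed per frame. The sketch assumes a 1920x1080 image with three rays per pixel (one primary, one shadow, one reflection) and ignores shading and all other overhead, so it is only an upper bound. `estimated_fps` is a hypothetical helper.

```cpp
#include <cassert>

// Upper-bound frame rate from a ray budget. rays_per_pixel counts every ray
// type (e.g. 1 primary + 1 shadow + 1 reflection = 3). Shading, memory traffic
// and kernel-launch overhead are ignored.
double estimated_fps(double rays_per_second, int width, int height, int rays_per_pixel) {
    assert(width > 0 && height > 0 && rays_per_pixel > 0);
    double rays_per_frame = double(width) * height * rays_per_pixel;
    return rays_per_second / rays_per_frame;
}
```

Here 1920 x 1080 x 3 = 6,220,800 rays per frame, so 200 Mrays/s gives about 32 fps and about 400 Mrays/s gives about 64 fps.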

I would suggest reading Timo's paper to see the best ways to trace rays on NVIDIA hardware with CUDA.

hobold
Posts: 56
Joined: Wed Dec 21, 2011 6:08 pm

Re: Discussion on GPU ray tracing

Post by hobold » Thu Jul 19, 2012 8:18 pm

The fundamental reason why GPUs require some sort of coherence to achieve their best performance lies in their basic architecture. All current GPU hardware is based on the SIMD principle ("Single Instruction, Multiple Data"). This means that, regardless of all the syntactic sugar provided by CUDA or OpenCL, a fairly large chunk of data (a "warp", "wavefront", or "vector") must move through the machine in lockstep (at least conceptually; that lockstep is not explicitly enforced, but is a direct consequence of a SIMD instruction set architecture).

All data items in such a chunk have to follow the same control flow through the program. Any switch() or if() that semantically represents two or more alternative program flows can (and usually will) cause the whole chunk of data to go through the alternative paths of processing in turn. At least half of the temporary results will then be discarded (on average); only the values from each item's correct program path are retained.
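The cost of a divergent branch can be modelled on the CPU. The sketch below treats a warp as a vector of per-lane branch decisions and assumes each side of an if/else costs a fixed number of instructions. It deliberately ignores finer hardware details (reconvergence points, newer per-thread scheduling). `simulate_if_else` and `BranchCost` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Result of running one if/else over a SIMD chunk.
struct BranchCost {
    int passes;          // how many of the two paths the whole chunk had to execute
    double utilization;  // useful lane-slots / lane-slots actually issued
};

// If all lanes agree, only one path runs; otherwise both run in turn, with the
// lanes that do not belong on the current path masked off (their work is discarded).
BranchCost simulate_if_else(const std::vector<bool>& take_if, int if_cost, int else_cost) {
    assert(!take_if.empty() && if_cost > 0 && else_cost > 0);
    int lanes = static_cast<int>(take_if.size());
    int n_if = 0;
    for (bool b : take_if) n_if += b ? 1 : 0;
    int n_else = lanes - n_if;
    int issued = 0;  // every issued instruction occupies all lanes
    int passes = 0;
    if (n_if > 0) { issued += lanes * if_cost; ++passes; }
    if (n_else > 0) { issued += lanes * else_cost; ++passes; }
    int useful = n_if * if_cost + n_else * else_cost;
    return BranchCost{passes, double(useful) / issued};
}
```

With equally expensive paths, any mix of true and false lanes gives a utilization of 0.5, whether 16 lanes or just 1 lane takes the other side; only a uniform chunk reaches 1.0.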

Very recently, Nvidia and others have begun to extend the SIMD principle to alleviate some of those drawbacks. But by and large, they still apply. A careful programmer may be able to better tailor algorithms to the strengths and weaknesses of the hardware, but information theory suggests that the smarter (i.e. the more efficient) an algorithm is, the more data it will re-use to save computation. That re-use in turn causes data dependencies, and those dependencies limit parallelism.

From my own experience, brute force methods tend to be more amenable to parallelization. In some cases, a dumber algorithm might indeed run better on massively parallel hardware than a smarter algorithm. Unfortunately, it is very hard to predict which will win - and a lot of work to implement both with enough fine tuning for a fair comparison.

spectral
Posts: 382
Joined: Wed Nov 30, 2011 2:27 pm

Re: Discussion on GPU ray tracing

Post by spectral » Fri Jul 20, 2012 11:08 am

To be honest, my experience with sorting rays has not been great... it is often slower than processing them some other way.
The main advantage of sorting techniques is that the same sorting pass can sort both your rays and your shaders.
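As an illustration of the shader side, here is a sketch that sorts hit records by material so each shader runs over a contiguous run of hits. The `HitRecord` layout and `group_by_material` are hypothetical; `std::stable_sort` keeps the original ray order within a material. On CUDA, `thrust::stable_sort_by_key` on the material IDs could play the same role.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct HitRecord {
    int ray_index;    // which ray/pixel this hit belongs to
    int material_id;  // which shader must run for it
};

// Reorders hits so equal materials are contiguous, then returns one
// [begin, end) range per material for a shading pass to process.
std::vector<std::pair<std::size_t, std::size_t>> group_by_material(std::vector<HitRecord>& hits) {
    std::stable_sort(hits.begin(), hits.end(), [](const HitRecord& a, const HitRecord& b) {
        return a.material_id < b.material_id;
    });
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    std::size_t begin = 0;
    for (std::size_t i = 1; i <= hits.size(); ++i) {
        if (i == hits.size() || hits[i].material_id != hits[begin].material_id) {
            ranges.emplace_back(begin, i);
            begin = i;
        }
    }
    // The ranges must cover every hit exactly once.
    assert(ranges.empty() ? hits.empty() : ranges.back().second == hits.size());
    return ranges;
}
```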

If you use CUDA it is easy to do, because there are some very good, well-integrated libraries for it. In OpenCL you can use CLPP, for example.

The problem I have with sorting (in OpenCL) is that you have to exit your kernel, launch the "sorting" kernel, and then continue in another kernel. So you have to divide your process into several kernels. This is not good because it forces you to store some data in global memory (kernel 1) and load that data back in the last kernel. As you know, global memory is slow. Maybe there is some way to sort everything without exiting the kernel; I haven't tested it!
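The data that has to cross such a kernel boundary can at least be kept dense with stream compaction: flag the paths that are still alive, take an exclusive prefix sum of the flags to get each survivor's output slot, then scatter. The sketch below runs both passes serially on the CPU; on the GPU each would be a parallel pass (Thrust, for example, provides `thrust::copy_if` for this pattern). `PathState` and `compact_alive` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct PathState {
    int pixel;         // pixel the path contributes to
    float throughput;  // accumulated path weight
    bool alive;        // false once the path has terminated
};

std::vector<PathState> compact_alive(const std::vector<PathState>& in) {
    // Pass 1: exclusive prefix sum of the alive flags gives each survivor's output slot.
    std::vector<int> slot(in.size());
    int count = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        slot[i] = count;
        count += in[i].alive ? 1 : 0;
    }
    // Pass 2: scatter survivors into a dense buffer; each element is independent.
    std::vector<PathState> out(static_cast<std::size_t>(count));
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i].alive) {
            assert(slot[i] < count);
            out[static_cast<std::size_t>(slot[i])] = in[i];
        }
    }
    return out;
}
```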

What I can suggest is to use path regeneration:
http://www.vis.uni-stuttgart.de/~novakj ... 010_pt.pdf
http://www.gpucomputing.net/?q=node/1308
http://graphics.tudelft.nl/~dietger/HPG ... tation.pdf
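The core idea of path regeneration is that a thread whose path has terminated immediately starts a new one, instead of idling until the longest path in its warp is done. The sketch below is a simplified CPU model of that scheduling, not the method from the linked papers: `path_length` is a made-up list of bounce counts, each loop iteration traces one bounce for every busy thread, and both helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Baseline: threads take paths in fixed batches; a batch finishes only when
// its longest path finishes, so threads with short paths sit idle.
int steps_without_regeneration(const std::vector<int>& path_length, int num_threads) {
    assert(num_threads > 0);
    const std::size_t n = static_cast<std::size_t>(num_threads);
    int steps = 0;
    for (std::size_t b = 0; b < path_length.size(); b += n) {
        int longest = 0;
        for (std::size_t i = b; i < path_length.size() && i < b + n; ++i)
            longest = std::max(longest, path_length[i]);
        steps += longest;
    }
    return steps;
}

// Path regeneration: a thread whose path has ended fetches the next path at once.
int steps_with_regeneration(const std::vector<int>& path_length, int num_threads) {
    assert(num_threads > 0);
    std::vector<int> remaining(static_cast<std::size_t>(num_threads), 0);  // bounces left per thread
    std::size_t next_path = 0;
    int steps = 0;
    for (;;) {
        bool any_active = false;
        for (int& left : remaining) {
            if (left == 0 && next_path < path_length.size()) {
                assert(path_length[next_path] > 0);  // sketch assumes >= 1 bounce per path
                left = path_length[next_path++];     // regenerate: start a new path
            }
            if (left > 0) any_active = true;
        }
        if (!any_active) break;
        for (int& left : remaining)
            if (left > 0) --left;  // trace one bounce
        ++steps;
    }
    return steps;
}
```

With paths of lengths {1, 8, 1, 8} on two threads, fixed batches need 16 steps while regeneration needs 10; with equal-length paths both need the same number of steps.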

You can also read this one: http://www.sci.utah.edu/~wald/Publicati ... ompact.pdf
Spectral
OMPF 2 global moderator
