Path tracing benchmark

Posts: 59
Joined: Fri Jan 06, 2012 3:09 pm

Path tracing benchmark

Post by mpeterson » Wed Jan 25, 2017 1:16 pm

It took some time after we built our first coherent traversal kernel for AVX-512
to get a competitive incoherent kernel ready for prime time. Here it is! For the benchmark we used a full pipeline available easily on any

1) camera ray generation, traversal, intersection, shading
2) for any hit: 64 secondary diffuse rays, traversal, intersection and shading
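
For illustration, the two-stage workload listed above can be sketched as follows. This is only a minimal sketch, not the benchmark's code: cosine-weighted hemisphere sampling is an assumption (the post only says "diffuse"), and the names `sample_diffuse_direction`, `secondary_rays` and `rays_per_frame` are hypothetical.

```python
import math
import random

SECONDARY_RAYS_PER_HIT = 64  # from the post: 64 diffuse rays per camera-ray hit


def orthonormal_basis(n):
    """Return two unit vectors (t, b) perpendicular to the unit normal n."""
    nx, ny, nz = n
    # Pick a helper axis that is not parallel to n.
    a = (1.0, 0.0, 0.0) if abs(nx) < 0.9 else (0.0, 1.0, 0.0)
    # t = normalize(cross(a, n))
    tx = a[1] * nz - a[2] * ny
    ty = a[2] * nx - a[0] * nz
    tz = a[0] * ny - a[1] * nx
    inv_len = 1.0 / math.sqrt(tx * tx + ty * ty + tz * tz)
    t = (tx * inv_len, ty * inv_len, tz * inv_len)
    # b = cross(n, t)
    b = (ny * t[2] - nz * t[1], nz * t[0] - nx * t[2], nx * t[1] - ny * t[0])
    return t, b


def sample_diffuse_direction(n, rng):
    """Cosine-weighted direction on the hemisphere around n (assumed sampling)."""
    u1, u2 = rng.random(), rng.random()
    r = math.sqrt(u1)
    phi = 2.0 * math.pi * u2
    x, y = r * math.cos(phi), r * math.sin(phi)
    z = math.sqrt(max(0.0, 1.0 - u1))  # x*x + y*y + z*z == 1
    t, b = orthonormal_basis(n)
    return tuple(x * t[i] + y * b[i] + z * n[i] for i in range(3))


def secondary_rays(hit_normal, rng, count=SECONDARY_RAYS_PER_HIT):
    """Directions of the secondary diffuse rays spawned at one hit point."""
    return [sample_diffuse_direction(hit_normal, rng) for _ in range(count)]


def rays_per_frame(num_camera_rays, num_hits):
    """Total rays traced: stage 1 camera rays plus 64 secondary rays per hit."""
    return num_camera_rays + SECONDARY_RAYS_PER_HIT * num_hits
```

The secondary rays dominate the ray count by up to 64:1, so the incoherent kernel is what the benchmark mostly measures.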


1) Nvidia GTX 1080 kernels based on: "Understanding the Efficiency of Ray Traversal on GPUs"
2) Embree 2.13 AVX-512 kernels on Intel Xeon Phi 7250, 1.4 GHz, 68 cores:

- avx512knl::BVH8Triangle4Intersector16HybridMoellerNoFilter for camera rays and
- avx512knl::BVH8Triangle4Intersector1Moeller for secondary diffuse rays

BVH compiler settings:
- avx512knl::BVH8BuilderFastSpatialSAH

3) Our new AVX-512 kernels for coherent and incoherent ray transport on Intel Xeon Phi 7250, 1.4 GHz, 68 cores


Our new kernels clearly outperform the other implementations tested here, which cover the main platforms currently used for path tracing.
The KNL CPU has another advantage: it can connect directly to high-performance networks like InfiniBand,
so it scales extremely well compared to GPUs in a cluster-like configuration.

Finally, we also ported our fast BVH compilers to AVX-512.
The compiler timings for all scenes above are:

Fairy: 13.6ms
Rungholt: 105.1ms
San Miguel: 359.3ms
Sponza: 7.1ms

The Embree compilers, in any configuration, are far behind these timings on our KNL test system, so we decided not to
publish any numbers for them here. The SBVH compiler used in "Understanding the Efficiency of Ray Traversal on GPUs" is far from being
optimized at all.


Posts: 1
Joined: Wed Jan 25, 2017 2:10 pm

Re: Path tracing benchmark

Post by manysmallcores » Thu Jan 26, 2017 9:09 am

Cool stuff!

Couple of remarks/questions:
- What about open-sourcing the benchmark? Otherwise nobody else can reproduce these numbers.
- Why not use OptiX Prime? It should provide additional optimizations, right?
- Shouldn't a Titan X be quite a bit faster than a GTX 1080?
- Why are you switching from packets to single rays in the Embree case? I guess your benchmark uses large streams of rays, so why don't you stick to the hybrid interface or even use Embree's ray stream interface? The current kernel selection seems strange.
- How many rays per stream (per HW thread) do you use? Are you using any kind of stream compaction?
- Can you run the benchmark with more complex scenes? (Fairy and Sponza are just toy models.) Something with 30-100M primitives should be more representative.
- From my experience, the build times of a spatial split builder largely depend on how often you do spatial splits (vs. object splits), in particular deep down in the tree. Do you use heuristics similar to the other implementations? Can you compare the quality of the BVHs (both for NV and Embree)? Otherwise build times are very hard to compare.
- I actually have access to a 7250 Phi machine and I've quickly tested Embree's BVH build performance (buildbench) for San Miguel. Without spatial splits I get half of your measured time, and with spatial splits a bit less than 2x. Does not seem way off...
- What about the build performance for San Miguel with OptiX Prime?
- What is the warp utilization for the GTX 1080 during the benchmark?
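
To make the stream compaction and utilization questions concrete, here is a minimal sketch of what is being asked about. It does not describe what the benchmark actually does; `compact_stream` and `lane_utilization` are hypothetical names, and the width of 16 assumes one 32-bit lane per ray in a 512-bit AVX-512 register.

```python
def compact_stream(rays, active):
    """Stable compaction: pack the active rays contiguously, keeping their order."""
    # An exclusive prefix sum over the flags gives each active ray its output slot.
    offsets = []
    total = 0
    for flag in active:
        offsets.append(total)
        total += 1 if flag else 0
    out = [None] * total
    for i, flag in enumerate(active):
        if flag:
            out[offsets[i]] = rays[i]
    return out


def lane_utilization(active, width=16):
    """Fraction of busy SIMD lanes over all packets that hold at least one active ray."""
    packets = [active[i:i + width] for i in range(0, len(active), width)]
    busy = [p for p in packets if any(p)]
    if not busy:
        return 0.0
    active_lanes = sum(sum(1 for f in p if f) for p in busy)
    return active_lanes / (len(busy) * width)
```

With every second ray terminated, two half-empty 16-wide packets run at 50% utilization; after compaction the same 16 rays fill a single packet.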


Posts: 6
Joined: Wed May 20, 2015 1:17 pm

Re: Path tracing benchmark

Post by rtpt » Thu Jan 26, 2017 12:25 pm

Hi, really great numbers. I am mostly interested in BVH compilers. I also did some testing on KNL against
Embree, and it seems that your implementation is more than 10x faster. Is this project going to be
open-sourced?

Posts: 59
Joined: Fri Jan 06, 2012 3:09 pm

Re: Path tracing benchmark

Post by mpeterson » Wed Nov 29, 2017 9:36 am

Just a quick update: we did some tests on Intel's Knights Mill. To make it short: the machine is boring.

No performance progress at all. All the vector extensions are simply crap. For graphics, Intel seems to be a dead end. GPUs will dominate the next years.

So all the CPU stuff was wasted time.

