### Path tracing benchmark

Posted: **Wed Jan 25, 2017 1:16 pm**

It took some time after we built our first coherent traversal kernel for AVX-512 (http://ompf2.com/viewtopic.php?f=3&t=2103) to get a competitive incoherent kernel ready for prime time. Here it is! For the benchmark we used a full pipeline that is easy to implement on any architecture:

1) camera ray generation, traversal, intersection, shading

2) for each hit: 64 secondary diffuse rays, traversal, intersection and shading
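To get a feeling for how the work splits between the two stages, here is a rough ray-budget sketch. The resolution, hit fraction and frame time are made-up assumptions for illustration only, not values from our benchmark, and the function names are hypothetical.

```python
def rays_per_frame(width, height, hit_fraction, secondary_per_hit=64):
    """Count rays for one frame of the two-stage pipeline.

    width, height and hit_fraction are illustrative assumptions;
    secondary_per_hit=64 is the value used in the benchmark description.
    """
    primary = width * height                  # stage 1: one camera ray per pixel
    hits = int(primary * hit_fraction)        # camera rays that hit geometry
    secondary = hits * secondary_per_hit      # stage 2: diffuse rays per hit
    return primary, secondary


def mrays_per_second(total_rays, seconds):
    """Convert a ray count and a wall-clock time into Mrays/s."""
    return total_rays / seconds / 1e6


if __name__ == "__main__":
    # Hypothetical 1024x1024 frame where every camera ray hits something.
    primary, secondary = rays_per_frame(1024, 1024, 1.0)
    print(primary, secondary)
    # Hypothetical frame time of 1 second, just to show the conversion.
    print(mrays_per_second(primary + secondary, 1.0))
```

With 64 secondary rays per hit, the incoherent diffuse rays outnumber the coherent camera rays by up to 64:1, so the incoherent kernel dominates the total runtime.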

Architectures:

1) Nvidia GeForce GTX 1080: kernels based on "Understanding the Efficiency of Ray Traversal on GPUs"

2) Embree 2.13 AVX-512 kernels on Intel Xeon Phi 7250 (1.4 GHz, 68 cores):

Kernels:

- avx512knl::BVH8Triangle4Intersector16HybridMoellerNoFilter for camera rays and

- avx512knl::BVH8Triangle4Intersector1Moeller for secondary diffuse rays

BVH compiler settings:

- avx512knl::BVH8BuilderFastSpatialSAH

3) Our new AVX-512 kernels for coherent and incoherent ray transport on Intel Xeon Phi 7250 (1.4 GHz, 68 cores)
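For readers not familiar with the Embree kernel names: "Moeller" refers to the Möller-Trumbore ray/triangle test, "Intersector16" to a kernel working on packets of 16 rays, and "Intersector1" to a single-ray kernel. Below is a minimal scalar sketch of that test for one ray and one triangle. It only shows the math and is not the Embree code or our kernel; the helper names and the epsilon value are assumptions.

```python
EPS = 1e-9  # assumed tolerance for parallel rays and self-hits


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersect_triangle(orig, direction, v0, v1, v2):
    """Moeller-Trumbore test: return hit distance t, or None on a miss."""
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < EPS:            # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv_det       # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det   # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det      # distance along the ray
    return t if t > EPS else None


if __name__ == "__main__":
    tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    print(intersect_triangle((0.25, 0.25, -1.0), (0.0, 0.0, 1.0), *tri))
    print(intersect_triangle((2.0, 2.0, -1.0), (0.0, 0.0, 1.0), *tri))
```

A packet kernel runs the same arithmetic for 16 rays at once in SIMD lanes, which pays off for coherent camera rays; for incoherent diffuse rays the lanes diverge quickly, which is why a separate single-ray kernel is used there.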

Our new kernels clearly outperform any other implementation on all important platforms currently used for path tracing.

The KNL CPU has another advantage: it can connect directly to high-performance networks such as InfiniBand, so it scales extremely well compared to GPUs in a cluster-like configuration.

Finally, we also ported our fast BVH compilers (http://rapt.technology/data/pssbvh.pdf) to AVX-512.

The compiler timings for all scenes are:

Fairy: 13.6 ms

Rungholt: 105.1 ms

San Miguel: 359.3 ms

Sponza: 7.1 ms

The Embree compilers, in any configuration, are far behind these timings on our KNL test system, so we decided not to publish any numbers here. The SBVH compiler used in "Understanding the Efficiency of Ray Traversal on GPUs" is far from optimized.

mp
