to get a competitive incoherent kernel ready for prime time. Here it is! For the benchmark we used a full pipeline that is easy to set up on any
architecture:
1) camera ray generation, traversal, intersection, shading
2) for any hit: 64 secondary diffuse rays, traversal, intersection and shading
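
As a rough illustration of step 2, the sketch below generates the 64 secondary diffuse rays for a single hit point. It is not the benchmark code: the cosine-weighted hemisphere distribution, the small origin offset, and all names (Vec3, Ray, makeDiffuseRays) are assumptions made for this example only.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical minimal vector and ray types, for illustration only.
struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin, dir; };

inline Vec3  operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3  operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3  cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline Vec3  normalize(Vec3 v) { return v * (1.0f / std::sqrt(dot(v, v))); }

// Generate `count` diffuse rays around the unit surface normal `n` at `hit`.
// Assumption: cosine-weighted sampling; the post does not state the distribution used.
std::vector<Ray> makeDiffuseRays(Vec3 hit, Vec3 n, int count, std::uint32_t seed) {
    assert(count >= 0);
    const float kPi = 3.14159265358979f;
    const float kEps = 1e-4f;  // offset along the normal to avoid self-intersection

    // Build an orthonormal basis (t, b, n) around the normal.
    Vec3 helper = std::fabs(n.x) > 0.9f ? Vec3{0.0f, 1.0f, 0.0f} : Vec3{1.0f, 0.0f, 0.0f};
    Vec3 t = normalize(cross(helper, n));
    Vec3 b = cross(n, t);

    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> uni(0.0f, 1.0f);

    std::vector<Ray> rays;
    rays.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        float u1 = uni(rng), u2 = uni(rng);
        float r = std::sqrt(u1);          // radius on the unit disk
        float phi = 2.0f * kPi * u2;      // angle on the unit disk
        float x = r * std::cos(phi);
        float y = r * std::sin(phi);
        float z = std::sqrt(std::fmax(0.0f, 1.0f - u1));  // project up onto the hemisphere
        Vec3 d = normalize(t * x + b * y + n * z);
        rays.push_back({hit + n * kEps, d});
    }
    return rays;
}
```

Calling makeDiffuseRays(hit, normal, 64, seed) once per camera-ray hit yields the incoherent ray batch that is then traversed, intersected and shaded.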
Architectures:
1) Nvidia GTX 1080; kernels are based on "Understanding the Efficiency of Ray Traversal on GPUs"
2) Embree 2.13 AVX-512 kernels on an Intel Xeon Phi 7250 (1.4 GHz, 68 cores):
Kernels:
- avx512knl::BVH8Triangle4Intersector16HybridMoellerNoFilter for camera rays and
- avx512knl::BVH8Triangle4Intersector1Moeller for secondary diffuse rays
BVH compiler settings:
- avx512knl::BVH8BuilderFastSpatialSAH
3) Our new AVX-512 kernels for coherent and incoherent ray transport on an Intel Xeon Phi 7250 (1.4 GHz, 68 cores)

Our new kernels clearly outperform any other implementation on all important platforms currently used for path tracing.
Looking at the KNL CPU, there is another advantage: we can connect directly to high-performance networks such as InfiniBand
and therefore scale extremely well compared to GPUs in a cluster-like configuration.
Finally, we also ported our fast BVH compilers (http://rapt.technology/data/pssbvh.pdf) to AVX-512.
The compiler timings for all scenes above are:
Fairy: 13.6ms
Rungholt: 105.1ms
San Miguel: 359.3ms
Sponza: 7.1ms
The Embree compilers, in any configuration, are far behind these timings on our KNL test system, so we decided not to
publish any numbers for them here. The SBVH compiler used in "Understanding the Efficiency of Ray Traversal on GPUs" is far from
optimized.
mp