AVX512 MBVH4 Traversal

mpeterson
Posts: 52
Joined: Fri Jan 06, 2012 3:09 pm

AVX512 MBVH4 Traversal

Postby mpeterson » Fri Sep 23, 2016 1:05 pm

With Intel's KNL now available to everyone, people have started asking for a native AVX512 port of clpt (http://ompf2.com/viewtopic.php?f=3&t=2075).
KNL seems to be the first accelerator from Intel with some real power under the hood (KNF and KNC were simple nonstarters). So I did an optimized implementation of clpt for AVX512 and was surprised by the outcome.

clpt is by far the fastest RT kernel for CPUs today, but it was never compared to GPUs, so I looked around for some numbers. There is not much to find! I used the medium numbers (viewpoint 2) from AMD FireRays 2 on a FirePro W9100 and measured the test scenes on CUDA using the implementation from NVIDIA (http://www.nvidia.com/object/nvidia_research_pub_011.html), optimized for the NV Titan.

To make it short: using coherent ray traversal, KNL can render most of the scenes I have around at a stable sub-1 ms into a 1024x1024 framebuffer.

Note: the CUDA and KNL numbers are averages over a sequence of several thousand frames (scene fly-through). The AMD FireRays numbers are single-shot.

Image
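
To put the frame time above into the rays/s unit used later in the thread, the conversion can be sketched as follows. The function name is hypothetical, and one primary ray per pixel is an assumption, since the post does not state the ray count per pixel.

```python
def rays_per_second(width, height, frame_time_s, rays_per_pixel=1):
    """Ray throughput implied by a frame time.

    rays_per_pixel=1 is an assumption (primary rays only); the post does
    not say how many rays were traced per pixel.
    """
    return width * height * rays_per_pixel / frame_time_s


# 1024x1024 framebuffer rendered in 1 ms, one ray per pixel (assumed)
throughput = rays_per_second(1024, 1024, 1e-3)
print(f"{throughput / 1e9:.2f} Grays/s")
```

Under that assumption, a 1 ms frame corresponds to roughly 1.05 billion rays per second, so "below 1 ms" means at least that rate.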

rtpt
Posts: 6
Joined: Wed May 20, 2015 1:17 pm

Re: AVX512 MBVH4 Traversal

Postby rtpt » Mon Sep 26, 2016 8:29 am

An NVIDIA GTX 1080 should be twice as fast as the
Titan. Can you please run your tests on current
hardware?

jbikker
Posts: 175
Joined: Mon Nov 28, 2011 8:18 am

Re: AVX512 MBVH4 Traversal

Postby jbikker » Tue Sep 27, 2016 10:52 am

Could you also test divergent rays? Architectures that rely on caches (i.e. CPUs) seem to suffer greatly from divergent memory access, while architectures that hide latency using many threads typically fare much better. I would be surprised to see the latest CPU-like device outperform the latest GPU in that setting (in fact, I don't expect it to come even close).
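
A rough way to see why divergence hurts a cache-based design is to count how many distinct cache lines a group of rays has to fetch in one traversal step. The sketch below is only an illustration: the 128-byte node size and the node indices are hypothetical, and 64 bytes is the usual x86 cache line size.

```python
CACHE_LINE_BYTES = 64   # typical x86 cache line size
NODE_BYTES = 128        # hypothetical size of one 4-wide BVH node


def cache_lines_touched(node_indices, node_bytes=NODE_BYTES,
                        line_bytes=CACHE_LINE_BYTES):
    """Count distinct cache lines needed to fetch the nodes visited by a
    group of rays in one traversal step."""
    lines = set()
    for idx in node_indices:
        start = idx * node_bytes
        end = start + node_bytes - 1
        lines.update(range(start // line_bytes, end // line_bytes + 1))
    return len(lines)


coherent = [7] * 16                         # 16 rays all visit the same node
divergent = [i * 1000 for i in range(16)]   # 16 rays visit unrelated nodes

print(cache_lines_touched(coherent), cache_lines_touched(divergent))
```

With these made-up numbers the coherent group needs 2 cache lines and the divergent group needs 32, which is the effect being asked about: the work per ray is the same, but the memory traffic is not.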

atlas
Posts: 26
Joined: Thu Apr 16, 2015 12:01 am

Re: AVX512 MBVH4 Traversal

Postby atlas » Wed Sep 28, 2016 1:10 am

The price point of the devices is also a consideration; I'm not sure this is an apples-to-apples comparison. Power envelopes aside, how many GPUs can you buy for the price of a Knights Landing?

Getting over 2.5B rays/s on a CPU is exciting, but I agree we have to see the incoherent numbers.

MohamedSakr
Posts: 83
Joined: Thu Apr 24, 2014 2:27 am

Re: AVX512 MBVH4 Traversal

Postby MohamedSakr » Wed Sep 28, 2016 2:07 pm

Great results, but as others said, in the divergent case the CPU will crawl (cache misses, waiting on memory...).

mpeterson
Posts: 52
Joined: Fri Jan 06, 2012 3:09 pm

Re: AVX512 MBVH4 Traversal

Postby mpeterson » Fri Sep 30, 2016 11:53 am

Yes, I would like to run the benchmark on the latest GPU generation, but the Titan is all I have around.

Concerning incoherent transport: yes, it will be a different story for sure. First of all, the implementation is not straightforward on AVX512. AVX512 is pretty inflexible when it comes to random-access streaming/computation: there is no fast way to shuffle single elements around, integer/int16 support is limited, etc. So implementation time is pretty high (a clear disadvantage here).

On the other hand, running our full-blown PT with the AVX2 backend on KNL, the performance is great: on average more than 3x the speed of Octane renderer on the Titan (except for simple scenes).
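
For readers unfamiliar with the term, the core step of MBVH4 traversal is testing one ray against the four child boxes of a node at once. The following is a scalar sketch of that test using the standard slab method. It is not the clpt kernel; the function name and the structure-of-arrays node layout are assumptions for illustration. On SIMD hardware the loop over the four children would map to vector lanes.

```python
def intersect_node4(origin, inv_dir, t_min, t_max, node):
    """Slab test of one ray against the four child boxes of a node.

    node is a dict of six lists of four floats in structure-of-arrays form
    ('min_x', 'min_y', 'min_z', 'max_x', 'max_y', 'max_z'); this layout is
    hypothetical. Returns a 4-bit mask with bit i set when child i is hit.
    """
    axes = (("min_x", "max_x"), ("min_y", "max_y"), ("min_z", "max_z"))
    mask = 0
    for i in range(4):  # one iteration per child; a SIMD lane in practice
        near, far = t_min, t_max
        for axis, (lo_key, hi_key) in enumerate(axes):
            t0 = (node[lo_key][i] - origin[axis]) * inv_dir[axis]
            t1 = (node[hi_key][i] - origin[axis]) * inv_dir[axis]
            near = max(near, min(t0, t1))  # latest entry across slabs
            far = min(far, max(t0, t1))    # earliest exit across slabs
        if near <= far:
            mask |= 1 << i
    return mask


# Ray from the origin along (1, 1, 1), so inv_dir is (1, 1, 1)
node = {
    "min_x": [1, 1, -3, 3], "max_x": [2, 2, -2, 4],
    "min_y": [1, 5, -3, 3], "max_y": [2, 6, -2, 4],
    "min_z": [1, 1, -3, 3], "max_z": [2, 2, -2, 4],
}
hit_mask = intersect_node4((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 0.0, 100.0, node)
print(bin(hit_mask))
```

In this example children 0 and 3 lie on the ray, child 1 is off to the side, and child 2 is behind the origin, so only bits 0 and 3 are set. A real kernel would also sort the hit children by entry distance and handle zero direction components, which this sketch leaves out.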

MohamedSakr
Posts: 83
Joined: Thu Apr 24, 2014 2:27 am

Re: AVX512 MBVH4 Traversal

Postby MohamedSakr » Fri Sep 30, 2016 12:00 pm

Does this test include texture access, like a standard interior scene full of textures? (The bottleneck is always memory.)

mpeterson
Posts: 52
Joined: Fri Jan 06, 2012 3:09 pm

Re: AVX512 MBVH4 Traversal

Postby mpeterson » Tue Oct 04, 2016 10:25 am

Yes (nearest-neighbor sampling and bilinear sampling). Keep in mind that KNL has 90 GB/s to a pretty large memory (DDR4) and an extra 400 GB/s to 16 GB (on-package MCDRAM).
At the moment we are playing around with all the different memory options. Besides this, we are trying to run the PT as a special kind of "stand-alone app"
on KNL without OS noise. A lot of new stuff here to explore...
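
To get a feel for what those two bandwidth figures mean when every ray misses the caches, a back-of-the-envelope bound can be sketched like this. The 256 bytes of memory traffic per ray is a purely hypothetical value, not a measurement; real traffic depends on the scene, BVH layout, and textures.

```python
def bandwidth_bound_rays_per_second(bandwidth_bytes_per_s, bytes_per_ray):
    """Upper bound on ray throughput when memory bandwidth is the only limit."""
    return bandwidth_bytes_per_s / bytes_per_ray


BYTES_PER_RAY = 256  # hypothetical uncached traffic per ray (nodes, triangles, texels)

for label, bandwidth in (("large memory, 90 GB/s", 90e9),
                         ("16 GB fast memory, 400 GB/s", 400e9)):
    bound = bandwidth_bound_rays_per_second(bandwidth, BYTES_PER_RAY)
    print(f"{label}: {bound / 1e6:.0f} Mrays/s")
```

Whatever the true per-ray traffic is, the ratio between the two bounds stays at about 4.4x, which is why placing the hot data in the fast 16 GB matters for incoherent workloads.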

MohamedSakr
Posts: 83
Joined: Thu Apr 24, 2014 2:27 am

Re: AVX512 MBVH4 Traversal

Postby MohamedSakr » Wed Oct 05, 2016 10:12 am

mpeterson wrote: Yes (nearest-neighbor sampling and bilinear sampling). Keep in mind that KNL has 90 GB/s to a pretty large memory (DDR4) and an extra 400 GB/s to 16 GB (on-package MCDRAM).
At the moment we are playing around with all the different memory options. Besides this, we are trying to run the PT as a special kind of "stand-alone app"
on KNL without OS noise. A lot of new stuff here to explore...

It would be interesting if you tested it on a production-ready renderer like Cycles, as it is well known for its brute-force PT nature and it uses Embree (it has CPU, OpenCL, and CUDA backends).

