giga rays on intel phi

mpeterson
Posts: 51
Joined: Fri Jan 06, 2012 3:09 pm

Re: giga rays on intel phi

Postby mpeterson » Tue Sep 10, 2013 10:52 am

yes, we plan to reach 5-6 grays/s. if needed we will use 4 or more mics/gpus per system, but just for comparison.
the final architecture will be based on tensilica procs (for whatever reason, i don't judge this). to be competitive i have to see
what current mass-market components can do and extrapolate that into the future. money doesn't matter so far.

i was told the car scene is commercial => i am not allowed to distribute any geometry/texture data. so i tried good old sponza:

for simple pt (10 bounces) i get around 162 mrays/s. that is nearly a 10x degradation considering that i can raycast sponza
with primary rays in < 1ms at 1024x1024! open sky scenes are much better (obviously). car scene on a plane + sky hdr runs at around
300-500 mrays/s. so i think i need 10-12 accelerators.
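
The arithmetic behind these estimates can be checked with a short sketch. It uses only the figures quoted in this post and assumes perfect linear scaling across devices, which a real multi-card system would not reach. The function name `accelerators_needed` is made up for illustration.

```python
def accelerators_needed(target_rays_per_s: int, per_device_rays_per_s: int) -> int:
    """Smallest whole number of devices that reaches the target.

    Assumes throughput adds up linearly across devices (optimistic).
    """
    return -(-target_rays_per_s // per_device_rays_per_s)  # ceiling division


# Figures quoted in the post, in rays per second.
PRIMARY_RATE_FLOOR = 1024 * 1024 * 1000  # 1024x1024 primary rays in 1 ms
SPONZA_PT_RATE = 162_000_000             # path tracing, 10 bounces
CAR_SCENE_RATES = (300_000_000, 500_000_000)
TARGETS = (5_000_000_000, 6_000_000_000)

# The frame time was given as "< 1 ms", so this ratio is only a lower bound.
slowdown_floor = PRIMARY_RATE_FLOOR / SPONZA_PT_RATE
print(f"primary vs. pt slowdown: at least {slowdown_floor:.1f}x")

for rate in CAR_SCENE_RATES:
    counts = [accelerators_needed(t, rate) for t in TARGETS]
    print(f"{rate // 1_000_000} mrays/s per device -> {counts[0]}-{counts[1]} devices")
```

Under these assumptions the 10-12 figure corresponds to the optimistic end of the range (500 mrays/s per device); at 300 mrays/s the count comes out higher.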

titan just arrived. let's see what is possible here. we use Timo Aila et al. traversal/intersection without the sorting crap to
have a fair comparison.

first results:

primary/coherent rays: titan is about 2x behind.
pt on cornell and sponza (max 10 diffuse bounces): both are more or less on par.

the titan is much faster than the gtx480 and cheaper than the phi. on the other hand the phi has 16gb and i think
that explains the price difference. in terms of io, e.g. rdma, the phi is far ahead: it allows building clouds/clusters
that can mount filesystems etc. and work closely together on distributed pci-e networks. i think this is what we
need to reach the 5-6 grays/s. apart from some "crappy" mpi, nothing like that is possible on gpus yet. and if we can trust intel
we will see the phi as a standard cpu in 1-2 years. avoiding accelerators altogether is always a good thing.

mp

beason
Posts: 47
Joined: Sat Dec 10, 2011 1:58 am
Location: Los Angeles, CA

Re: giga rays on intel phi

Postby beason » Tue Sep 10, 2013 10:04 pm

mpeterson wrote:and if we can trust intel we will see the phi as a standard cpu in 1-2 years


What does this mean? Will the Phi be a drop-in replacement for regular CPUs or something else? (Do you have a link?)

mpeterson
Posts: 51
Joined: Fri Jan 06, 2012 3:09 pm

Re: giga rays on intel phi

Postby mpeterson » Wed Sep 11, 2013 2:35 pm

beason wrote:What does this mean? Will the Phi be a drop-in replacement for regular CPUs or something else? (Do you have a link?)


regular cpu replacement (look for the knights landing/14nm broadwell roadmap, end of 2014 / q1 2015).
i know of two upcoming top10 supercomputers that will use these cpus without pci-e accelerators.

Dade
Posts: 206
Joined: Fri Dec 02, 2011 8:00 am

Re: giga rays on intel phi

Postby Dade » Thu Sep 12, 2013 5:09 am

mpeterson wrote:regular cpu replacement (look for the knights landing/14nm broadwell roadmap, end of 2014 / q1 2015).
i know of two upcoming top10 supercomputers that will use these cpus without pci-e accelerators.


Interesting. Does anyone know if Xeon Phi has some kind of memory/cache coherency support for sharing a single pool of memory across multiple Xeon Phis?

mpeterson
Posts: 51
Joined: Fri Jan 06, 2012 3:09 pm

Re: giga rays on intel phi

Postby mpeterson » Fri Sep 13, 2013 4:59 pm

ray distribution tests in outdoor environments:

AO rays at around 720 mrays/s (tracing 16 rays per octant together)

Image

Diffuse bounces (depth 8) at around 490 mrays/s

Image

for scenes like that ao even looks better to me, mp.
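
The post does not say how the rays are grouped, so the following is only a guess at what "tracing 16 rays per octant together" could look like: rays are binned by the sign pattern of their direction and then cut into packets of 16 (16 is also the number of 32-bit float lanes in a 512-bit vector). The names `octant` and `packets_by_octant` are hypothetical.

```python
from collections import defaultdict


def octant(direction):
    """3-bit octant index: bit 0/1/2 is set when dx/dy/dz is negative."""
    dx, dy, dz = direction
    return int(dx < 0) | (int(dy < 0) << 1) | (int(dz < 0) << 2)


def packets_by_octant(directions, packet_size=16):
    """Group ray indices by octant, then split each group into packets.

    Returns a list of (octant_id, [ray indices]); the last packet of an
    octant may hold fewer than packet_size rays.
    """
    bins = defaultdict(list)
    for ray_id, d in enumerate(directions):
        bins[octant(d)].append(ray_id)

    packets = []
    for oct_id in sorted(bins):
        ids = bins[oct_id]
        for start in range(0, len(ids), packet_size):
            packets.append((oct_id, ids[start:start + packet_size]))
    return packets


# 20 rays pointing into octant 0 and 3 rays pointing into octant 1.
demo_dirs = [(1.0, 1.0, 1.0)] * 20 + [(-1.0, 1.0, 1.0)] * 3
demo_packets = packets_by_octant(demo_dirs)
```

Rays that share an octant visit the children of a BVH node in the same front-to-back order, which is one common reason for grouping this way; whether that is the motivation here is an assumption.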

graphicsMan
Posts: 156
Joined: Mon Nov 28, 2011 7:28 pm

Re: giga rays on intel phi

Postby graphicsMan » Fri Sep 13, 2013 5:16 pm

Just curious... if you run 8 bounces in Sibenik Cathedral or something with even more occlusion/visibility complexity, what kind of ray throughput do you see? Any luck getting numbers for Titan yet?

mpeterson
Posts: 51
Joined: Fri Jan 06, 2012 3:09 pm

Re: giga rays on intel phi

Postby mpeterson » Fri Sep 13, 2013 5:51 pm

graphicsMan wrote:Just curious... if you run 8 bounces in Sibenik Cathedral or something with even more occlusion/visibility complexity, what kind of ray throughput do you see? Any luck getting numbers for Titan yet?



yes, titans (several) are in place. as i said above, both are more or less equal at diffuse bounces (around 165 mrays/s).
for primary rays the mic is 2x ahead. this is because i have optimized kernels for that. on titan i use the optimized bvh2 kernels
for kepler + the woop triangle test for any type of ray. when it comes to opengl/frame display/post-processing, a gpu-only
solution on a workstation is much better. still open what i will do...

Dade
Posts: 206
Joined: Fri Dec 02, 2011 8:00 am

Re: giga rays on intel phi

Postby Dade » Sat Sep 14, 2013 2:19 pm

mpeterson wrote:yes, titans (several) are in place. as i said above, both are more or less equal at diffuse bounces (around 165 mrays/s).
for primary rays the mic is 2x ahead.


Strange, I would have expected exactly the opposite result. I mean, MIC should be less sensitive to thread divergence. Maybe the cache plays an important role here?

hobold
Posts: 56
Joined: Wed Dec 21, 2011 6:08 pm

Re: giga rays on intel phi

Postby hobold » Sat Sep 14, 2013 3:27 pm

Dade wrote:Strange, I would have expected exactly the opposite result. I mean, MIC should be less sensitive to thread divergence. Maybe the cache plays an important role here?

Or maybe raw memory bandwidth, and particularly scatter/gather performance, could be the relevant bottleneck here. As far as I know, Titan has much more peak RAM bandwidth, and is significantly more aggressive in parallelizing divergent memory accesses than Phi.

graphicsMan
Posts: 156
Joined: Mon Nov 28, 2011 7:28 pm

Re: giga rays on intel phi

Postby graphicsMan » Sat Sep 14, 2013 3:53 pm

Dade wrote:
mpeterson wrote:yes, titans (several) are in place. as i said above, both are more or less equal at diffuse bounces (around 165 mrays/s).
for primary rays the mic is 2x ahead.


Strange, I would have expected exactly the opposite result. I mean, MIC should be less sensitive to thread divergence. Maybe the cache plays an important role here?


I think it depends on what you mean by "thread". If you are talking SIMD lanes, then I'd have to say it's probably worse for thread divergence than a GPU. GPUs are built for SIMT with the expectation that you'll have divergence. MIC is really traditional SIMD, but with added scatter/gather functions. There are also fewer *actual* threads to hide divergent load latencies.
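
To make the lane-level cost concrete, here is a toy model (not a measurement): every lane of a vector keeps iterating, masked off, until the slowest lane is done. The per-lane step counts are invented, and the model applies equally to a 16-wide MIC vector and to a GPU warp. It says nothing about latency hiding, which is the part that depends on how many threads are in flight.

```python
def simd_utilization(steps_per_lane):
    """Fraction of lane-slots doing useful work under masked execution.

    All lanes occupy the vector unit for max(steps) iterations, but lane i
    only does useful work for steps_per_lane[i] of them.
    """
    longest = max(steps_per_lane)
    if longest == 0:
        return 1.0
    return sum(steps_per_lane) / (len(steps_per_lane) * longest)


# Hypothetical traversal step counts for one 16-lane packet.
coherent = [20] * 16  # e.g. primary rays hitting nearby geometry
divergent = [5, 60, 12, 8, 33, 7, 48, 9, 15, 6, 55, 10, 21, 4, 40, 11]

print(f"coherent packet:  {simd_utilization(coherent):.2f}")
print(f"divergent packet: {simd_utilization(divergent):.2f}")
```

In this model a single long-running lane drags the whole packet down, which fits the observation that incoherent diffuse bounces lose far more throughput than primary rays on either architecture.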

