giga rays on intel phi
Re: giga rays on intel phi
yes, we plan to reach 5-6 Grays/s. if needed we will use 4 or more mics/gpus per system, but just for comparison.
the final architecture will be based on tensilica procs (why ever, i don't judge this). to be competitive i have to see
what current mass-market components can do and extrapolate this into the future. money doesn't matter so far.
the car scene is commercial, i got told => not allowed to distribute any geometry/texture data. so i tried good old sponza:
for simple pt (10 bounces) i get around 162 Mrays/s. that is nearly a 10x degradation considering that i can raycast sponza
with primary rays in < 1 ms at 1024x1024! open sky scenes are much better (clear). the car scene on a plane + sky hdr runs at around
300-500 Mrays/s. so i think i need 10-12 accelerators.
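The arithmetic behind these figures can be sketched roughly as follows. This is only a back-of-the-envelope illustration in Python: the helper names `mrays_per_second` and `accelerators_needed` are made up for this sketch, and it assumes throughput scales linearly with the number of accelerators, which a real multi-device setup will not quite achieve.

```python
import math

def mrays_per_second(width, height, frame_ms):
    """Throughput in Mrays/s for one ray per pixel rendered in frame_ms."""
    rays = width * height
    return rays / (frame_ms / 1000.0) / 1e6

def accelerators_needed(target_grays, per_device_mrays):
    """Devices needed to reach the target, ASSUMING perfect linear scaling."""
    return math.ceil(target_grays * 1000.0 / per_device_mrays)

# Primary rays: 1024x1024 in at most 1 ms, so this is a lower bound.
primary = mrays_per_second(1024, 1024, 1.0)
print(f"primary rays: at least {primary:.0f} Mrays/s")
print(f"slowdown vs. 162 Mrays/s path tracing: at least {primary / 162.0:.1f}x")

# Target of 5-6 Grays/s with 300-500 Mrays/s per device.
print("best case :", accelerators_needed(5.0, 500.0), "devices")
print("worst case:", accelerators_needed(6.0, 300.0), "devices")
```

Because the 1 ms frame time is an upper bound, the computed slowdown is a lower bound; the device counts bracket the optimistic and pessimistic ends of the quoted ranges.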
titan just arrived. let's see what is possible here. we use the Timo Aila et al. traversal/intersection without the sorting crap to
have a fair comparison.
first results:
primary/coherent rays: titan is about 2x behind.
pt on cornell and sponza (max 10 diff. bounces): both are more or less on a par.
the titan is much faster than the gtx480 and cheaper than the phi. on the other hand the phi has 16gb and i think
that makes the price difference. in terms of io, e.g. rdma, the phi is far ahead, allowing to build up clouds/clusters
that can mount filesystems etc. and can work close together on distributed pci-e networks. i think this is what we
need to reach the 5-6 Grays/s. except for some "crappy" mpi there is nothing like that possible on gpus yet. and if we can trust intel,
we will see the phi as a standard cpu in 1-2 years. avoiding accelerators altogether is always a good thing.
mp
Re: giga rays on intel phi
mpeterson wrote: and if we can trust intel we see the phi as a standard cpu in 1-2 years
What does this mean? Will the Phi be a drop-in replacement for regular CPUs or something else? (Do you have a link?)
Re: giga rays on intel phi
beason wrote: What does this mean? Will the Phi be a drop-in replacement for regular CPUs or something else? (Do you have a link?)
regular cpu replacement (look for the knights landing / 14nm broadwell roadmap, end of 2014 / q1 2015).
i know two upcoming top10 supercomputers that will use these cpus without pci-e accelerators.
Re: giga rays on intel phi
mpeterson wrote: regular cpu replacement (look for the knights landing / 14nm broadwell roadmap, end of 2014 / q1 2015). i know two upcoming top10 supercomputers that will use these cpus without pci-e accelerators.
Interesting. Does anyone know if Xeon Phi has some kind of memory/cache coherency support in order to share a single pool of memory across multiple Xeon Phis?
Re: giga rays on intel phi
Just curious... if you run 8 bounces in Sibenik Cathedral or something with even more occlusion/visibility complexity, what kind of ray throughput do you see? Any luck getting numbers for Titan yet?
Re: giga rays on intel phi
graphicsMan wrote:Just curious... if you run 8 bounces in Sibenik Cathedral or something with even more occlusion/visibility complexity, what kind of ray throughput do you see? Any luck getting numbers for Titan yet?
yes, titans (several) are in place. as i said above, both are more or less equal in performing diffuse bounces (around 165 Mrays/s).
for primary rays the mic is 2x ahead. this is because i have optimized kernels for that. on titan i use the optimized bvh2 kernels
for kepler + the woop triangle test for any type of ray. when it comes to opengl/frame-display/post-processing it is much better with the gpu-only
solution on a workstation. still open what i will do...
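For readers unfamiliar with the Woop triangle test mentioned here, the idea can be sketched as below. This is a plain Python illustration of the general technique (transform the ray into a per-triangle space where the triangle becomes the unit triangle in the w = 0 plane), not the poster's kernel and not the GPU code from Aila et al.; the function names `make_woop_transform` and `intersect` are invented for this sketch.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def make_woop_transform(v0, v1, v2):
    """Precompute the rows of the inverse of [e1 e2 n] for one triangle."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    n = cross(e1, e2)
    det = dot(n, n)  # determinant of [e1 e2 n] equals |n|^2
    if det == 0.0:
        raise ValueError("degenerate triangle")
    inv = 1.0 / det
    return (v0,
            scale(cross(e2, n), inv),   # row giving the u coordinate
            scale(cross(n, e1), inv),   # row giving the v coordinate
            scale(n, inv))              # row giving the w coordinate

def intersect(tri, origin, direction, eps=1e-9):
    """Return (t, u, v) for a hit in front of the origin, else None."""
    v0, row_u, row_v, row_w = tri
    rel = sub(origin, v0)
    o = (dot(row_u, rel), dot(row_v, rel), dot(row_w, rel))
    d = (dot(row_u, direction), dot(row_v, direction), dot(row_w, direction))
    if abs(d[2]) < eps:          # ray parallel to the triangle plane
        return None
    t = -o[2] / d[2]             # where the ray crosses w = 0
    if t <= 0.0:
        return None
    u = o[0] + t * d[0]
    v = o[1] + t * d[1]
    if u < 0.0 or v < 0.0 or u + v > 1.0:
        return None
    return (t, u, v)

tri = make_woop_transform((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(intersect(tri, (0.25, 0.25, 1.0), (0.0, 0.0, -1.0)))
print(intersect(tri, (2.0, 2.0, 1.0), (0.0, 0.0, -1.0)))
```

The appeal of this formulation is that the per-triangle transform is computed once at build time, so the per-ray work is a few dot products and comparisons regardless of ray type.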
Re: giga rays on intel phi
mpeterson wrote: yes titan (several) are in place. as i said above, both are more or less equal in performing diffuse bounces (around 165 Mrays/s). for primary rays mic is 2x ahead.
Strange, I would have expected exactly the opposite result. I mean, MIC should be less sensitive to thread divergence. Maybe cache plays an important role here.
Re: giga rays on intel phi
Dade wrote: Strange, I would have expected exactly the opposite result. I mean, MIC should be less sensitive to thread divergence. Maybe cache plays an important role here.
Or maybe raw memory bandwidth, and particularly scatter/gather performance, could be the relevant bottleneck here. As far as I know, Titan has much more peak RAM bandwidth, and is significantly more aggressive in parallelizing divergent memory accesses than Phi.
Re: giga rays on intel phi
mpeterson wrote: yes titan (several) are in place. as i said above, both are more or less equal in performing diffuse bounces (around 165 Mrays/s). for primary rays mic is 2x ahead.
Dade wrote: Strange, I would have expected exactly the opposite result. I mean, MIC should be less sensitive to thread divergence. Maybe cache plays an important role here.
I think it depends on what you mean by "thread". If you are talking SIMD lanes, then I'd have to say it's probably worse for thread divergence than a GPU. GPUs are built for SIMT with the expectation that you'll have divergence. MIC is really traditional SIMD, but with added scatter/gather functions. There are also fewer *actual* threads to hide divergent load latencies.
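To make the lane-divergence point concrete, here is a toy Python model of lockstep execution. It is only a sketch under simplifying assumptions: every lane in a group must wait for the slowest lane, memory latency is ignored entirely, and the function name `lane_utilization` and the step counts are invented. It does not model the actual behaviour of either the Phi or the Titan.

```python
def lane_utilization(steps_per_lane):
    """Fraction of lane-steps doing useful work when all lanes run in lockstep.

    The group runs for max(steps) iterations; a lane that finished earlier
    sits idle (masked off) for the remaining iterations.
    """
    width = len(steps_per_lane)
    longest = max(steps_per_lane)
    if longest == 0:
        return 1.0
    return sum(steps_per_lane) / (width * longest)

# 16 lanes, matching a 512-bit vector of 32-bit floats.
coherent = [10] * 16               # e.g. primary rays hitting similar geometry
divergent = list(range(1, 17))     # e.g. diffuse bounces with very different paths

print("coherent :", lane_utilization(coherent))
print("divergent:", lane_utilization(divergent))
```

In this model coherent rays keep every lane busy, while the divergent case wastes close to half of the lane-steps. How much of that loss a given machine can hide with extra threads or hardware scatter/gather is exactly what the posts above disagree about.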