## Intel Xeon Phi Preferred Pricing - Only $400 per Card ?

toxie
Posts: 118
Joined: Mon Nov 28, 2011 12:30 pm
Location: germany

### Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?

of course, that's why i wondered what intel suggests nowadays as the "convenient" way to drive these new boards, especially as there have always been claims (see the latest embree presentation, for example) that the compiler does most of the nasty stuff automatically nowadays..
Better you leave here with your head still full of kitty cats and puppy dogs.

Posts: 206
Joined: Fri Dec 02, 2011 8:00 am

### Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?

OpenCL is going to be supported: http://software.intel.com/en-us/vcsourc ... ncl-sdk-xe

BTW, I have recently tested the Intel OpenCL CPU device included in their latest beta, and it is now 50% faster than the AMD OpenCL CPU device. It was slightly slower than the AMD device before, so I assume they are spending some resources on improving their OpenCL support. I guess it is going to be one of the preferred methods for using Xeon Phi.

hobold
Posts: 56
Joined: Wed Dec 21, 2011 6:08 pm

### Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?

Xeon Phi is no magic bullet. Massively parallel hardware cannot and will never "automagically" accelerate serially dependent algorithms. In that regard, it is no improvement over GPUs.

(There is also a fundamental line of reasoning that the most efficient algorithms keep re-using each bit of computed information as much as possible, and thus necessarily have some strong data dependencies. This information-centric view also applies in other places, such as branch prediction, which becomes more effective the less actual information a given conditional branch produces when it makes its decision.)

HOWEVER, and that's a big however, there are many practical applications which are slow not because of the amount of processing required per single data item, but because of the huge number of data items overall. Massively parallel hardware exists, because it is the best hardware we can build for tackling these kinds of problems.
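
To make the distinction above concrete, here is a minimal Python sketch. The function names and the toy data are made up for illustration and have nothing to do with any Phi toolchain. The first loop carries a dependency from one iteration to the next, so extra cores cannot help it; the second touches each item independently, which is the kind of work massively parallel hardware is built for.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_recurrence(xs, seed=0.0):
    """Each step needs the previous result: a loop-carried dependency."""
    acc = seed
    out = []
    for x in xs:
        acc = 0.5 * acc + x  # cannot start before the previous iteration ends
        out.append(acc)
    return out


def per_item_work(x):
    """Hypothetical per-item kernel; no item depends on any other."""
    return x * x + 1.0


def data_parallel(xs, workers=4):
    """Independent items can be handed to any number of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(per_item_work, xs))


items = [1.0, 2.0, 3.0, 4.0]
print(serial_recurrence(items))
print(data_parallel(items))
```

The thread pool only shows the structure of the problem. In CPython it will not speed up arithmetic like this, but the same shape of loop is what maps onto many cores or SIMD lanes in a compiled language.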

Now, thinking in parallel is unusual for us. We don't have much practice or experience with it. It is simply hard, even beyond the fundamental difficulty inherent in parallelism. So the best tools for us are those that enable us to learn most quickly: static analysis of our code, runtime profiles, and insight into the dynamic behaviour of the hardware when running our code.

With regard to that kind of introspection, Phi has the advantage that it looks a lot like the supercomputers we already have. A programmer who has spent years grokking those will find Phi rather familiar.

For the regulars here at ompf, the situation could well be turned on its head, because the prior experience is with GPUs, not with super clusters.

graphicsMan
Posts: 160
Joined: Mon Nov 28, 2011 7:28 pm

### Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?

My take is that it will be easy to get core-wise parallelism out of the chip. If your code scales well on CPU cores already, it will scale to the cores on Phi. However, without making good use of the SIMD lanes, I would guess your performance will not be any better than an 8-core Xeon chip. This requires using intrinsics (ugh), or a special language like opencl or one of the Intel special ones to make good use of the SIMD parts of the chip. I doubt it will be any easier to use than a GPU to get anywhere close to good chip utilization.

mpeterson
Posts: 56
Joined: Fri Jan 06, 2012 3:09 pm

### Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?

graphicsMan wrote:My take is that it will be easy to get core-wise parallelism out of the chip. If your code scales well on CPU cores already, it will scale to the cores on Phi. However, without making good use of the SIMD lanes, I would guess your performance will not be any better than an 8-core Xeon chip. This requires using intrinsics (ugh), or a special language like opencl or one of the Intel special ones to make good use of the SIMD parts of the chip. I doubt it will be any easier to use than a GPU to get anywhere close to good chip utilization.
assume that it will run at 1ghz with ~ 50 inorder cores (50ghz). multi-threaded codes (non-vectorized) on current hw (dual sb) at 3.x ghz already have (16x3.x ghz >= 48ghz) on out-of-order cores. when we also take into account that we will have a much higher sync. overhead + higher pressure on the system bus (io/mem ops), it cannot do better for any algos than today's up-to-date xeon servers. in practice we get maybe something on the level of a single cpu => vectorization is a must to see some progress. don't expect sse and avx codes to run out of the box... and compilers can only do trivial stuff in terms of vectorization.
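
The back-of-the-envelope numbers above can be written out as a small Python sketch. The scalar figures come from the post, with 3.0 GHz used as the lower bound implied by "3.x" and ">= 48ghz". The lane counts are an added assumption, not something stated in the post: 512-bit vectors on Phi (16 single-precision floats) against 256-bit AVX on Sandy Bridge (8 floats). "Aggregate GHz" is only a crude proxy that ignores out-of-order execution, sync overhead, and memory pressure.

```python
def aggregate_ghz(cores, clock_ghz, lanes=1):
    """Crude throughput proxy: cores x clock x SIMD lanes actually used."""
    return cores * clock_ghz * lanes


# Scalar (non-vectorized) case, figures from the post.
phi_scalar = aggregate_ghz(cores=50, clock_ghz=1.0)
xeon_scalar = aggregate_ghz(cores=16, clock_ghz=3.0)  # lower bound for "3.x"

# Vectorized case. Lane counts are an assumption for illustration:
# 16 floats per 512-bit vector on Phi, 8 floats per 256-bit AVX vector.
phi_vector = aggregate_ghz(cores=50, clock_ghz=1.0, lanes=16)
xeon_vector = aggregate_ghz(cores=16, clock_ghz=3.0, lanes=8)

print(f"scalar:     Phi {phi_scalar:.0f} vs dual SB {xeon_scalar:.0f}")
print(f"vectorized: Phi {phi_vector:.0f} vs dual SB {xeon_vector:.0f}")
```

Under these assumptions the scalar totals are nearly identical, and the gap only opens up once the wider vector lanes are actually used.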

mp

dr_eck
Posts: 46
Joined: Mon Dec 05, 2011 7:35 pm

### Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?

mpeterson wrote: assume that it will run at 1ghz with ~ 50 inorder cores (50ghz). multi-threaded codes (non-vectorized) on current hw (dual sb) at 3.x ghz already have (16x3.x ghz >= 48ghz) on out-of-order cores. mp

I just read that the Phi has 4 threads per core, or 200 threads, and runs at 1.05 GHz. Given your formula, I get (200 threads * 1.x GHz) / (16 threads * 3.x GHz) = 4X improvement. The rated 1.x TFLOPS is also very comparable to the K20, so I would expect the Phi to be competitive with it. When the second version comes out next summer with 60 cores for <US$2k, I may give it a try.
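
The ratio in this estimate can be reproduced with a few lines of Python. The thread counts and the 1.05 GHz clock are the figures quoted in the post; the Xeon clock is only given as "3.x", so the sketch sweeps a few assumed values rather than picking one. Like the formula itself, it treats every hardware thread as if it delivered a full core's clock, which is an optimistic simplification.

```python
def thread_ghz_ratio(phi_threads, phi_ghz, xeon_threads, xeon_ghz):
    """Ratio of (threads x clock) totals, as in the formula from the post."""
    return (phi_threads * phi_ghz) / (xeon_threads * xeon_ghz)


# Assumed sample values for the unspecified "3.x" Xeon clock.
for xeon_ghz in (3.0, 3.5, 3.9):
    ratio = thread_ghz_ratio(200, 1.05, 16, xeon_ghz)
    print(f"Xeon at {xeon_ghz} GHz: ratio = {ratio:.2f}x")
```

Across that range the ratio lands between roughly 3.4x and 4.4x, consistent with the "4X" figure quoted.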

graphicsMan
Posts: 160
Joined: Mon Nov 28, 2011 7:28 pm

### Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?

Only time will tell of course, but I suspect that the 4 threads per core will only serve to partially fix the fact that the cores are in-order. A 4x improvement factor is definitely way too optimistic

mpeterson
Posts: 56
Joined: Fri Jan 06, 2012 3:09 pm

### Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?

graphicsMan wrote:Only time will tell of course, but I suspect that the 4 threads per core will only serve to partially fix the fact that the cores are in-order. A 4x improvement factor is definitely way too optimistic

right, each core is 4x ht (to compensate for non-vectorized codes). when running with all threads together, the scheduler and the synchronization (atomics) become a bottleneck.

spectral
Posts: 382
Joined: Wed Nov 30, 2011 2:27 pm

### Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?

And what about the memory accesses? It seems to me that the more we increase the number of cores, the more we have "the same problems" as with GPUs!

Spectral
OMPF 2 global moderator

graphicsMan
Posts: 160
Joined: Mon Nov 28, 2011 7:28 pm

### Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?

That's one of the reasons for having 4 threads per core... hide memory access latency. However, 4 threads doesn't seem like nearly enough to accomplish that. GPUs have many warps to hide these costs.
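
A rough way to see why 4 threads may not be enough is the usual simple multithreading model: if each thread computes for some cycles and then stalls on memory, a core stays busy only when enough other threads are ready to fill the stall. The Python sketch below uses invented cycle counts purely for illustration (20 compute cycles per 300-cycle stall); they are not measurements of Phi or of any GPU, and the model ignores caches and prefetching.

```python
import math


def core_utilization(threads, compute_cycles, stall_cycles):
    """Fraction of time the core is busy in a simple latency-hiding model."""
    busy = threads * compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, busy)


def threads_to_hide(compute_cycles, stall_cycles):
    """Threads needed so that some thread is always ready to run."""
    return math.ceil((compute_cycles + stall_cycles) / compute_cycles)


# Hypothetical workload: 20 cycles of work between 300-cycle memory stalls.
print(core_utilization(4, 20, 300))   # utilization with 4 threads per core
print(threads_to_hide(20, 300))       # threads needed for full utilization
```

With these made-up numbers, 4 threads keep the core busy a quarter of the time, and it would take 16 to hide the stalls completely. Memory-bound code shifts the balance further toward needing many more threads.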