Intel Xeon Phi Preferred Pricing - Only $400 per Card ?
Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?
Of course; that's why I wondered what Intel suggests nowadays as the "convenient" way to drive these new boards, especially as there have always been claims (see the latest Embree presentation, for example) that the compiler does most of the nasty stuff automatically these days.
Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?
OpenCL is going to be supported: http://software.intel.com/en-us/vcsourc ... ncl-sdk-xe
BTW, I have recently tested the Intel OpenCL CPU device included in their latest beta, and it is now about 50% faster than the AMD OpenCL CPU device. It was slightly slower than the AMD device before, so I assume they are spending some resources on improving their OpenCL support. I guess it is going to be one of the preferred methods for using Xeon Phi.
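If anyone wants to repeat the comparison, here is a minimal sketch of how the two CPU runtimes can be told apart: it just enumerates the OpenCL platforms and their CPU devices (assumes an OpenCL SDK is installed; link with -lOpenCL):

```cpp
// Minimal sketch: enumerate OpenCL platforms and their CPU devices,
// so the Intel and AMD CPU runtimes can be benchmarked separately.
#include <CL/cl.h>
#include <cstdio>

int main() {
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(8, platforms, &num_platforms);

    for (cl_uint p = 0; p < num_platforms; ++p) {
        char pname[256];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME,
                          sizeof(pname), pname, nullptr);

        // Ask each platform for CPU devices only.
        cl_device_id devices[8];
        cl_uint num_devices = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_CPU,
                           8, devices, &num_devices) != CL_SUCCESS)
            continue;

        for (cl_uint d = 0; d < num_devices; ++d) {
            char dname[256];
            cl_uint units = 0;
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME,
                            sizeof(dname), dname, nullptr);
            clGetDeviceInfo(devices[d], CL_DEVICE_MAX_COMPUTE_UNITS,
                            sizeof(units), &units, nullptr);
            printf("%s: %s (%u compute units)\n", pname, dname, units);
        }
    }
    return 0;
}
```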
Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?
Xeon Phi is no magic bullet. Massively parallel hardware cannot and will never "automagically" accelerate serially dependent algorithms. In that regard, it is no improvement over GPUs.
(There is also a fundamental line of reasoning that the most efficient algorithms keep re-using each bit of computed information as much as possible, and thus necessarily have some strong data dependencies. This information-centric view also applies elsewhere, such as branch prediction, which becomes more effective the less actual information a given conditional branch produces when it makes its decision.)
HOWEVER, and that's a big however, there are many practical applications which are slow not because of the amount of processing required per data item, but because of the huge number of data items overall. Massively parallel hardware exists because it is the best hardware we can build for tackling these kinds of problems.
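To make the distinction concrete, here is a toy sketch in plain C++ (purely illustrative): the first loop is trivially parallel because every iteration is independent, while the second carries its result from one iteration to the next, so no number of cores helps without restructuring the math:

```cpp
#include <vector>

// Data-parallel: each element depends only on its own input,
// so every iteration can go to a different core or SIMD lane.
void scale(std::vector<float>& v, float s) {
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] *= s;
}

// Serially dependent: iteration i needs the result of iteration i-1
// (a running recurrence), so this loop resists naive parallelization.
float recurrence(const std::vector<float>& v) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i)
        acc = 0.5f * acc + v[i];
    return acc;
}
```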
Now, thinking in parallel is unusual for us. We don't have much practice with it, not much experience. It is simply hard, even beyond the fundamental difficulty inherent in parallelism. So the best tools for us are those that let us learn most quickly: static analysis of our code, runtime profiles, insight into the dynamic behaviour of the hardware while it runs our code.
With regard to that kind of introspection, Phi has the advantage that it looks a lot like the supercomputers we already have. A programmer who has spent years grokking those will find Phi rather familiar.
For the regulars here at ompf, the situation could well be turned on its head, because the prior experience is with GPUs, not with super clusters.
Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?
My take is that it will be easy to get core-wise parallelism out of the chip. If your code scales well on CPU cores already, it will scale to the cores on Phi. However, without making good use of the SIMD lanes, I would guess your performance will be no better than an 8-core Xeon chip. That requires using intrinsics (ugh), or a special language like OpenCL or one of Intel's own, to make good use of the SIMD parts of the chip. I doubt it will be any easier than with a GPU to get anywhere close to good chip utilization.
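For illustration, the usual scalar-versus-intrinsics contrast (AVX here for brevity; Phi's 512-bit vector unit has its own intrinsics, but the flavor is the same — this is a sketch, not Phi code):

```cpp
#include <immintrin.h>

// Scalar SAXPY: one float per loop iteration, one SIMD lane used.
void saxpy_scalar(float* y, const float* x, float a, int n) {
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}

// AVX SAXPY: eight floats per iteration. Assumes n is a multiple
// of 8 and that x and y are 32-byte aligned.
void saxpy_avx(float* y, const float* x, float a, int n) {
    const __m256 va = _mm256_set1_ps(a);
    for (int i = 0; i < n; i += 8) {
        __m256 vx = _mm256_load_ps(x + i);
        __m256 vy = _mm256_load_ps(y + i);
        vy = _mm256_add_ps(vy, _mm256_mul_ps(va, vx));
        _mm256_store_ps(y + i, vy);
    }
}
```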
Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?
graphicsMan wrote: My take is that it will be easy to get core-wise parallelism out of the chip. If your code scales well on CPU cores already, it will scale to the cores on Phi. However, without making good use of the SIMD lanes, I would guess your performance will be no better than an 8-core Xeon chip. That requires using intrinsics (ugh), or a special language like OpenCL or one of Intel's own, to make good use of the SIMD parts of the chip. I doubt it will be any easier than with a GPU to get anywhere close to good chip utilization.
Assume it will run at 1 GHz with ~50 in-order cores (50 GHz aggregate). Multi-threaded, non-vectorized codes on current hardware (dual Sandy Bridge) at 3.x GHz already have 16 × 3.x GHz >= 48 GHz on out-of-order cores. When we also take into account the much higher synchronization overhead plus the higher pressure on the system bus (I/O and memory ops), it cannot do better than today's up-to-date Xeon servers for any algorithm. In practice we'll get maybe something on the level of a single CPU => vectorization is a must to see any progress. Don't expect SSE and AVX codes to run out of the box... and compilers can only do trivial stuff in terms of vectorization; see the sketch below.
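A hypothetical example of what I mean by trivial — the same loop twice, where the only difference is promising the compiler that the pointers don't alias; without that promise, many compilers won't vectorize it at all:

```cpp
// Without restrict the compiler must assume dst and src may overlap,
// and will often refuse to vectorize (or emit runtime overlap checks).
void blend(float* dst, const float* src, float t, int n) {
    for (int i = 0; i < n; ++i)
        dst[i] += t * (src[i] - dst[i]);
}

// Aliasing ruled out: now this is "trivial stuff" and a vectorizing
// compiler will happily turn the loop into SIMD code.
void blend_noalias(float* __restrict dst, const float* __restrict src,
                   float t, int n) {
    for (int i = 0; i < n; ++i)
        dst[i] += t * (src[i] - dst[i]);
}
```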
mp
Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?
mpeterson wrote: Assume it will run at 1 GHz with ~50 in-order cores (50 GHz aggregate). Multi-threaded, non-vectorized codes on current hardware (dual Sandy Bridge) at 3.x GHz already have 16 × 3.x GHz >= 48 GHz on out-of-order cores.
I just read that the Phi has 4 threads per core, or 200 threads total, and runs at 1.05 GHz. Given your formula, I get (200 threads × 1.x GHz) / (16 threads × 3.x GHz) ≈ a 4X improvement. The rated 1.x TFLOPS is also very comparable to the K20, so I would expect the Phi to be competitive with it. When the second version comes out next summer with 60 cores for <US$2k, I may give it a try.
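Spelling the arithmetic out (taking 3.1 GHz as a stand-in for the "3.x" Sandy Bridge clock; that exact figure is my assumption):

```latex
\frac{200~\text{threads} \times 1.05~\text{GHz}}
     {16~\text{threads} \times 3.1~\text{GHz}}
  = \frac{210}{49.6}
  \approx 4.2
```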
Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?
Only time will tell, of course, but I suspect that the 4 threads per core will only partially compensate for the fact that the cores are in-order. A 4x improvement factor is definitely way too optimistic.
Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?
graphicsMan wrote: Only time will tell, of course, but I suspect that the 4 threads per core will only partially compensate for the fact that the cores are in-order. A 4x improvement factor is definitely way too optimistic.
Right, each core is 4-way hyper-threaded (to compensate for non-vectorized codes). When running with all threads together, the scheduler and the synchronization (atomics) become the bottleneck; see the sketch below.
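A sketch of the pattern that bites, in portable C++ (illustrative, not Phi-specific code): every thread hammering a single shared atomic serializes on the coherence traffic, while accumulating privately and combining once at the end does not:

```cpp
#include <atomic>

// Contended: all threads update one cache line; with hundreds of
// hardware threads the atomic, not the work, becomes the bottleneck.
void count_contended(std::atomic<long>& total, const int* data, int n) {
    for (int i = 0; i < n; ++i)
        if (data[i] > 0) total.fetch_add(1);
}

// Uncontended: count privately, publish with one atomic add at the end.
void count_private(std::atomic<long>& total, const int* data, int n) {
    long local = 0;
    for (int i = 0; i < n; ++i)
        if (data[i] > 0) ++local;
    total.fetch_add(local);
}
```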
Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?
And what about memory accesses? It seems to me that the more we increase the number of cores... the more we run into "the same problems" as with GPUs!
Re: Intel Xeon Phi Preferred Pricing - Only $400 per Card ?
That's one of the reasons for having 4 threads per core... to hide memory access latency. However, 4 threads doesn't seem like nearly enough to accomplish that; GPUs keep many warps in flight to hide these costs.
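Back-of-the-envelope (the cycle counts are illustrative assumptions, not Phi measurements): to fully hide a miss, a core needs roughly latency divided by per-access work in independent thread contexts, and 4 falls well short when misses are frequent:

```latex
n_{\text{threads}} \;\approx\; \frac{t_{\text{miss}}}{t_{\text{work per access}}}
\qquad\text{e.g.}\qquad \frac{300~\text{cycles}}{20~\text{cycles}} = 15 \;\gg\; 4
```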