SIMDy

Show-off, reference material & tools.
Tahir007
Posts: 3
Joined: Mon Jun 30, 2014 1:20 pm

SIMDy

Postby Tahir007 » Mon Nov 06, 2017 12:45 am

Hi there,

I'd like to take this opportunity to advertise a tool that might be very useful for developing ray tracers or any other
compute-intensive applications on the CPU. I am developing a Python package that lets you use CPU SIMD instructions (SSE, AVX, AVX2, AVX-512, FMA).
Basically, it's a JIT compiler that compiles simplified Python code to native x86 machine code. To expose SIMD instructions, I added
vector data types (float32x4, float32x8, float32x16, etc.), so you can easily do explicit vectorization. Before compilation, I check which
instruction sets the CPU supports and then select the best one. This means that to achieve maximum performance, all you need to do
is use the widest vector types (float32x16, float64x8, int32x16) as much as possible, and the rest happens automatically.
Even if your CPU only has SSE, you still benefit from using wide vector types because of memory locality.
The tool is still a WIP and there is lots of work left to do, but even at this stage it is very useful. I started developing a
path tracer just to show how the tool is used.
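The instruction-set selection described above can be sketched in plain Python. This is not SIMDy's code: `pick_iset` and `read_linux_cpu_flags` are hypothetical names, and the flag names are the ones Linux reports in /proc/cpuinfo (`avx512f`, `avx2`, `avx`, `sse2`).

```python
from typing import Iterable, Set, Tuple

# Hypothetical preference table: (name, /proc/cpuinfo flag, register width in bits),
# ordered from widest to narrowest. Not taken from SIMDy.
PREFERENCE = [
    ("AVX512", "avx512f", 512),
    ("AVX2", "avx2", 256),
    ("AVX", "avx", 256),
    ("SSE", "sse2", 128),
]


def pick_iset(cpu_flags: Iterable[str]) -> Tuple[str, int]:
    """Return (instruction set, register width in bits) of the best supported set."""
    flags = set(cpu_flags)
    for name, flag, bits in PREFERENCE:
        if flag in flags:
            return name, bits
    raise RuntimeError("no supported SIMD instruction set found")


def read_linux_cpu_flags(path: str = "/proc/cpuinfo") -> Set[str]:
    """Read the CPU feature flags reported by Linux (empty set if none found)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


print(pick_iset({"sse2", "avx", "avx2", "fma"}))  # ('AVX2', 256)
```

On Linux, `pick_iset(read_linux_cpu_flags())` reports the choice for the current machine. A real JIT would also have to check finer-grained feature subsets and whether the OS saves the wider registers.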

Here is a trivial example (calculating pi with the Monte Carlo method):

Code:

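from multiprocessing import cpu_count
from simdy import int64, float64, simdy_kernel

# The kernel body is compiled by SIMDy; vector types (int64x4, float64x4) and
# helpers such as select() and random_float64x4() are used directly inside it.
@simdy_kernel(nthreads=cpu_count())
def calculate_pi(n_samples: int64) -> float64:
    inside = int64x4(0)  # one hit counter per lane
    for i in range(n_samples):
        # four random points per iteration in the square [-1, 1] x [-1, 1]
        x = 2.0 * random_float64x4() - float64x4(1.0)
        y = 2.0 * random_float64x4() - float64x4(1.0)
        # add 1 in every lane whose point lies inside the unit circle
        inside += select(int64x4(1), int64x4(0), x * x + y * y < float64x4(1.0))

    # horizontal sum of the four lanes
    nn = inside[0] + inside[1] + inside[2] + inside[3]
    # hits / total samples (4 per iteration) approximates pi / 4
    result = 4.0 * float64(nn) / float64(n_samples * 4)
    return result

# with nthreads, each thread returns its own estimate, so average them
result = calculate_pi(int64(25_000_000))
print(sum(result) / cpu_count())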
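
For readers without SIMDy installed, here is a scalar, standard-library-only reference of what the kernel computes (my own sketch, not part of SIMDy; `calculate_pi_reference` and its `seed` parameter are hypothetical). Points are drawn uniformly in the square [-1, 1] x [-1, 1], which has area 4, while the unit circle has area pi, so 4 x (hits / samples) approximates pi. The kernel does four such trials per loop iteration, one per lane, which is why it divides by `n_samples * 4`.

```python
import random

LANES = 4  # mirrors the 4 lanes of int64x4 / float64x4 in the kernel


def calculate_pi_reference(n_samples: int, seed: int = 0) -> float:
    """Scalar, pure-Python equivalent of the calculate_pi kernel (one thread)."""
    rng = random.Random(seed)
    inside = [0] * LANES  # one hit counter per lane, like int64x4(0)
    for _ in range(n_samples):
        for lane in range(LANES):  # the kernel does these 4 trials in one vector op
            x = 2.0 * rng.random() - 1.0  # uniform in [-1, 1)
            y = 2.0 * rng.random() - 1.0
            # same effect as select(1, 0, mask): count points inside the circle
            if x * x + y * y < 1.0:
                inside[lane] += 1
    nn = sum(inside)  # horizontal sum, like inside[0] + ... + inside[3]
    return 4.0 * nn / (n_samples * LANES)


print(calculate_pi_reference(100_000))
```

The SIMD kernel replaces the inner `for lane` loop with single vector instructions, then runs one copy per thread and averages the per-thread estimates.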


LINKS:

SIMDy - http://www.tahir007.com
Path Tracer - https://bitbucket.org/Tahir007/quark


My question for you guys: what do you think about this tool?

When I converted the BVH tree to a BVH16, I got almost twice the performance with AVX-512 instructions. :)
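The post doesn't show how quark lays out its BVH16 nodes, so the following is only a generic sketch of why a 16-wide BVH suits AVX-512: if each node stores its 16 children's bounds in structure-of-arrays form, one 512-bit register can hold the same bound component (say, min x) of all 16 children as float32, and a ray can be slab-tested against every child at once. Plain Python lists stand in for the vector registers; `BVH16Node` and `intersect_children` are hypothetical names, and the sketch assumes non-zero ray direction components.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

WIDTH = 16  # 16 children per node: 16 float32 values fill one 512-bit register
Vec3 = Tuple[float, float, float]


def _lanes(value: float) -> List[float]:
    return [value] * WIDTH


@dataclass
class BVH16Node:
    """Hypothetical SoA node: lane i of every array describes child i.

    Unused lanes keep an inverted box (min=+inf, max=-inf) so they never hit.
    """
    bmin: List[List[float]] = field(default_factory=lambda: [_lanes(math.inf) for _ in range(3)])
    bmax: List[List[float]] = field(default_factory=lambda: [_lanes(-math.inf) for _ in range(3)])

    def set_child(self, lane: int, lo: Vec3, hi: Vec3) -> None:
        for axis in range(3):
            self.bmin[axis][lane] = lo[axis]
            self.bmax[axis][lane] = hi[axis]


def intersect_children(node: BVH16Node, origin: Vec3, direction: Vec3,
                       t_max: float = math.inf) -> List[bool]:
    """Slab test of one ray against all 16 child boxes (direction must be non-zero)."""
    t_near = _lanes(0.0)
    t_far = _lanes(t_max)
    for axis in range(3):
        inv_d = 1.0 / direction[axis]
        # choose near/far planes once per axis from the direction sign,
        # instead of a per-lane min/max swap
        near, far = ((node.bmin[axis], node.bmax[axis]) if inv_d >= 0.0
                     else (node.bmax[axis], node.bmin[axis]))
        o = origin[axis]
        for lane in range(WIDTH):  # a SIMD version does this loop in one instruction
            t_near[lane] = max(t_near[lane], (near[lane] - o) * inv_d)
            t_far[lane] = min(t_far[lane], (far[lane] - o) * inv_d)
    return [t_near[i] <= t_far[i] for i in range(WIDTH)]


node = BVH16Node()
node.set_child(0, (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))
node.set_child(1, (5.0, 5.0, 5.0), (6.0, 6.0, 6.0))
hits = intersect_children(node, (-5.0, 0.0, 0.0), (1.0, 0.001, 0.001))
print([i for i, hit in enumerate(hits) if hit])  # [0]
```

Choosing near/far planes per axis (rather than per lane) keeps the inner loop free of branches, which is what makes it map onto a handful of vector min/max/multiply instructions.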
Attachments
cornell_10k.png

mpeterson
Posts: 52
Joined: Fri Jan 06, 2012 3:09 pm

Re: SIMDy

Postby mpeterson » Tue Nov 07, 2017 9:21 am

sorry, but absolutely useless. the n+1 invocation of an "auto-vectorizer" ... and python? what is it really good for?

Tahir007
Posts: 3
Joined: Mon Jun 30, 2014 1:20 pm

Re: SIMDy

Postby Tahir007 » Tue Nov 07, 2017 9:46 am

I don't get what you mean by this:
"the n+1 invocation of an "auto-vectorizer" ... and python?"

The idea is that using just Python + SIMDy you get performance similar to programming in C++.
In the project above, I am developing a path tracer using just Python + SIMDy that easily outperforms C++ implementations.

graphicsMan
Posts: 156
Joined: Mon Nov 28, 2011 7:28 pm

Re: SIMDy

Postby graphicsMan » Tue Nov 07, 2017 5:24 pm

Does it generate object files? IMO, this is pretty neat. It would be cool to write kernels using python, and then link those into C++ code.

graphicsMan
Posts: 156
Joined: Mon Nov 28, 2011 7:28 pm

Re: SIMDy

Postby graphicsMan » Tue Nov 07, 2017 5:26 pm

NM, re-reading, it is clear that it doesn't generate object code. I think it's a nifty project, and probably a good way to learn stuff, but I think if you spend effort writing optimized C++, I'd be very surprised to see this perform similarly.

Tahir007
Posts: 3
Joined: Mon Jun 30, 2014 1:20 pm

Re: SIMDy

Postby Tahir007 » Tue Nov 07, 2017 8:55 pm

Thanks for the positive opinions about the project. :)

Yes, you are right: you would be very surprised if this were as fast as optimized C++. I have been programming for about 15 years, and on numerous occasions I tried to optimize some function with hand-written assembly code. The compiler always beat me, but I learned a lot in the process. Over the years I got better at assembly, but I still admit that C++ compilers generate better code than I do. When you turn to SIMD instructions, though, things suddenly change: now the programmer is responsible for writing SIMD intrinsics, so I am competing with other programmers and not with the compiler.
Also, because I am doing JIT compilation, I have a lot more context to work with, since I know exactly what CPU you have. So in the end it's not clear which code will be faster, and that's why I said you get performance similar to optimized C++. :)
Now I will show a simple example, just to see exactly what is going on and how SIMDy works. The example below is trivial, but it shows one of the biggest advantages of SIMDy: it adapts to different instruction sets automatically. Depending on your CPU's capabilities, AVX-512, AVX2, AVX or SSE will be used to handle the float64x8 data type. The best thing is that the programmer doesn't need to care which CPU you have; it just works. Even if your CPU has only SSE instructions, you still benefit from the float64x8 type because of memory locality. Hint: for best performance, always use float64x8. :)
Here I explicitly set AVX-512 as the preferred instruction set because the current default is AVX2; this will be fixed in the next version, where the default will be AVX-512.
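To make the adaptation concrete, here is a toy model (not SIMDy's code generator) of how one float64x8 operation such as a = b * c + 2.0 could be split across hardware registers: 8 doubles fit in one 512-bit AVX-512 register, need two 256-bit AVX/AVX2 registers, or four 128-bit SSE registers. `split_vector_op` and `emulate_mul_add` are hypothetical names, and each inner Python loop stands in for a single vector instruction.

```python
from typing import List

# register width in bits for each instruction set (hypothetical table)
REGISTER_BITS = {"AVX512": 512, "AVX2": 256, "AVX": 256, "SSE": 128}


def split_vector_op(lanes: int, element_bits: int, iset: str) -> List[range]:
    """Return the lane ranges that each hardware instruction would cover."""
    per_register = REGISTER_BITS[iset] // element_bits
    return [range(start, min(start + per_register, lanes))
            for start in range(0, lanes, per_register)]


def emulate_mul_add(b: List[float], c: List[float], addend: List[float],
                    iset: str) -> List[float]:
    """Compute a = b * c + addend one register-sized chunk at a time."""
    a = [0.0] * len(b)
    for chunk in split_vector_op(len(b), 64, iset):  # 64-bit float64 lanes
        for i in chunk:  # one vector instruction in real generated code
            a[i] = b[i] * c[i] + addend[i]
    return a


print({iset: len(split_vector_op(8, 64, iset)) for iset in REGISTER_BITS})
# {'AVX512': 1, 'AVX2': 2, 'AVX': 2, 'SSE': 4}
print(emulate_mul_add([2.0] * 8, [3.0] * 8, [2.0] * 8, "SSE"))
# [8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0]
```

The source is written once against float64x8; only the number of chunks changes with the instruction set, which is what lets the same program adapt to the CPU.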


Code:

from simdy import Kernel, float64x8, ISet

source = """
a = b * c + float64x8(2.0)
"""
args = [('a', float64x8()), ('b', float64x8()), ('c', float64x8())]
# AVX-512 must be requested explicitly for now: if iset is omitted, the
# default is AVX2 (this will change to AVX-512 in the next version)
k = Kernel(source, args=args, iset=ISet.AVX512)

# set values for the kernel's parameters
k.set_value('b', float64x8(2.0))
k.set_value('c', float64x8(3.0))

k.run()
print(k.get_value('a'))  # every lane should hold 2.0 * 3.0 + 2.0 = 8.0

# you can, of course, inspect the generated assembly code
print(k.asm)


Yes, you can write kernels in Python and use them from C++, but in that case you must embed Python in your project and call the kernels from there.
Communication between Python and C++ can go in both directions; people are usually not aware of this. :)
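The embedding direction (a C++ program hosting the Python interpreter) needs the CPython C API on the C++ side and isn't shown in the thread. As a Python-only illustration of calls going both ways, here is the classic standard-library `ctypes` callback pattern (not SIMDy): Python calls C's `qsort`, and `qsort` calls back into a Python comparison function. `sort_with_c` and `py_compare` are hypothetical names.

```python
import ctypes
import ctypes.util

# Load the C standard library; on POSIX, CDLL(None) also exposes libc symbols
# if find_library returns None.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C signature of the comparator: int (*compar)(const void *, const void *)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

# void qsort(void *base, size_t nmemb, size_t size, compar)
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None


def py_compare(a, b):
    """C -> Python: qsort calls this Python function for every comparison."""
    return (a[0] > b[0]) - (a[0] < b[0])


def sort_with_c(values):
    """Python -> C: pass a Python list to libc's qsort and return it sorted."""
    array = (ctypes.c_int * len(values))(*values)
    comparator = CMPFUNC(py_compare)  # keep a reference while C may call it
    libc.qsort(array, len(array), ctypes.sizeof(ctypes.c_int), comparator)
    return list(array)


print(sort_with_c([5, 1, 7, 33, 99]))  # [1, 5, 7, 33, 99]
```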

