I'd like to take this opportunity to advertise a tool that might be very useful for developing ray tracers or any other
compute-intensive application on the CPU. I am developing a Python package that lets you use CPU SIMD instructions (SSE, AVX, AVX2, AVX-512, FMA).
Basically, it is a JIT compiler that compiles simplified Python code to native x86 machine code. To expose SIMD instructions I added
vector data types (float32x4, float32x8, float32x16, etc.) so you can easily do explicit vectorization. Before compilation, it checks which
instruction sets the CPU supports and selects the best one. So to get maximum performance, all you need
to do is use the widest vector types supported (float32x16, float64x8, int32x16) as much as possible, and all the magic happens automatically.
Even if your CPU only supports the SSE instruction sets, you still benefit from using wide vector types because of memory locality.
This tool is still a work in progress and there is a lot left to do, but even at this stage it is very useful. I started developing
a path tracer just to show how the tool is used.
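To make the "widest type, best instruction set" idea concrete, here is a rough plain-Python sketch of the arithmetic involved. It is not SIMDy code and does not show how SIMDy detects or selects instruction sets: the feature names, `REGISTER_BITS`, `native_lanes` and `native_ops_per_vector_op` are all hypothetical names made up for this illustration. The only facts it relies on are the register widths for floating-point data: 128 bits for SSE, 256 bits for AVX and 512 bits for AVX-512.

```python
# Widest floating-point register (in bits) offered by each instruction set.
# Hypothetical table for this sketch; feature names are made up, not SIMDy's.
REGISTER_BITS = {"sse": 128, "avx": 256, "avx512": 512}


def native_lanes(features, elem_bits=32):
    """Elements of size elem_bits that fit in the widest supported register."""
    # Fall back to 128 bits when no known feature is listed
    # (x86-64 CPUs always provide SSE2).
    widest = max((REGISTER_BITS[f] for f in features if f in REGISTER_BITS),
                 default=128)
    return widest // elem_bits


def native_ops_per_vector_op(vector_lanes, features, elem_bits=32):
    """Native instructions needed for one operation on a wide vector type."""
    lanes = native_lanes(features, elem_bits)
    return max(1, -(-vector_lanes // lanes))  # ceiling division


# One float32x16 addition on three different CPUs.
for cpu in ({"sse"}, {"sse", "avx"}, {"sse", "avx", "avx512"}):
    print(sorted(cpu), native_ops_per_vector_op(16, cpu))
```

Under these assumptions a single float32x16 operation costs four SSE instructions, two AVX instructions or one AVX-512 instruction, which is why the same source can run on all three without changes.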
Here is a trivial example (calculating pi with the Monte Carlo method):
from multiprocessing import cpu_count
from simdy import int64, float64, simdy_kernel

@simdy_kernel(nthreads=cpu_count())
def calculate_pi(n_samples: int64) -> float64:
    inside = int64x4(0)
    for i in range(n_samples):
        x = 2.0 * random_float64x4() - float64x4(1.0)
        y = 2.0 * random_float64x4() - float64x4(1.0)
        inside += select(int64x4(1), int64x4(0), x * x + y * y < float64x4(1.0))
    nn = inside[0] + inside[1] + inside[2] + inside[3]
    result = 4.0 * float64(nn) / float64(n_samples * 4)
    return result

result = calculate_pi(int64(25_000_000))
print(sum(result) / cpu_count())
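For readers who want to check what the kernel computes without installing anything, here is a plain-Python reference that emulates the four lanes with lists. It is only a sketch: `calculate_pi_reference` and this `select` are hypothetical stand-ins, and the argument order of `select` (value if true, value if false, mask) is my reading of the kernel above, not confirmed SIMDy behavior. It runs on a single thread and is far slower than native SIMD code.

```python
import random

LANES = 4  # mirrors float64x4 / int64x4 in the kernel above


def select(if_true, if_false, mask):
    """Per-lane choice: if_true[i] where mask[i] is set, otherwise if_false[i]."""
    return [t if m else f for t, f, m in zip(if_true, if_false, mask)]


def calculate_pi_reference(n_samples, seed=0):
    rng = random.Random(seed)
    inside = [0] * LANES  # one hit counter per lane
    for _ in range(n_samples):
        # Four random points in the square [-1, 1) x [-1, 1), one per lane.
        x = [2.0 * rng.random() - 1.0 for _ in range(LANES)]
        y = [2.0 * rng.random() - 1.0 for _ in range(LANES)]
        # Lane-wise test: is the point inside the unit circle?
        mask = [xi * xi + yi * yi < 1.0 for xi, yi in zip(x, y)]
        hit = select([1] * LANES, [0] * LANES, mask)
        inside = [a + b for a, b in zip(inside, hit)]
    # Horizontal sum over the lanes, then the usual area ratio times 4.
    return 4.0 * sum(inside) / (n_samples * LANES)


estimate = calculate_pi_reference(50_000)
print(estimate)
```

Each loop iteration tests four points at once, which is why the final division uses `n_samples * 4` in the kernel and `n_samples * LANES` here.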
LINKS:
SIMDy - http://www.tahir007.com
Path Tracer - https://bitbucket.org/Tahir007/quark
My question for you guys: what do you think about this tool?
When I converted the BVH tree to a BVH16, I got almost twice the performance with AVX-512 instructions.
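For anyone unfamiliar with wide BVHs, one common way to get a 16-wide tree is to collapse an existing binary BVH: keep replacing an interior child with its own two children until the node has 16 children or only leaves remain. The sketch below shows that idea in plain Python. It is not the code from the path tracer; `BinaryNode`, `WideNode`, `collapse` and `build` are hypothetical names, bounding boxes are left out, and picking the child with the most primitives is an assumption (surface area is another common choice).

```python
from dataclasses import dataclass, field


@dataclass
class BinaryNode:
    prim_count: int                 # primitives stored under this node
    left: "BinaryNode" = None       # both None for a leaf
    right: "BinaryNode" = None

    @property
    def is_leaf(self):
        return self.left is None


@dataclass
class WideNode:
    children: list = field(default_factory=list)  # WideNode or leaf BinaryNode


def collapse(node, width=16):
    """Turn a binary BVH into a tree whose nodes have up to `width` children."""
    if node.is_leaf:
        return node
    slots = [node.left, node.right]
    while len(slots) < width:
        interior = [i for i, n in enumerate(slots) if not n.is_leaf]
        if not interior:
            break  # only leaves left, nothing more to expand
        # Expand the interior child with the most primitives (an assumption).
        i = max(interior, key=lambda k: slots[k].prim_count)
        slots[i:i + 1] = [slots[i].left, slots[i].right]
    return WideNode([collapse(child, width) for child in slots])


def build(depth):
    """Complete binary tree with one primitive per leaf, for demonstration."""
    if depth == 0:
        return BinaryNode(prim_count=1)
    left, right = build(depth - 1), build(depth - 1)
    return BinaryNode(left.prim_count + right.prim_count, left, right)


wide = collapse(build(5))  # 32 leaves, binary depth 5
print(len(wide.children))
```

With 16 children per node, one AVX-512 register holds one float32 coordinate of all 16 child boxes, so a ray can be tested against all of them in a single pass.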
