Aila Laine traversal kernel: memory layout wasteful?

Asic transit gloria mundi.
chsu
Posts: 3
Joined: Thu Sep 26, 2013 11:33 pm

Aila Laine traversal kernel: memory layout wasteful?

Postby chsu » Sat Nov 02, 2013 1:07 am

In noticed that in the purportedly fast Aila and Laine traversal code uses a triangle index look-up buffer with a consistent memory layout as it's leaf-buffer... it seems wasteful:

leaf buffer entry (48 bytes)
{
float4 (woop transform z)
float4 (woop transform x)
float4 (woop transform y)
}

index buffer entry (12 bytes)
{
int (triangle_index)
int unused
int unused
}

Since the leaf buffer is interpersed with terminating "-0" to signify the end of a leaf's triangles, this ensures that the leaf buffer index (leafAddr in their code) can be used to map to global triangle index.
I guess this is a question of: is this worth it? Is it worse to add in the triangle index into the leaf-buffer?

I'm also wondering why not use the unused space in the node entry to store the number of leaf primitives belonging to the node? This would remove the need for terminating values, and the need for index padding.

node buffer
{
float4 child0 bounding_box xy
float4 child1 bounding_box xy
float4 child01 bounding_box z
int4 child indices (z, w unused)
}

Well I guess I'll implement this way and find out :)

mpeterson
Posts: 52
Joined: Fri Jan 06, 2012 3:09 pm

Re: Aila Laine traversal kernel: memory layout wasteful?

Postby mpeterson » Mon Nov 04, 2013 10:40 am

hallo, i have seen also some strange/nonsense code sequences there. please
report improvements in terms of perf. and/or code-layout, mp.

chsu
Posts: 3
Joined: Thu Sep 26, 2013 11:33 pm

Re: Aila Laine traversal kernel: memory layout wasteful?

Postby chsu » Mon Nov 04, 2013 9:28 pm

I realize there's some reasoning behind using terminating sequences instead of storing child counts, and thus using a consistent memory layout with padding.

The reason is for speculative traversal and to save registers. Speculative traversal means that leaf intersection tests are postponed until each thread finds a leaf in the BVH, this is reported by their paper to increase SIMD coherency and thus performance by 5-10%. Postponing means saving the 2 leaf addresses (i.e. leafAddr and nodeAddr in their code). Using terminating sequences in global memory means not having to store 2 additional leaf-counts in the register.

jbikker
Posts: 175
Joined: Mon Nov 28, 2011 8:18 am
Contact:

Re: Aila Laine traversal kernel: memory layout wasteful?

Postby jbikker » Tue Nov 05, 2013 9:35 am

I am using the Aila and Laine method for a few years now, but instead of null-terminating the triangle list, I multiply each triangle index by 2 and use the lowest bit to mark the last triangle in a leaf. Same effect, but without extra data.

chsu
Posts: 3
Joined: Thu Sep 26, 2013 11:33 pm

Re: Aila Laine traversal kernel: memory layout wasteful?

Postby chsu » Tue Nov 05, 2013 9:37 pm

I'm guessing that means you store the index with the 16 float-woop triangles?

jbikker
Posts: 175
Joined: Mon Nov 28, 2011 8:18 am
Contact:

Re: Aila Laine traversal kernel: memory layout wasteful?

Postby jbikker » Wed Nov 06, 2013 1:02 pm

Yes, bvh leafs point to an entry in a list of triangle indices; every index is shifted to the left by 1 so the least significant bit can be used as a null terminator. That way a leaf only needs to store the index of the first index. Additionally, an index is always inverted (so it is always negative), which lets me mix child node addresses and indices; if a 'child address' is negative, it's the index of the first entry in the index list, otherwise it's a node address. Benefit of this is that leaf contents can be placed on the stack for later processing.


Return to “GPU”

Who is online

Users browsing this forum: No registered users and 1 guest