It is based on Linear BVH [2] and an idea of parallel construction of the tree [3].
I have finished implementing [2] and [3]. The result of rendering seems to be done correctly and the speed of construction is also reasonable.
However I have one doubt about the part of parallel construction.
In the proposed parallel construction for computing node's AABB, the paths from leaf nodes to the root node are processed in parallel.
Each thread starts from a leaf node and walks up the tree according to the information of parent's index.
When a thread visit an internal node, it increments an atomic counter assigned to the node.
If the resulting value of atomic increment is 1, the thread immediately terminated.
The other case, the thread compute the node's AABB using children indices.
By assigning the processing responsibility to the second thread visiting the node, it guarantees that the AABBs of subtree of the node are already computed.
I have a doubt about this.
Indeed, I think the computations of a subtree are done. However is writing AABB to global memory also done?
Let's consider a simple tree with 3 leaves.
Code: Select all
Q
/ \
P \
/ \ \
A B C
TImeline
thread A (leaf A) --> visit node P, terminated
thread B (leaf B) --> visit node P, compute AABB (A+B) and store result --> visit node Q and terminated
thread C (leaf C) --> visit node Q, read AABBs of P and C then compute AABB of Q
Thanks.
[1] "Fast Parallel Construction of High-Quality Bounding Volume Hierarchies".
[2] "Fast BVH Construction on GPUs"
[3] "Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees"