I have a few remarks to make. First of all, I think there is undoubtedly novelty in the data structure we present. In particular, I see two major differences from previous work and from the papers you mention:

- We use a hierarchical grid as an index (a "brick map"). Previous approaches all use a uniform grid for that purpose. There is, however, a cost to using a hierarchical grid: our virtual grid must have a resolution of the form K*(2^d), and a lookup in the virtual map is no longer a single memory access but a sequence of dependent memory accesses. The advantage is that the hierarchical grid lets us easily reach grid resolutions of over 50000^3, whereas all of the previous approaches, if implemented on a modern GPU, could not get beyond around 2048^3. A large part of our performance gains comes from this (see the section of the paper that compares against Macro-Regions).
- We allow cells to overlap each other. The closest work in that respect is the Macro-Regions paper, where only empty cells may overlap. This, in turn, gives us more freedom in choosing cell shapes than any of the previous approaches had. As a consequence, the expansion phase alone can provide a 20% performance improvement over the already merged grid.
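To illustrate the first point, here is a minimal C++ sketch of what a brick-map-style lookup can look like. All names and the exact layout are hypothetical, not taken from the paper's code: a top-level K^3 grid entry either names a cell directly or points to a block of 8 children, descended one level per coordinate bit, so a query costs up to 1 + d dependent reads instead of one.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical brick-map index: negative entries are leaf cell ids (~e),
// non-negative entries index the first of 8 children in `nodes`.
struct BrickMap {
    int K, d;                 // virtual resolution is K * 2^d per axis
    std::vector<int32_t> top; // K^3 top-level entries
    std::vector<int32_t> nodes;

    // Look up the cell containing virtual voxel (x, y, z).
    int32_t lookup(int x, int y, int z) const {
        int cx = x >> d, cy = y >> d, cz = z >> d;  // top-level coords
        int32_t e = top[(cz * K + cy) * K + cx];
        // Each iteration is a dependent read: it must wait for the previous one.
        for (int l = d - 1; l >= 0 && e >= 0; --l) {
            int child = (((z >> l) & 1) << 2) | (((y >> l) & 1) << 1) | ((x >> l) & 1);
            e = nodes[e + child];
        }
        return e;
    }
};
```

With K = 3125 and d = 4, such a scheme already reaches a 50000^3 virtual resolution while only storing the bricks that are actually refined.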

Even if you completely disregard the novel construction algorithm, the optimized data layout, and all of the experiments, there is enough novelty in those two points alone to claim that the data structure is new. It may not be earth-shatteringly new, but it is undeniably new. Research rarely starts from scratch, and ideas are always around, even when they are developed independently.

Concerning your related-work suggestions: as Javor said, we have to make a choice as to what to cite. The intended audience is people working on ray tracing today. Most of the approaches you mention, while doing something somewhat similar, are more than 20 years old, and re-implementing them for comparison purposes would not be trivial because they were not designed for GPUs. We already made that effort for Macro-Regions, and the results speak for themselves: the performance is just not there (on a simple scene like Sponza, Macro-Regions are around 4 times slower). There are plenty of other approaches that do something somewhat similar, and we simply do not have the time/space to put every one of them in the paper. The choice of references we made may not be perfect, but we believe it positions our paper as a modern GPU ray tracing article.

Statistics: Posted by madmann — Thu Jun 22, 2017 9:02 am


Statistics: Posted by szellmann — Wed Jun 21, 2017 1:48 pm


Statistics: Posted by stefan — Thu Jun 15, 2017 8:51 pm


Statistics: Posted by friedlinguini — Thu Jun 15, 2017 1:59 pm


Statistics: Posted by jbikker — Thu Jun 15, 2017 12:26 pm


"Building an Orthonormal Basis, Revisited", by Tom Duff, James Burgess, Per Christensen, Christophe Hery, Andrew Kensler, Max Liani, and Ryusuke Villemin:

http://jcgt.org/published/0006/01/01/paper.pdf

It describes a fast and accurate way to generate an orthonormal basis for things like random sampling. I got the link from a Twitter post by Carmack some time ago but I don't remember seeing it linked here yet.
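For reference, the core of that paper's method is only a few lines: a branchless orthonormal-basis construction that avoids the singularity at n.z = -1 with a copysign. This is my C++ transcription of it, so any deviation from the published listing is mine:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Branchless orthonormal basis from a unit normal n (after Duff et al., JCGT 2017).
// std::copysign picks the stable formulation for both hemispheres without a branch.
void buildONB(const Vec3& n, Vec3& b1, Vec3& b2) {
    const float sign = std::copysign(1.0f, n.z);
    const float a = -1.0f / (sign + n.z);
    const float b = n.x * n.y * a;
    b1 = { 1.0f + sign * n.x * n.x * a, sign * b, -sign * n.x };
    b2 = { b, sign + n.y * n.y * a, -n.y };
}
```

The returned b1, b2 are unit length and mutually orthogonal to n, which is exactly what is needed for mapping hemisphere samples into a shading frame.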

I watched the Hero presentation at EGSR a couple years back. The results looked really excellent.

Statistics: Posted by ziu — Thu Jun 15, 2017 11:38 am


What is the material?

Do you implement Hero wavelength spectral sampling?

Statistics: Posted by shocker_0x15 — Thu Jun 15, 2017 5:46 am


Completely new world for me so be gentle.

This is with tracing four wavelengths at once; three of them are killed when a wavelength-dependent pure specular dielectric is encountered.
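For readers new to this: tracing four wavelengths at once and killing all but one at a dispersive event is the hero wavelength sampling scheme of Wilkie et al. (2014). A minimal sketch of the wavelength bookkeeping, with hypothetical names and a conventional 380-780 nm range:

```cpp
#include <array>

constexpr float kLambdaMin = 380.0f, kLambdaMax = 780.0f;

// Rotate a uniformly sampled "hero" wavelength into 4 stratified samples
// that cover the visible range, wrapping around at the upper end.
std::array<float, 4> heroWavelengths(float hero) {
    const float range = kLambdaMax - kLambdaMin;
    std::array<float, 4> lambdas;
    for (int j = 0; j < 4; ++j) {
        float l = hero - kLambdaMin + (j / 4.0f) * range;
        if (l >= range) l -= range;          // wrap into [0, range)
        lambdas[j] = kLambdaMin + l;
    }
    return lambdas;
}

// On a wavelength-dependent specular event, only the hero path stays valid:
// zero the companions' throughput so just lambdas[0] keeps contributing.
void collapseToHero(std::array<float, 4>& throughput) {
    for (int j = 1; j < 4; ++j) throughput[j] = 0.0f;
}
```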

The tracer accumulates CIE XYZ colors and converts them to RGB using the approach proposed by Brian Smits (http://www.cs.utah.edu/~bes/papers/color/paper.pdf). I chose not to use the more accurate method of Meng et al. (https://jo.dreggn.org/home/2015_spectrum.pdf), as it requires substantially more involved calculations.
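The XYZ-to-RGB step itself is just a 3x3 matrix multiply; independent of the spectrum handling, the standard matrix for linear sRGB under a D65 white point is:

```cpp
struct RGB { float r, g, b; };

// CIE XYZ to linear sRGB (D65 white point), standard IEC 61966-2-1 matrix.
// Gamma encoding and gamut clipping would follow as separate steps.
RGB xyzToLinearSRGB(float X, float Y, float Z) {
    return {
         3.2406f * X - 1.5372f * Y - 0.4986f * Z,
        -0.9689f * X + 1.8758f * Y + 0.0415f * Z,
         0.0557f * X - 0.2040f * Y + 1.0570f * Z
    };
}
```

As a sanity check, the D65 white point (X, Y, Z) = (0.9505, 1.0, 1.0890) should map to approximately (1, 1, 1).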

EDIT: another one:

This is unidirectional path tracing so the caustics took a while to converge, and they are still noisy...

Statistics: Posted by jbikker — Fri Jun 09, 2017 10:19 am


Whether the differences from previous methods (like the macro-regions of Devillers) justify the claim about the novelty of the structure, in the way we formulated it, is another question. Seeing the reaction(s), I would have re-phrased the sentence.

Statistics: Posted by javor — Thu Jun 08, 2017 6:56 pm


javor wrote:

Following up the discussion from the "glitch pictures" thread:

As with the Nemoto & Omachi and the Macro-regions papers, the resulting acceleration structures are indeed very similar, but not the same. You certainly have a point criticizing how the paper is positioned, but our view on this is that the novelty is mainly in the combination of construction and traversal algorithms (and the resulting performance), not so much in the geometric representation of the spatial index structures.


Then why do you say this in Section 1, Paragraph 2, of your paper?

In this paper, we demonstrate that efficient ray traversal on modern GPUs is also possible with a new type of acceleration structure, which combines characteristics of hierarchical grids and bounding volume hierarchies.

To me it looks like you guys clearly thought the acceleration structure itself was novel. The acceleration structure is EXCELL, so I think that paper should be referenced or, at minimum, acknowledged in some fashion. The traversal algorithm you use is IMHO also similar to prior EXCELL algorithms. What is novel in your work IMHO is the construction method, plus the fact that it is optimized for a GPU architecture. I quote the paper "Adaptive cell division for ray tracing" (1991) by B. Pradhan and A. Mukhopadhyay, which I referenced before in the other thread, because I think it makes this more explicit:

2. PROPOSED DATA STRUCTURE

We assume that the entire object space is uniformly divided into cells by a three-dimensional "virtual" grid of predefined resolution. We call this grid virtual because the cells are not occupied by the objects in object space but by integer entries which guide the search through object space. The virtual grid is represented by a three-dimensional integer array with each cell represented by an element of the array. The space subdivision is done in the following manner: At the beginning the entire object space is enclosed within a box called the root-box on which we superimpose the virtual grid. ... Fig. 1 illustrates this subdivision in two dimensions ... In the case of quadtree division (two-dimensional version of oct-tree division) this box will be divided into four uniform quadrants ... Now notice that each box contains an integral number of cells along both the x- and y-axes. Thus, every cell is contained in exactly one box. We mentioned earlier that the virtual grid is an integer array. This is because the array element corresponding to a cell is filled with identification number of the box which contains the cell. ... The data structures for cell division described in the previous section are shown in Fig. 3. It shows the box-array, an array of pointers to box structures created during subdivision. A box is entered into this array when it can no longer be subdivided. A box is uniquely identified by an index of the array. ... Each box consists of cells which contain the index of this box structure in the box-array.
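Stripped of the 1991 terminology, the quoted scheme is an integer array that maps every virtual cell to the box that covers it, so a point query is a single indexed read with no tree descent. A tiny sketch (2D for brevity, names hypothetical):

```cpp
#include <cstdint>
#include <vector>

// EXCELL-style "virtual grid": every cell of a uniform R x R grid stores the
// id of the (possibly much larger) box that contains it. Boxes may span many
// cells, but each cell belongs to exactly one box.
struct VirtualGrid2D {
    int R;                      // virtual resolution per axis
    std::vector<int32_t> cell;  // R*R box ids

    int32_t boxAt(float x, float y) const {  // x, y in [0, 1)
        int cx = static_cast<int>(x * R);
        int cy = static_cast<int>(y * R);
        return cell[cy * R + cx];            // one read, no pointer chasing
    }
};
```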

theirs:

yours:

3. TRAVERSING THE DATA STRUCTURE

Traversing the data structure by a ray is reduced to traversing the virtual grid in object space. When a ray is fired, the first point in which it intersects the root-box is calculated. From this point onward the number of cell units traversed by the ray in both the x-and y-directions are maintained. Since cells are of uniform size, this can be obtained by dividing the actual distance traversed in x- and y-directions by respective cell dimensions in those directions. These cell units need not be whole numbers, since a ray may happen to cross a fraction of a cell. But the integer portions of the cell units yield indices of the cell the ray is currently traversing. Using these to index into the virtual grid we get the number of the box the ray is traversing. ... To find the next box, first we have to obtain the point at which the ray leaves the present box. Next, we have to calculate a point slightly away from the previous point along the direction of the ray. Glassner [2] employs a method in which the ray is intersected with all the six planes of the current box, and the nearest plane or planes are found by determining the smallest t values. This method consumes nine multiplication and nine addition operations, assuming that division and subtraction are equivalent to multiplication and addition, respectively. Lee [5] uses a target plane method in which 5 multiplications and 6 additions take place on average. But, in the worst case, this is slower than the Glassner's method. We employ another different method which is faster than the above two methods. We start by observing that any ray can potentially intersect, at most, three of the six planes of the current box, at the point where it leaves the box. We classify these six planes as lower x- plane, higher x-plane, lower y-plane, higher y-plane, lower z-plane, and higher z-plane. 
After determining the ray source u and the ray vector v, we find which three of these planes will possibly be intersected by the ray with smallest t by examining its ray vector. For example, if v has a negative x component then the lower x-plane, but not the higher x-plane, is to be tested for intersection. This method consumes five multiplications and six additions in the worst case, which is equal to the average case performance of Lee's method [5].
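The plane-selection trick quoted above (only the three faces on the sign side of the direction vector can be exit faces) reduces, in a modern box-exit routine, to something like this sketch (hypothetical names):

```cpp
#include <algorithm>

// Exit distance of a ray (org + t * dir) leaving an axis-aligned box [lo, hi].
// For each axis only one face can be the exit face: the high face if the
// direction component is positive, the low face otherwise. Three candidate
// t values, then a min -- no need to intersect all six planes.
float boxExitT(const float org[3], const float dir[3],
               const float lo[3], const float hi[3]) {
    float t = 1e30f;
    for (int a = 0; a < 3; ++a) {
        if (dir[a] == 0.0f) continue;  // parallel: this face pair never exits
        float plane = dir[a] > 0.0f ? hi[a] : lo[a];
        t = std::min(t, (plane - org[a]) / dir[a]);
    }
    return t;
}
```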

It looks to me like this describes the method you guys use at:

https://github.com/madmann91/hagrid/blo ... raverse.cu

This is also described in the paper "A new algorithm of space tracing using a CSG model" (1987), Kadi Bouatouch, Mohamed Ouali Madani, Thierry Priol, Bruno Arnaldi.

Still think it's a waste of time to read the prior work? Shit, now you made me do *your* work. But this was fun because I found a lot of other papers about something else that I'm currently working on as a result so it's good.

AFAIK your construction algorithm *is* novel compared to prior published EXCELL work; it also uses the SAH, and it is implemented on a GPU. I have no doubt that you guys ended up reinventing a lot of this independently (it wouldn't be the first time that happened in the field; the grid voxel traversal algorithm was independently conceived by at least three teams in the 1980s, at roughly the same time, in different parts of the globe!). But that's no excuse not to cite the prior art in the future, now that you know it exists. At least that's my POV. Still, it doesn't detract from your major achievement, like I said before.

Statistics: Posted by ziu — Thu Jun 08, 2017 5:05 pm


ziu wrote:

But it is essential that you guys reference the EXCELL (Tamminen et al.) data structure papers in future publications. It is the exact same acceleration structure even if the construction method might not be the same. ...

As with the Nemoto & Omachi and the Macro-regions papers, the resulting acceleration structures are indeed very similar, but not the same. You certainly have a point criticizing how the paper is positioned, but our view on this is that the novelty is mainly in the combination of construction and traversal algorithms (and the resulting performance), not so much in the geometric representation of the spatial index structures.

Statistics: Posted by javor — Thu Jun 08, 2017 12:20 pm


We also made the CUDA source code from our paper available, under the MIT license, on GitHub, in case you're interested: https://github.com/vcosta/cuda-grid . If you want to understand the algorithm, just look at Figures 2, 3, and 4 in our paper: https://www.academia.edu/13928983/Effic ... hitectures . It's totally parallel and uses no atomics whatsoever.

Statistics: Posted by ziu — Wed Jun 07, 2017 3:55 am


javor wrote:

The Nemoto & Omachi and Taranta et al. citations are certainly an oversight on our side, thanks for pointing this out. I was not aware of the first paper, and have no excuse for forgetting to cite the Macro/Micro regions papers. When writing the paper, we considered positioning the paper only as a grid-based ray tracing method, but decided against that. Personally, I think that pointing the reader towards the state-of-the-art ray tracing approaches on current hardware is more important than differentiating our method from algorithms tested so far back in time. If it was up to me, we would also leave out the comparison with macro-regions, which was forced on us by the primary reviewer. We are aware that a multitude of similar approaches exist and have been tried before. The important point is that the particular combination of techniques we implemented delivers surprisingly good performance from a type of acceleration structure, which, as you said, does not see a lot of attention.


Hey Javor,

Well, some of the references I mentioned are obscure and rather dated, so I can certainly understand how they could have been overlooked.

But it is essential that you guys reference the EXCELL (Tamminen et al.) data structure papers in future publications. It is the exact same acceleration structure, even if the construction method might not be the same. You already cite Cleary, and MacDonald and Booth, which are also quite old and IMHO less apropos, so it doesn't make sense to ignore the references of the people who actually describe the same data structure. I've since searched the literature some more. Jensen mentions using EXCELL in the early 1980s in the RTNews archives, and it seems to have been fairly commonly used back then (e.g. it's also referenced in Bouatouch's papers). For whatever reason, both later favored kd-tree data structures. My guess is that this was due either to hardware limitations at the time or to improvements in kd-tree SAH methods that made them faster. For example, regarding hardware: in the 1980s it was common to store the grid in a hashtable due to memory limitations, whereas today we often just use a plain array, because memory is much more plentiful and pointer chasing degrades performance. The memory-wall characteristics were quite different back then: memory latency was comparatively better relative to instruction latency, and the algorithms reflect that.

javor wrote:

The construction speed is indeed the part where we expect to see improvements in the future, however, I would look elsewhere for optimization opportunities. Overall the initial two-level grid construction is the fastest part of the build process, and most of the time is spent merging and expanding cells. Also, the first level grid we create is too sparse for the suggested technique to work, which is why Arsene did not include it in the implementation. Of course, we are happy to be proven wrong!

Well, in that case you would indeed be better off looking at the kd-tree (Morton) or macrocell literature for improvements in build times. But looking at Figure 5 in your paper, you spend something like 40% of construction time building the initial grid on average. San Miguel in particular has a lot of dissimilarly sized triangles; in our experience this step can be done up to 9x faster for such scenes. It's a shame we can't compare your build times with a state-of-the-art high-quality GPU BVH or kd-tree builder like that of Karras and Aila (2013), since their code is AFAIK not publicly available. Still, it seems to me from reading your paper that you have achieved state-of-the-art rendering performance, which is the main focus of your algorithms. So kudos for that!

So it's interesting to see you show that it is possible to get good rendering rates with a grid-based acceleration structure. I suspected as much from looking at the late-1980s/early-1990s literature, but I couldn't convince people with arguments alone. The thing that burned a lot of people back then was that hierarchical grids have a lot of edge cases where memory consumption explodes, or where performance doesn't improve no matter how much you increase the grid resolution, and that kind of put people off grids forever. Typically this involved scenes with nested geometry, akin to the Hairball scene we use today; the Standard Procedural Databases "Shells" scene was the example typically used back then.

Statistics: Posted by ziu — Wed Jun 07, 2017 12:22 am
