Multiple Textures CUDA/OpenCL

Practical and theoretical implementation discussion.
atlas
Posts: 26
Joined: Thu Apr 16, 2015 12:01 am

Multiple Textures CUDA/OpenCL

Postby atlas » Fri Dec 02, 2016 11:29 am

How do you guys handle large numbers of textures on the GPU? You can pass them all in as separate arguments and select one with a huge switch/if statement, pack them into image arrays or 3D textures (which wastes memory, since every layer has to have the same dimensions), or pack them into a large texture atlas, which may be the best method.

Which is the smarter method for speed and memory?

shocker_0x15
Posts: 61
Joined: Sun Aug 19, 2012 3:24 pm

Re: Multiple Textures CUDA/OpenCL

Postby shocker_0x15 » Thu Dec 08, 2016 1:44 am

Hi,

For a global illumination renderer like a path tracer, using a dedicated texture type doesn't seem practical, because we cannot know in advance which texture will be sampled, so a lot of textures would have to be bound to the kernel.

In my OpenCL renderer, I used a single uchar* kernel argument as a pointer to the texture storage.
The renderer samples a texture at an offset specified by a material descriptor (passed as another argument).

This way it is impossible to use HW-accelerated texture filtering (but I think view-dependent texture filtering makes the renderer biased/inconsistent anyway, so only simple bilinear filtering is valid).
Additionally, I don't know how good this approach is performance-wise (maybe not so good).

However, this approach seems to be the most generic one.

szellmann
Posts: 44
Joined: Fri Oct 10, 2014 9:15 am

Re: Multiple Textures CUDA/OpenCL

Postby szellmann » Thu Dec 08, 2016 10:37 am

Texture objects have been available since CUDA 5.0, on Kepler and newer hardware. I'd guess that they are quite efficient, just as bindless textures in OpenGL are. You can have a dynamic number of texture objects and pass a list of them (e.g. a thrust::device_vector) to the kernel.
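
A minimal sketch of how this can look (illustrative only; resource upload and error checking are omitted):

#include <cuda_runtime.h>

// Build one texture object per image; the cudaArray_t is assumed to already
// contain the texel data.
cudaTextureObject_t makeTexture(cudaArray_t array)
{
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = array;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0]   = cudaAddressModeWrap;
    texDesc.addressMode[1]   = cudaAddressModeWrap;
    texDesc.filterMode       = cudaFilterModeLinear;   // HW bilinear filtering
    texDesc.readMode         = cudaReadModeElementType;
    texDesc.normalizedCoords = 1;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);
    return tex;
}

// The kernel receives a plain device array of texture objects and indexes it
// with whatever texture id the material carries.
__global__ void shade(const cudaTextureObject_t* textures, const int* texIds,
                      const float2* uvs, float4* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    out[i] = tex2D<float4>(textures[texIds[i]], uvs[i].x, uvs[i].y);
}

On the host side the handles can live in a thrust::device_vector<cudaTextureObject_t>, and you pass thrust::raw_pointer_cast(vec.data()) as the kernel argument.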

For legacy architectures, or with OpenCL, I'd probably go with a texture atlas and references into it. However, I decided to limit my ray tracing lib to Kepler+ precisely because of the availability of texture objects.

atlas
Posts: 26
Joined: Thu Apr 16, 2015 12:01 am

Re: Multiple Textures CUDA/OpenCL

Postby atlas » Fri Dec 09, 2016 7:55 pm

shocker_0x15: I don't understand. I thought in OpenCL you had to use an image2d_t type to get hardware filtering/sampling. How are you getting filtering using a global char*?

szellmann: yes, for new hardware that makes the most sense. I have to support old hardware, so I've gone with the texture atlas option and it works well. Kind of a pain packing textures and keeping up with offsets, but the solution is efficient in the kernel.
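
For anyone curious, the per-lookup work is basically just remapping the local UV into atlas space; something like this (illustrative sketch; the AtlasEntry struct and its field names are made up):

// One entry per packed texture, indexed via the material.
struct AtlasEntry
{
    float2 offset;   // top-left corner of the sub-texture in atlas UV space
    float2 scale;    // sub-texture size relative to the whole atlas
};

// Remap a texture-local UV into atlas UV space. The half-texel inset keeps
// bilinear filtering from bleeding into the neighbouring sub-texture.
__device__ float2 atlasUV(float2 uv, AtlasEntry e, float2 atlasTexelSize)
{
    float u = fminf(fmaxf(uv.x, 0.0f), 1.0f);
    float v = fminf(fmaxf(uv.y, 0.0f), 1.0f);
    return make_float2(e.offset.x + 0.5f * atlasTexelSize.x + u * (e.scale.x - atlasTexelSize.x),
                       e.offset.y + 0.5f * atlasTexelSize.y + v * (e.scale.y - atlasTexelSize.y));
}

Wrap/repeat addressing and mip-mapping are the awkward parts, since they have to be emulated per sub-texture.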

ultimatemau
Posts: 11
Joined: Wed Jan 08, 2014 9:48 am

Re: Multiple Textures CUDA/OpenCL

Postby ultimatemau » Sun Dec 11, 2016 11:58 am

"I don't understand, I thought in OpenCL you had to use an image2d_t type in order to get hardware filtering/sampling. How are you getting filtering using a global char*"

I think the idea is that the __global char* textureData is just one huge array of all texels for all textures.

The idea with __global char* textureData is the following. You would usually also have something like "__global TextureHeader* textureHeaders", where the TextureHeader struct holds information about a texture such as width, height, offset, textureType, filteringType, etc. Then, a material or shader stores the index of the correct TextureHeader.

At runtime, in your kernel, you would do something like:
TextureHeader texHeader = textureHeaders[shader.texid]; // texid is the index of the correct texture header, which provides all the data we need.
TextureType texType = texHeader.type; // Let's say it is RGBA float, i.e. float4
float2 texDimensions = (float2)(texHeader.width, texHeader.height);
__global float4* myTexture = (__global float4*)(textureData + texHeader.offset); // Here's the float4*

With myTexture you can do anything you like. This texture loading code goes wherever you want it to go, e.g. into functions with switches for filterType, textureType, etc. For a (bi/uni)directional path tracer it is beneficial to sort the materials / hit points to get optimal performance, i.e. good utilization and cache coherence for the texture lookups. In my personal experience the __global char* approach is very fast even though it's not hardware accelerated.
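
To make "not hardware accelerated" concrete: a software bilinear lookup on such a raw buffer is just four loads plus a few lerps, roughly like this (illustrative sketch in CUDA syntax with clamp addressing assumed; the OpenCL version only differs in the __global qualifiers):

// Component-wise lerp for float4.
__device__ float4 lerp4(float4 a, float4 b, float t)
{
    return make_float4(a.x + t * (b.x - a.x), a.y + t * (b.y - a.y),
                       a.z + t * (b.z - a.z), a.w + t * (b.w - a.w));
}

// Bilinear fetch from a tightly packed float4 texture; texels points at the
// first texel of this texture inside the big texel buffer.
__device__ float4 sampleBilinear(const float4* texels, int width, int height,
                                 float u, float v)
{
    // Map normalized UV to texel space, relative to texel centers.
    float x = u * width  - 0.5f;
    float y = v * height - 0.5f;
    int x0 = (int)floorf(x);
    int y0 = (int)floorf(y);
    float fx = x - x0;
    float fy = y - y0;

    // Clamp to the texture edges (wrap would use a modulo instead).
    int x1 = min(x0 + 1, width - 1);
    int y1 = min(y0 + 1, height - 1);
    x0 = max(x0, 0);
    y0 = max(y0, 0);

    float4 c00 = texels[y0 * width + x0];
    float4 c10 = texels[y0 * width + x1];
    float4 c01 = texels[y1 * width + x0];
    float4 c11 = texels[y1 * width + x1];
    return lerp4(lerp4(c00, c10, fx), lerp4(c01, c11, fx), fy);
}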

Cheers

shocker_0x15
Posts: 61
Joined: Sun Aug 19, 2012 3:24 pm

Re: Multiple Textures CUDA/OpenCL

Postby shocker_0x15 » Mon Dec 12, 2016 2:16 am

Thank you ultimatemau.
That is exactly what I wanted to say.

szellmann
Posts: 44
Joined: Fri Oct 10, 2014 9:15 am

Re: Multiple Textures CUDA/OpenCL

Postby szellmann » Mon Dec 12, 2016 8:17 am

ultimatemau wrote: In my personal experience the __global char* approach is very fast even though it's not hardware accelerated.


I'm curious: what difference in latency do you actually observe when comparing nearest-neighbor vs. linear filtering?

My rt lib has a dedicated CPU API where you can call texture access "intrinsics" like tex2D, etc., which emulate CUDA behavior, so I did the comparison once, both with CUDA and on the CPU. I compared 3D texture access (so 1x memory access for nearest, 8x for linear!). I remember something along the lines of the CPU linear filter being 5x slower than nearest filtering. In contrast, on the GPU, with HW-accelerated filtering, I found there was virtually no difference in performance between nearest and linear filtering. So I'm curious how emulating textures in GPU DDR memory, which is basically what you propose, compares to that.

Cheers, Stefan

friedlinguini
Posts: 79
Joined: Thu Apr 11, 2013 5:15 pm

Re: Multiple Textures CUDA/OpenCL

Postby friedlinguini » Mon Dec 12, 2016 2:58 pm

szellmann wrote: I remember something along the lines of the CPU linear filter being 5x slower than nearest filtering. In contrast, on the GPU, with HW-accelerated filtering, I found there was virtually no difference in performance between nearest and linear filtering.


How did you lay out your textures in memory? It probably wouldn't make much of a difference if you're performing nearest-neighbor sampling incoherently. But for linear filtering, I would expect tiling and/or swizzling the textures to make the memory fetches for a single texture sample more coherent (at the cost of more computation).

szellmann
Posts: 44
Joined: Fri Oct 10, 2014 9:15 am

Re: Multiple Textures CUDA/OpenCL

Postby szellmann » Mon Dec 12, 2016 3:46 pm

I used tiles/bricks with Morton curves, and my texturing functions also support SoA access with SSE and AVX. Of course the outcome of the tests depends on overall coherence. The bottom line was that linear filtering had a significant impact when texturing was emulated, while there was virtually no difference with HW support.
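
For reference, the 2D Morton index is just the usual bit interleave of the tile-local coordinates, along these lines (sketch, 16-bit coordinates assumed):

// Spread the lower 16 bits of x so that there is a zero bit between each bit.
__host__ __device__ unsigned int part1By1(unsigned int x)
{
    x &= 0x0000ffff;
    x = (x | (x << 8)) & 0x00ff00ff;
    x = (x | (x << 4)) & 0x0f0f0f0f;
    x = (x | (x << 2)) & 0x33333333;
    x = (x | (x << 1)) & 0x55555555;
    return x;
}

// Morton (Z-order) index of a texel inside a tile: texels that are close in
// x and y end up close in memory, so the 4 (2D) or 8 (3D) fetches of a
// linear filter tend to hit the same cache lines.
__host__ __device__ unsigned int mortonIndex2D(unsigned int x, unsigned int y)
{
    return (part1By1(y) << 1) | part1By1(x);
}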

