Brigade footage

Posted: Mon Nov 28, 2011 5:41 pm
by admin
An empty forum is a depressing sight. Allow me to open with some Brigade footage:

http://www.youtube.com/embed/0U0hDb7dDFs

This version of Brigade has been renamed Brigade 2 due to an engine overhaul and new path tracing kernels. We now support the Blinn microfacet BRDF and multiple importance sampling, resulting in lower variance at low sampling rates. We are applying this to a student game at the moment; I hope to have some footage shortly. The Brigade 1 game 'Reflect' is being ported to Brigade 2 to make full use of the faster path tracing kernels. Once completed, this game should be properly released.
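As an aside, the variance reduction from multiple importance sampling comes from weighting each sampling strategy by its pdf. A minimal standalone sketch of the one-sample power heuristic (the `PowerHeuristic` used in the kernels further down this thread takes the same two pdfs; this version is my own hedged reconstruction, not Brigade code):

```cpp
#include <cassert>
#include <cmath>

// One-sample power heuristic with beta = 2 (Veach). Given the pdf of
// the strategy that generated the sample (fPdf) and the pdf of the
// competing strategy (gPdf), return the MIS weight for this sample.
float PowerHeuristic( float fPdf, float gPdf )
{
	const float f2 = fPdf * fPdf, g2 = gPdf * gPdf;
	return f2 / (f2 + g2);
}
```

The weights of the two strategies sum to one for any pdf pair, which is what keeps the combined estimator unbiased while down-weighting the strategy that sampled a given direction poorly.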

- Jacco.

Re: Brigade footage

Posted: Mon Nov 28, 2011 5:50 pm
by sirpalee
What did you change in the kernels?

Can we have a look at the source code? There might be some interesting bits and pieces for most of us ;)

Re: Brigade footage

Posted: Mon Nov 28, 2011 6:08 pm
by admin
Certainly. We started out with a 'reference path tracer'. This one is as basic as possible, and is designed only to produce the 'correct image'. We compare the other kernels against this one.

Code:

extern "C" __global__ void TracePixelReference()
{
	// setup path
	const int numRays = context.width * context.height;
	const int idx0 = threadIdx.y + blockDim.y * (blockIdx.x + gridDim.x * blockIdx.y) + ((context.firstline * context.width) >> 5);
	const int tx = threadIdx.x & 7, ty = threadIdx.x >> 3, tilesperline = context.width >> 3;
	const int xt = idx0 % tilesperline, yt = idx0 / tilesperline, px = (xt << 3) + tx, py = (yt << 2) + ty;
	const int pidx = numRays - 1 - (px + py * context.width);
	RNG genrand( pidx, (clock() * pidx * 8191) ^ 140167 );
	const int spp = context.SampleCount;
	const float rcpw = 1.0f / context.width, u = (float)px * rcpw - 0.5f, v = (float)(py + (context.width - context.height) * 0.5f) * rcpw - 0.5f;
	float3 E = make_float3( 0, 0, 0 );
	// trace path
	for( int sample = 0; sample < spp; sample++ )
	{
		// construct primary ray
		const float r1 = genrand(), r2 = genrand(), r3 = genrand(), r4 = genrand();
		float3 O = context.Translation + context.Left * context.LensSize * (r3 - 0.5f) + context.Up   * context.LensSize * (r4 - 0.5f);
		float3 D = context.Translation + (context.Forward + context.Left * context.tanFOV2 * (u + rcpw * r1) + context.Up * context.tanFOV2 * (v + rcpw * r2)) * context.FocalDist;
		D = normalize( D - O );
		// trace path
		float3 throughput = make_float3( 1, 1, 1 );
		int depth = 0;
		while (1)
		{
			int prim = 0; float2 BC, UV = make_float2( 0, 0 ); float dist = 1000000; bool backfaced = false;
			intersect<false,true>( O, D, dist, BC, prim, backfaced );
			O += D * dist;
			if (prim == -1) 
			{
				E += throughput * GetSkySample( D );
				break;
			}
			Triangle& tri = context.Triangles[prim];
			TracerMaterial mat = context.Materials[tri.GetMaterialIdx()];
			if (mat.flags & TracerMaterial::EMITTER) // light
			{
				E += throughput * mat.EmissiveColor; 
				break;
			}
			else // diffuse reflection
			{
				const float3 matcol = tri.GetMaterialColor( mat, BC, UV );
				const float3 N = tri.GetNormal( mat, BC, UV ) * (backfaced ? -1 : 1 );
				D = normalize( RandomReflection( genrand, N ) ); // don't even trust DiffuseReflection
				throughput *= matcol * dot( D, N );
			}
			O += D * EPSILON; // prevent intersection at dist = 0
			depth++;
			if (depth > 3) { if (genrand() > 0.5f) break; throughput *= 2.0f; }
		}
	}
	context.RenderTarget[pidx] = make_float4( E / (float)spp, 1 );
}
I am pasting this as-is; there's some Brigade-specific stuff in there as well as some dependencies, so ask if anything is unclear.
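One note on the `depth > 3` line at the bottom of the loop: killing the path with probability 0.5 and multiplying the survivor's throughput by 2 is Russian roulette, and it stays unbiased because the expected contribution is unchanged. A tiny standalone check (names are mine, not Brigade's):

```cpp
#include <cassert>
#include <cmath>

// Expected contribution of a path carrying value x after one round of
// Russian roulette with survival probability p: the survivor is boosted
// by 1/p, the terminated branch contributes 0, so the expectation over
// both outcomes equals x exactly.
float expectedAfterRoulette( float x, float p )
{
	const float survive = p * (x / p);        // survives: boosted contribution
	const float die = (1.0f - p) * 0.0f;      // terminated: contributes nothing
	return survive + die;
}
```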

Then, we have a loop based on Novak's ideas. Instead of a path budget, it has a 'segment budget'. Paths are restarted when terminated, to keep the SIMT lanes occupied. Restarting is cheap, but typically happens in only a few threads of a warp. Here's the code:

Code:

#define TERMINATE { restart = true; continue; }
extern "C" __global__ void TracePixelSegment()
{
	// setup path
	const int numRays = context.width * context.height;
	const int idx0 = threadIdx.y + blockDim.y * (blockIdx.x + gridDim.x * blockIdx.y) + ((context.firstline * context.width) >> 5);
	const int tx = threadIdx.x & 7, ty = threadIdx.x >> 3, tilesperline = context.width >> 3;
	const int xt = idx0 % tilesperline, yt = idx0 / tilesperline, px = (xt << 3) + tx, py = (yt << 2) + ty;
	const int pidx = numRays - 1 - (px + py * context.width);
	RNG genrand( pidx, (clock() * pidx * 8191) ^ 140167 );
	const int spp = context.SampleCount;
	const float rcpw = 1.0f / context.width, u = (float)px * rcpw - 0.5f, v = (float)(py + (context.width - context.height) * 0.5f) * rcpw - 0.5f;
	float3 E = make_float3( 0, 0, 0 ), throughput, O, D;
	bool restart = true, firsthit = true;
	int paths = 0, curdepth = 0;
	// trace path
#ifdef PURIST
	for( int segment = 0; ((segment < spp * 2) || (!restart)); segment++ )
#else
	for( int segment = 0; (segment < spp * 2); segment++ )
#endif
	{
		if (restart)
		{
			// construct primary ray
			const float r1 = genrand(), r2 = genrand(), r3 = genrand(), r4 = genrand();
			O = context.Translation + context.Left * context.LensSize * (r3 - 0.5f) + context.Up   * context.LensSize * (r4 - 0.5f);
			D = context.Translation + (context.Forward + context.Left * context.tanFOV2 * (u + rcpw * r1) + context.Up * context.tanFOV2 * (v + rcpw * r2)) * context.FocalDist;
			D = normalize( D - O );
			firsthit = true, restart = false, throughput = make_float3( 1, 1, 1 ), curdepth = 0, paths++;
		}
		// trace path segment
		int prim = 0; float2 UV, BC; float dist = 1000000; bool backfaced = false;
		O += D * EPSILON; // prevent intersection at dist = 0
		intersect<false,true>( O, D, dist, BC, prim, backfaced );
		O += D * dist;
		if (prim == -1)
		{
			// path left scene
			E += throughput * GetSkySample( D );
			TERMINATE;
		}
		Triangle& tri = context.Triangles[prim];
		TracerMaterial mat = context.Materials[tri.GetMaterialIdx()];
		if (mat.flags & TracerMaterial::EMITTER)
		{
			// path arrived at light
			if (firsthit & (!backfaced)) E += throughput * mat.EmissiveColor;
			TERMINATE;
		}
		const float3 matcol = tri.GetMaterialColor( mat, BC, UV );
		const float3 N = tri.GetNormal( mat, BC, UV ) * (backfaced ? -1 : 1 );
		const float3 wo = D * -1.0f;
		// sample direct lighting using next event estimation (FLAWLESS)
		float3 L, LN, LColor;
		const float r8 = genrand();
		float area;
		RandomPointOnLight( L, LN, LColor, r8, genrand, area );
		L -= O;
		float sqdist = dot( L, L ), ldist = sqrtf( sqdist );
		L *= 1.0f / ldist;
		const float NdotL = dot( N, L ), LNdotL = -dot( LN, L );
		if ((NdotL > 0) && (LNdotL > 0))
		{
			bool backface; int sprim; float2 SBC; ldist -= 2 * EPSILON;
			intersect<true,false>( O + L * EPSILON, L, ldist, SBC, sprim, backface );
			if (sprim == -1)	
			{
				const float lightPdf = (LNdotL > EPSILON) ? (sqdist / (LNdotL * area * context.lightcount)) : 0.0f;
				if (lightPdf > 0) E += throughput * matcol * INVPI * 0.5f * LColor * NdotL / lightPdf;
			}
		}
		// russian roulette
		if (curdepth > 1)
		{
			const float p = max( EPSILON, min( 0.5f, (throughput.x + throughput.y + throughput.z) * 0.333f ) ); // condition taken from pbrt
			if (genrand() > p) TERMINATE;
			throughput /= p;
		}
		// do a lambert reflection (FLAWLESS)
		const float r6 = genrand(), r7 = genrand();
		D = DiffuseReflection( r6, r7, genrand, N );
		const float bsdfPdf = LambertPdf( D, N );
		const float3 f = matcol * INVPI * 0.5f;
		if (bsdfPdf < EPSILON) TERMINATE;
		throughput *= f * dot( D, N ) / bsdfPdf;
		firsthit = false;
		curdepth++;
	}
	context.RenderTarget[pidx] = make_float4( E * (1.0f / (float)paths), 1.0f );
}
Not much to explain, but I would like to point out how simple an actual implementation of Novak's ideas can be. There's no admin code here, and it runs very efficiently. The only problem is that segments of different depths are traced simultaneously, which reduces 'ray coherence', something you ideally want to have even on the GPU. The last started path is completed even when the segment budget is depleted; this is to prevent bias. If we don't care about bias, we can make the path tracer 30% or so faster by skipping this ('purist').
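To make the control flow explicit outside CUDA: the essence of the segment loop is that the budget counts ray segments rather than paths, and a terminated path immediately restarts in the same slot, so no lane ever idles. A scalar sketch of just that skeleton (the 'scene' and the RNG are placeholders of mine, not Brigade code):

```cpp
#include <cassert>

// Scalar sketch of Novak-style path regeneration: loop over a fixed
// segment budget; whenever the current path terminates, start a new one
// in the same slot, and at the end average by the number of paths.
float traceWithSegmentBudget( int segmentBudget, unsigned seed )
{
	unsigned state = seed;
	auto rnd = [&state]() {                    // tiny LCG, placeholder RNG
		state = state * 1664525u + 1013904223u;
		return (state >> 8) * (1.0f / 16777216.0f);
	};
	float E = 0.0f, throughput = 1.0f;
	int paths = 0;
	bool restart = true;
	for (int segment = 0; segment < segmentBudget; segment++)
	{
		if (restart) { throughput = 1.0f; paths++; restart = false; }
		// placeholder 'scene': 30% chance the segment escapes and picks
		// up a sky value of 1, otherwise attenuate and keep bouncing
		if (rnd() < 0.3f) { E += throughput; restart = true; }
		else throughput *= 0.8f;
	}
	return paths ? E / (float)paths : 0.0f;
}
```

Note that, as in the kernel above, the unfinished last path biases the estimate slightly; the `PURIST` loop condition fixes that by running until the current path completes.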

Then, the MIS code. Not much is new here, except that direct light is now sampled by two rays: one directly to the light, and a 'random bounce' based on the BRDF. Because the second ray tends to hit geometry other than the light, we reuse it for the diffuse bounce. MIS is thus spread over two loop iterations, which complicates the loop somewhat.

Code:

extern "C" __global__ void TracePixelMIS()
{
	// setup path
	const int idx0 = threadIdx.y + blockDim.y * (blockIdx.x + gridDim.x * blockIdx.y) + ((context.firstline * context.width) >> 5);
	const int tx = threadIdx.x & 7, ty = threadIdx.x >> 3, tilesperline = context.width >> 3;
	const int xt = idx0 % tilesperline, yt = idx0 / tilesperline;
	int px = (xt << 3) + tx, py = (yt << 2) + ty;
	const int pidx =(px + py * context.width);
	px = context.width - px;
	py = context.height - py;
	RNG genrand( pidx, (clock() * pidx * 8191) ^ 140167 );
	const int spp = context.SampleCount;
	const float rcpw = 1.0f / context.width, u = (float)px * rcpw - 0.5f, v = (float)(py + (context.width - context.height) * 0.5f) * rcpw - 0.5f;
	float3 E = make_float3( 0, 0, 0 ), throughput, O, D, postComp, postThrough;
	bool restart = true, firsthit = true, postponed = false;
	float postPdf;
	int rays = 0;
	float3 lastabsorbance;
	int paths = 0, curdepth = 0;
	// trace path
#ifdef PURIST
	for( int segment = 0; ((segment < spp * 2) || (!restart)); segment++ )
#else
	for( int segment = 0; segment < (spp * 2); segment++ )
#endif
	{
		if (restart)
		{
			// construct primary ray
			const float r1 = genrand(), r2 = genrand(), r3 = genrand(), r4 = genrand();
			O = context.Translation + context.Left * context.LensSize * (r3 - 0.5f) + context.Up   * context.LensSize * (r4 - 0.5f);
			D = context.Translation + (context.Forward + context.Left * context.tanFOV2 * (u + rcpw * r1) + context.Up * context.tanFOV2 * (v + rcpw * r2)) * context.FocalDist;
			D = normalize( D - O );
			lastabsorbance = make_float3(0, 0, 0);
			firsthit = true, restart = false, postponed = false, throughput = make_float3( 1, 1, 1 ), curdepth = 0, paths++;
		}
		// trace path segment
		int prim = 0; float2 UV, BC; float dist = 1000000; bool backfaced = false;
		O += D * EPSILON; // prevent intersection at dist = 0
		intersect<false,true>( O, D, dist, BC, prim, backfaced ); rays++;
		O += D * dist;
		if (prim == -1)
		{
			// path left scene
			E += throughput * GetSkySample( D );
			TERMINATE;
		}
		if (lastabsorbance.x || lastabsorbance.y || lastabsorbance.z)
		{
			throughput *= make_float3(
				__expf(lastabsorbance.x * -dist),
				__expf(lastabsorbance.y * -dist),
				__expf(lastabsorbance.z * -dist));
			lastabsorbance = make_float3(0, 0, 0);
		}
		Triangle& tri = context.Triangles[prim];
		const TracerMaterial mat = context.Materials[tri.GetMaterialIdx()];
		if (mat.flags & TracerMaterial::EMITTER)
		{
			// path arrived at light
			if (postponed)
			{
				if ((mat.EmissiveColor.x == postComp.x) && (mat.EmissiveColor.y == postComp.y) && (mat.EmissiveColor.z == postComp.z))
				{
					const float den = (tri.area * context.lightcount * -dot( tri.GetNormal(), D ));
					const float lightPdf = (den > 0) ? ((dist * dist) / den) : 0.0f;
					if (lightPdf > 0)
					{
						const float weight = PowerHeuristic( postPdf, lightPdf );
						E += postThrough * mat.EmissiveColor * weight / postPdf;
					}
				}
				postponed = false;
			}
			if (firsthit & (!backfaced)) E += throughput * mat.EmissiveColor;
			TERMINATE;
		}
		const float3 matcol = tri.GetMaterialColor( mat, BC, UV );
		const float3 N = tri.GetNormal( mat, BC, UV ) * (backfaced ? -1 : 1 );
		if (mat.Specularity > 0)
		{
			if (!mat.Absorbance) throughput *= matcol;
			// handle pure specular materials and dielectrics
			if (mat.Transparency > genrand())
			{
				// dielectric
				float nt = mat.RefractionIndex;
				if (backfaced) nt = 1.0f / nt;
				const float nnt = 1.0f / nt, ddn = dot( D, N );
				const float cos2t = 1 - nnt * nnt * (1 - ddn * ddn);
				if (cos2t < 0) D = reflect( D, N ); /* TIR */ else
				{
					const float3 R = normalize( D * nnt - N * (ddn * nnt + sqrtf( cos2t )) );
					const float a = nt - 1, b = nt + 1, R0 = (a * a) / (b * b);
					const float c = 1 + ddn, Re = R0 + (1 - R0) * c * c * c * c * c;
					const float P = .25f + .5f * Re;
					const bool pick = genrand() < P;
					throughput *= pick ? (Re / P) : ((1 - Re) / (1 - P));
					D = pick ? reflect( D, N ) : R;
					
					if (mat.Absorbance && !backfaced) lastabsorbance = (make_float3(1, 1, 1) - matcol) * mat.Absorbance;
				}
			}
			else D = reflect( D, N ); // specular bounce
		}
		else
		{
			// handle diffuse materials
			const float3 wo = D * -1.0f;
			// sample direct lighting using next event estimation and MIS (FLAWLESS)
			float3 L, LN, LColor;
			const float r8 = genrand();
			float area;
			RandomPointOnLight( L, LN, LColor, r8, genrand, area );
			L -= O;
			float sqdist = dot( L, L ), ldist = sqrtf( sqdist );
			L *= 1.0f / ldist;
			const float NdotL = dot( N, L ), LNdotL = -dot( LN, L );
			if ((NdotL > 0) && (LNdotL > 0))
			{
				bool backface; int sprim; float2 SBC; ldist *= 0.99f;
				intersect<true,false>( O + L * EPSILON, L, ldist, SBC, sprim, backface ); rays++;
				if (sprim == -1) 
				{
					const float lightPdf = (LNdotL > EPSILON) ? (sqdist / (LNdotL * area * context.lightcount)) : 0.0f;
					float bsdfPdf;
					bsdfPdf = LambertPdf( L, N );
					if ((lightPdf > 0) && (bsdfPdf > 0)) 
					{
						const float3 f = matcol * INVPI * 0.5f;
						const float weight = PowerHeuristic( lightPdf, bsdfPdf );
						E += throughput * f * LColor * weight * NdotL / lightPdf;
					}
				}
			}
			// bsdf sampling with MIS (FLAWLESS)
			const float r3 = genrand(), r4 = genrand();
			float3 f;
			D = DiffuseReflection( r3, r4, genrand, N ), postPdf = LambertPdf( D, N ), f = matcol * INVPI * 0.5f;
			if (postPdf <= 0.01f) TERMINATE; // hmm
			postThrough = throughput * dot( D, N ) * matcol * INVPI * 0.5f;
			// russian roulette
			if (curdepth > 1)
			{
				float p = max( EPSILON, min( 0.5f, (throughput.x + throughput.y + throughput.z) * 0.333f ) ); // condition taken from pbrt
				if (genrand() > p) TERMINATE;
				throughput /= p;
			}
			postponed = true, postComp = LColor;
			throughput *= f * dot( D, N ) / postPdf;
			firsthit = false;
		}
		curdepth++;
	}
	context.RenderTarget[pidx] = make_float4( E * (1.0f / (float)paths), *(float*)&rays );
}
In all three kernels, Aila and Laine's code is used for the actual ray / scene intersection. We also use their BVH node layout, as well as Woop's triangle layout. The BVH is stored in texture memory; triangle data and materials are stored in global memory. We do not use any advanced CUDA features and no inter-thread communication, so this is pretty straightforward code for a GPU and should port to ATI easily.

There is some more stuff, like the handling of light sources, but I discussed that before.
I have a question about the MIS, but I will post that in a separate thread.

- Jacco.

Re: Brigade footage

Posted: Tue Nov 29, 2011 5:17 pm
by sirpalee
Thanks for the snippets, and the details!

I might be able to use some parts from it, to accelerate my own kernels :)

Re: Brigade footage

Posted: Wed Nov 30, 2011 3:39 pm
by spectral
Sounds great,

Please allow me a few questions :-D

1 - Where did you find the 'Novak ideas'? Is there some information available on the internet?
2 - Just a suggestion: it would be nice to display the frame rate with the video :-P
3 - Also, can you provide the model you use (ply, obj...) for testing and comparison?
4 - Don't you use Timo Aila's BVH code anymore? It looks like you call 'intersect' directly in your PT code. Maybe you use OptiX now?

Re: Brigade footage

Posted: Wed Nov 30, 2011 4:01 pm
by ingenious
spectral wrote:1 - Where have you found the 'Novak ideas' ? Is there some information available on the internet ?
I believe he's referring to Jan Novak's path regeneration. Just bing (yes, that is a valid verb ;) ) for it.

Re: Brigade footage

Posted: Thu Dec 01, 2011 1:46 pm
by jbikker
Novak ideas are indeed from Jan Novak's paper on path regeneration.
Frame rate: it's kinda useless; path tracer performance could be expressed in samples per second I guess, but pure fps is not worth much, since I can always double 'performance' by halving the number of samples per pixel (which increases noise). I found that spp, fps and resolution must be carefully balanced: a path tracer looks relatively good at low resolutions, but noise tends to be less objectionable at high resolutions.
Model: no problem, I'll upload it somewhere in a moment.
Aila & Laine: yes I use their code, it's quite possible to call it for a single ray query. :)

EDIT: scene is here, in obj format, with a single texture: http://igad.nhtv.nl/~bikker/files/mistest.zip . Note that we use custom tags in the material file.
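On the frame-rate point above: the three knobs Jacco mentions collapse into one comparable number, samples per second = width × height × spp × fps, which is invariant under the spp/fps trade he describes. A trivial helper to make that concrete (all numbers illustrative, not Brigade benchmarks):

```cpp
#include <cassert>

// Samples per second: resolution * samples-per-pixel * frames-per-second.
// Halving spp roughly doubles fps at equal throughput, so this product,
// unlike raw fps, cannot be gamed by trading one knob against another.
double samplesPerSecond( int width, int height, int spp, double fps )
{
	return (double)width * (double)height * (double)spp * fps;
}
```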

Re: Brigade footage

Posted: Thu Dec 01, 2011 2:31 pm
by spectral
Thanks a lot,

Thanks for the Novak idea; there are so many ideas to test and implement that it is nice to see others' work :-D

You're right, and even samples per second is not perfect :-P it also depends on the path depth (RR-based) etc... With MLT, for example, it is difficult to talk about FPS, and the same goes for adaptive sampling!
Maybe the best performance measure is rays per second...!

I don't know of any measure that mixes performance and quality!

Yes, Aila and Laine's code can be used without ray buckets. I have even seen that not using warp-based processing (32 rays in a batch) changes nothing performance-wise! That was surprising to me :-P

Re: Brigade footage

Posted: Thu Dec 01, 2011 7:23 pm
by jbikker
Some more eye candy. :)



We're still trying to find optimal settings for the light, so it's quite noisy at 16spp. Still, the lighting is awesome in those scenes, imho. ;)

EDIT: I have uploaded the XVid version, since Youtube did a horrible job at converting that video. Here it is: http://igad.nhtv.nl/~bikker/files/sections.avi

Re: Brigade footage

Posted: Thu Dec 01, 2011 9:50 pm
by straaljager
:o Pure awesomeness!!

Really a gorgeous scene, I don't even mind the noise. And it's starting to look like a real game now. :D