Sven Woop thesis triangles

Practical and theoretical implementation discussion.
Vilem Otte
Posts: 24
Joined: Sun Dec 25, 2011 12:42 am

Sven Woop thesis triangles

Post by Vilem Otte » Sun Apr 22, 2012 1:47 am

Hi,

could someone who has already read this thesis and implemented its triangle test quickly look through my code - http://www.pasteall.org/31120/cpp - and point me in the right direction? :roll: I'm running out of ideas about what I'm doing wrong (and the result is definitely wrong). Thank you.

EDIT: So far it seems that my code is correct, but I suspect that I can't pass the triangles in arbitrary order (are there any rules for this?). - No. They can be in any order.

EDIT2: Could this also be caused by imprecision from using SSE2? - No. The transformation phase is OK, so the bug must be in my collision test.

svenwoop
Posts: 2
Joined: Sun Apr 22, 2012 6:50 am

Re: Sven Woop thesis triangles

Post by svenwoop » Sun Apr 22, 2012 6:58 am

Just looked over your code, and it looks all right.

Some things I would verify:
1) Is your matrix inversion function working? Multiply the original matrix by its inverse and check that you get the identity.
2) Verify that m.m1 is giving you the first row of the inverted matrix.
3) It sounds like you are debugging with multiple triangles. Better to use one ray and a single triangle for which you already know the intersection result.
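
A known-answer check for point 3 could look like the sketch below. It is a plain scalar reference, not the SSE code from the pastebin: it assumes the triangle-space transform is the inverse of the matrix with columns B-A, C-A, N (with N the cross product of the two edges) applied after subtracting A, so that the triangle becomes the unit triangle in the z = 0 plane. The names `Vec3`, `Hit` and `intersectReference` are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct Hit { bool hit; float t, u, v; };

// Hypothetical scalar reference: move the ray into the space where the
// triangle is (0,0,0), (1,0,0), (0,1,0), then intersect with the plane z = 0.
Hit intersectReference(Vec3 a, Vec3 b, Vec3 c, Vec3 org, Vec3 dir) {
    const Vec3 e1 = sub(b, a), e2 = sub(c, a);
    const Vec3 n = cross(e1, e2);
    const float det = dot(e1, cross(e2, n));  // determinant of [e1 e2 n]
    assert(det != 0.0f);                      // degenerate triangles are not handled

    // Rows of the inverse of [e1 e2 n], each still scaled by det.
    const Vec3 r0 = cross(e2, n), r1 = cross(n, e1), r2 = n;

    const Vec3 q = sub(org, a);  // the origin is a point, so translate it first
    const Vec3 o = {dot(r0, q) / det, dot(r1, q) / det, dot(r2, q) / det};
    const Vec3 d = {dot(r0, dir) / det, dot(r1, dir) / det, dot(r2, dir) / det};

    if (d.z == 0.0f) return {false, 0.0f, 0.0f, 0.0f};  // ray parallel to the plane

    const float t = -o.z / d.z;
    const float u = o.x + t * d.x;
    const float v = o.y + t * d.y;
    const bool hit = t >= 0.0f && u >= 0.0f && v >= 0.0f && u + v <= 1.0f;
    return {hit, t, u, v};
}
```

For the triangle (1,2,3), (3,2,3), (1,5,3) and a ray from (2,3,10) pointing down -z, the hit distance has to be 7 because the triangle lies in the plane z = 3, which makes it an easy case to compare against the optimized path.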

What imprecision exactly are you talking about?

Vilem Otte
Posts: 24
Joined: Sun Dec 25, 2011 12:42 am

Re: Sven Woop thesis triangles

Post by Vilem Otte » Sun Apr 22, 2012 7:40 pm

Wow, I didn't expect the author of the thesis to reply to me. Thank you.

So, you're right, my code works (at least in debug mode; I ran it through gdb). I've also discovered that when I switch to "release" mode and build the exe, my matrix inversion code stops working (which seems strange to me), so I have to find out what exactly the compiler optimizations are doing wrong (note that my whole matrix library is written in SSE2 intrinsics).

EDIT: Okay, I don't know exactly why (I don't work with Visual Studio very often, although debugging in it is far more comfortable than in GDB), but setting the floating-point model to Fast instead of Precise breaks my matrix inversion code (which is written entirely in intrinsics!). My only fear is that using Precise will hurt performance a lot. In the end I'll compile with GCC anyway, so I'll see whether the problem is present there as well.

EDIT2: GCC doesn't have this problem (even with fast floating-point math enabled), so it seems to be an MSVC-only issue. Maybe I'll also try Clang to see whether it has the issue (it's getting better and more popular these days). :) Anyway, thank you very much for the help. You did good work on that ray-tri test (it runs really fast!), and by the way, welcome to the forums (I see you only just joined).

Geri
Posts: 146
Joined: Fri Mar 02, 2012 7:01 pm

Re: Sven Woop thesis triangles

Post by Geri » Sun Apr 22, 2012 10:46 pm

-compilers like to break object-oriented code when you try to write something speed-critical. you should printf your variables line by line. also, if it works while you dump all of them, start removing the printfs one by one ^^
-maybe, as you hinted, the compiler automatically generates SSE code from your vector4s and breaks it somehow?
-also, swap in a standard C/C++ matrix implementation in place of your SSE code
-don't forget to add volatile and alignment to your inline assembly blocks/variables.
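
A scalar reference of the kind suggested in the third point could be as small as the sketch below. It uses Gauss-Jordan elimination rather than the cofactor method, and `Mat4`, `inverseReference`, `multiply`, `isIdentity` and `checkInverse` are made-up names for illustration, not part of any library in this thread. The idea is to feed the same matrix to both the SSE `inverse` and this one, and to multiply each result by the original to see which one fails to give the identity.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical row-major stand-in for the thread's mat4 type.
using Mat4 = std::array<std::array<float, 4>, 4>;

// Scalar Gauss-Jordan inverse with partial pivoting.
// Returns false if the matrix looks singular.
bool inverseReference(const Mat4& m, Mat4& out) {
    double a[4][8];  // augmented matrix [m | I], kept in double for headroom
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) {
            a[r][c] = m[r][c];
            a[r][c + 4] = (r == c) ? 1.0 : 0.0;
        }
    for (int col = 0; col < 4; ++col) {
        int pivot = col;  // pick the row with the largest entry in this column
        for (int r = col + 1; r < 4; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        if (std::fabs(a[pivot][col]) < 1e-12) return false;
        for (int c = 0; c < 8; ++c) std::swap(a[col][c], a[pivot][c]);
        const double inv = 1.0 / a[col][col];
        for (int c = 0; c < 8; ++c) a[col][c] *= inv;
        for (int r = 0; r < 4; ++r) {
            if (r == col) continue;
            const double f = a[r][col];
            for (int c = 0; c < 8; ++c) a[r][c] -= f * a[col][c];
        }
    }
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) out[r][c] = static_cast<float>(a[r][c + 4]);
    return true;
}

Mat4 multiply(const Mat4& x, const Mat4& y) {
    Mat4 p{};  // value-initialized to zero
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            for (int k = 0; k < 4; ++k) p[r][c] += x[r][k] * y[k][c];
    return p;
}

bool isIdentity(const Mat4& m, float eps) {
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            if (std::fabs(m[r][c] - (r == c ? 1.0f : 0.0f)) > eps) return false;
    return true;
}

// Debug helper: aborts if inv is not the inverse of m within eps.
void checkInverse(const Mat4& m, const Mat4& inv, float eps) {
    assert(isIdentity(multiply(m, inv), eps));
    assert(isIdentity(multiply(inv, m), eps));
}
```

Calling `checkInverse` on the output of the SSE routine in both debug and release builds should show quickly whether the inversion itself is what changes between the two.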

also, try this too:

if(v >= 0.0f && u + v <= 1.0f)
to
if((v >= 0.0f) && ((u + v) <= 1.0f))

if(u >= 0.0f && u <= 1.0f)
to
if((u >= 0.0f) && (u <= 1.0f))
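
For what it's worth, the extra parentheses should not change the result: in C++, `+` binds tighter than `>=` and `<=`, which in turn bind tighter than `&&`, so both spellings parse to the same expression. A small sketch that compares the two forms over a grid of values (the function names are made up for illustration):

```cpp
#include <cassert>

// The condition as written in the original code.
bool insidePlain(float u, float v) { return v >= 0.0f && u + v <= 1.0f; }

// The fully parenthesized variant.
bool insideParenthesized(float u, float v) { return (v >= 0.0f) && ((u + v) <= 1.0f); }

// Returns true if both spellings agree on every sample in [-2, 2] x [-2, 2].
bool formsAgree() {
    for (int i = -20; i <= 20; ++i)
        for (int j = -20; j <= 20; ++j) {
            const float u = 0.1f * static_cast<float>(i);
            const float v = 0.1f * static_cast<float>(j);
            if (insidePlain(u, v) != insideParenthesized(u, v)) return false;
        }
    return true;
}
```

So the parentheses are harmless and may read more clearly, but a difference in behavior after adding them would point to something else.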
Into our bony hands, misery
carved the runes of a thousand years

Vilem Otte
Posts: 24
Joined: Sun Dec 25, 2011 12:42 am

Re: Sven Woop thesis triangles

Post by Vilem Otte » Mon Apr 23, 2012 12:28 pm

My code for matrix inversion looks like this:


		friend inline mat4 inverse(const mat4& m)
		{
			__m128 f1 = _mm_sub_ps(_mm_mul_ps(_mm_shuffle_ps(m.m3, m.m2, 0xAA),									
											  _mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0xFF), _mm_shuffle_ps(m.m4, m.m3, 0xFF), 0x80)),						 
								   _mm_mul_ps(_mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0xAA), _mm_shuffle_ps(m.m4, m.m3, 0xAA), 0x80),											 
											  _mm_shuffle_ps(m.m3, m.m2, 0xFF)));			
			
			__m128 f2 = _mm_sub_ps(_mm_mul_ps(_mm_shuffle_ps(m.m3, m.m2, 0x55),									
											  _mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0xFF), _mm_shuffle_ps(m.m4, m.m3, 0xFF), 0x80)),						
								   _mm_mul_ps(_mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0x55), _mm_shuffle_ps(m.m4, m.m3, 0x55), 0x80),											 
											  _mm_shuffle_ps(m.m3, m.m2, 0xFF)));			
			
			__m128 f3 = _mm_sub_ps(_mm_mul_ps(_mm_shuffle_ps(m.m3, m.m2, 0x55),									
											  _mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0xAA), _mm_shuffle_ps(m.m4, m.m3, 0xAA), 0x80)),						
								   _mm_mul_ps(_mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0x55), _mm_shuffle_ps(m.m4, m.m3, 0x55), 0x80),									
											  _mm_shuffle_ps(m.m3, m.m2, 0xAA)));			
			
			__m128 f4 = _mm_sub_ps(_mm_mul_ps(_mm_shuffle_ps(m.m3, m.m2, 0x00),							
											  _mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0xFF), _mm_shuffle_ps(m.m4, m.m3, 0xFF), 0x80)),				
								   _mm_mul_ps(_mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0x00), _mm_shuffle_ps(m.m4, m.m3, 0x00), 0x80),			
											  _mm_shuffle_ps(m.m3, m.m2, 0xFF)));			
			
			__m128 f5 = _mm_sub_ps(_mm_mul_ps(_mm_shuffle_ps(m.m3, m.m2, 0x00),		
											  _mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0xAA), _mm_shuffle_ps(m.m4, m.m3, 0xAA), 0x80)),					
								   _mm_mul_ps(_mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0x00), _mm_shuffle_ps(m.m4, m.m3, 0x00), 0x80),		
											  _mm_shuffle_ps(m.m3, m.m2, 0xAA)));			
			
			__m128 f6 = _mm_sub_ps(_mm_mul_ps(_mm_shuffle_ps(m.m3, m.m2, 0x00),		
											  _mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0x55), _mm_shuffle_ps(m.m4, m.m3, 0x55), 0x80)),				
								   _mm_mul_ps(_mm_shuffle_ps(_mm_shuffle_ps(m.m4, m.m3, 0x00), _mm_shuffle_ps(m.m4, m.m3, 0x00), 0x80),	
											  _mm_shuffle_ps(m.m3, m.m2, 0x55)));

			__m128 v1 = _mm_shuffle_ps(_mm_shuffle_ps(m.m2, m.m1, 0x00), _mm_shuffle_ps(m.m2, m.m1, 0x00), 0xA8);			
			__m128 v2 = _mm_shuffle_ps(_mm_shuffle_ps(m.m2, m.m1, 0x55), _mm_shuffle_ps(m.m2, m.m1, 0x55), 0xA8);			
			__m128 v3 = _mm_shuffle_ps(_mm_shuffle_ps(m.m2, m.m1, 0xAA), _mm_shuffle_ps(m.m2, m.m1, 0xAA), 0xA8);			
			__m128 v4 = _mm_shuffle_ps(_mm_shuffle_ps(m.m2, m.m1, 0xFF), _mm_shuffle_ps(m.m2, m.m1, 0xFF), 0xA8);			
			__m128 s1 = _mm_set_ps(-0.0f,  0.0f, -0.0f,  0.0f);			
			__m128 s2 = _mm_set_ps( 0.0f, -0.0f,  0.0f, -0.0f);	
			__m128 i1 = _mm_xor_ps(s1, _mm_add_ps(_mm_sub_ps(_mm_mul_ps(v2, f1),					
															 _mm_mul_ps(v3, f2)),							
												  _mm_mul_ps(v4, f3)));
			__m128 i2 = _mm_xor_ps(s2, _mm_add_ps(_mm_sub_ps(_mm_mul_ps(v1, f1),		
															 _mm_mul_ps(v3, f4)),											
												  _mm_mul_ps(v4, f5)));			
			__m128 i3 = _mm_xor_ps(s1, _mm_add_ps(_mm_sub_ps(_mm_mul_ps(v1, f2),					
															 _mm_mul_ps(v2, f4)),								
												  _mm_mul_ps(v4, f6)));			
			__m128 i4 = _mm_xor_ps(s2, _mm_add_ps(_mm_sub_ps(_mm_mul_ps(v1, f3),				
															 _mm_mul_ps(v2, f5)),						
												  _mm_mul_ps(v3, f6)));
			__m128 d = _mm_mul_ps(m.m1, _mm_movelh_ps(_mm_unpacklo_ps(i1, i2), _mm_unpacklo_ps(i3, i4)));			
			d = _mm_add_ps(d, _mm_shuffle_ps(d, d, 0x4E));	
			d = _mm_add_ps(d, _mm_shuffle_ps(d, d, 0x11));	
			d = _mm_div_ps(_mm_set1_ps(1.0f), d);	
			return mat4(float4(_mm_mul_ps(i1, d)),	
						float4(_mm_mul_ps(i2, d)),				
						float4(_mm_mul_ps(i3, d)),				
						float4(_mm_mul_ps(i4, d)));
		}
And VS actually drops some of the instructions when using Fast floating-point math. :shock: The strange thing is that GCC in MinGW doesn't do this, so...
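
One possible suspect, which is only a guess and not confirmed anywhere in this thread: the sign masks `s1` and `s2` are built from `-0.0f` constants, and fast floating-point modes are generally allowed to ignore the sign of zero. If the compiler folds `-0.0f` to `+0.0f`, the `_mm_xor_ps` calls no longer flip anything and could be dropped entirely. A sketch of a mask built from integer bit patterns instead, which does not depend on how float constants are treated (`signMask`, `flipSigns` and `signMaskSelfCheck` are made-up helper names; the intrinsics are standard SSE2):

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2: _mm_set_epi32, _mm_castsi128_ps

// Builds a mask whose sign bit is set in every lane where the flag is true.
// Arguments are in memory order (lane 0 first); note that _mm_set_epi32,
// like _mm_set_ps, takes its arguments from the highest lane to the lowest.
inline __m128 signMask(bool x, bool y, bool z, bool w) {
    const int s = INT32_MIN;  // bit pattern 0x80000000
    return _mm_castsi128_ps(_mm_set_epi32(w ? s : 0, z ? s : 0, y ? s : 0, x ? s : 0));
}

// Flips the sign of the lanes selected by the mask.
inline __m128 flipSigns(__m128 v, __m128 mask) { return _mm_xor_ps(v, mask); }

// Small self-check that can be run in both debug and release builds.
inline void signMaskSelfCheck() {
    float out[4];
    _mm_storeu_ps(out, flipSigns(_mm_set1_ps(1.0f), signMask(true, false, true, false)));
    assert(out[0] == -1.0f && out[1] == 1.0f && out[2] == -1.0f && out[3] == 1.0f);
}
```

Under that assumption, `s1` would become `signMask(false, true, false, true)` and `s2` would become `signMask(true, false, true, false)`, matching the lane layout of the two `_mm_set_ps` calls above.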

Geri
Posts: 146
Joined: Fri Mar 02, 2012 7:01 pm

Re: Sven Woop thesis triangles

Post by Geri » Mon Apr 23, 2012 3:45 pm

