Quadrics kinda sorta work for real time

Practical and theoretical implementation discussion.
hobold
Posts: 56
Joined: Wed Dec 21, 2011 6:08 pm

Quadrics kinda sorta work for real time

Post by hobold » Tue Apr 17, 2012 10:55 am

A few of you might remember some ominous claims I made on the old ompf forum about using quadrics as fundamental rendering primitives. I presented a theoretical article about intersecting arbitrary quadrics with arbitrary polyhedra, but didn't hint where this was meant to go, and then I dropped below the radar.

I have spent a lot of time trying to find more practical ways of modeling "arbitrary" shapes as piecewise quadrics. But I could only come up with my own not-quite-solutions. They may make different trade-offs than what exists in the literature, but I can't see them carry anything more than perhaps a flashy demo.

So I recently turned my attention from the impossible to the possible. I only have a proof of concept now, but it seems encouraging to me:
Image
I can fit a kD-Tree to CSG objects made from arbitrary quadrics. The surface primitives are separated well from each other, and CSG trees are automatically simplified down to the surfaces that are relevant in each voxel. The quality of the kD-Tree is not too bad (one can judge this a bit better in animation: http://www.vectorizer.org/exampleHiKdT02.mp4).

I am far from real time, at ~0.75 seconds for one 720p frame on six cores of a 3.3GHz "Westmere" Xeon, but I have only just begun tuning. A few statistics:

- an average of 7.7 rays are fired per pixel (including reflection and up to two shadow rays, plus some basic anti-aliasing)
- a ray traverses 34 branch voxels and 4 leaf voxels on average
- an average leaf voxel contains 1.8 quadrics and 0.2 CSG intersections (i.e. boundaries between two quadrics)
- so 8 quadrics are being tested per ray (which seems a bit high to me; I should probably try to cut off empty space from leaves)

Overall, a single ray still takes a ridiculously long 25000 clock cycles. I do not yet have a good feeling for where all this time is lost. But I am quite confident that my early implementation is not compute bound. I suspect a lot of time is wasted due to memory latency and branch mispredictions.


Anyway, those of you who already have well-tuned code for traversal of kD-Trees might be in a better position to assess the potential of this simplest type of curved surface, so I'll hand out the most interesting pieces of code. I'd be more than happy if someone else wanted to experiment with it. Ask away!

Code: Select all

/*
  Intersecting an Arbitrary Quadric with an Axis Aligned Box
  (an illustrative code fragment)

  An explanation of the geometry and math behind this algorithm can be found at
  http://www.vectorizer.org/IntersectQuadricTetrahedron.html

  Feel free to send comments, improvements and questions to
  "hobold@vectorizer.org". To my knowledge this algorithm is new, and
  unencumbered by intellectual property rights.

  This code fragment is Copyright 2012 by Holger Bettag, licensed under GPLv2.

  If the GPLv2 is too restrictive for you, contact me. I just want to learn
  about improvements you might find. I am sure we can agree on something that
  won't cost you a dime, and doesn't force all the obligations of the GPLv2 on
  you. The GPLv2 is merely the default.
*/

class Quadric {
public:
  double A, B, C;  // A x^2 + B y^2 + C z^2
  double D, E, F;  // + D yz + E zx + F xy
  double G, H, I;  // + G x + H y + I z
  double J;        // + J
}


// handle apex of a parabola
// internal use
static
inline void refineMinMax1D(double& minval, double& maxval,
			   double a, double b, double c,
			   double tmin, double tmax) {
  if(a != 0.0) {
    double pt = -b/(2.0*a);
    // clamp to limits
    pt = qmin(pt, tmax);
    pt = qmax(pt, tmin);
    
    double value = (a*pt + b)*pt + c;
    minval = qmin(minval, value);
    maxval = qmax(maxval, value);
  }
  return;
}

// handle apex of a 2D quadric and the two edges u = umin, u = umax and
// the one corner at u = umax, v = vmin
// internal use
static
void refineMinMax2D(double& minval, double& maxval,
		    double a, double b, double c, double d, double e, double f,
		    double umin, double umax, double vmin, double vmax) {

  // check 2D apex point
  double denominator = 4.0*a*b - c*c;
  double value;
  if (denominator != 0.0) {
    double pu = (c*e - 2.0*b*d)/denominator;
    double pv = (c*d - 2.0*a*e)/denominator;
    // clamp to limits
    pu = qmin(pu, umax);
    pu = qmax(pu, umin);
    pv = qmin(pv, vmax);
    pv = qmax(pv, vmin);

    value = (a*pu + c*pv + d)*pu + (b*pv + e)*pv + f;
    minval = qmin(minval, value);
    maxval = qmax(maxval, value);
  }

  // handle border at u = umin
  refineMinMax1D(minval, maxval,
                 b, c*umin + e, (a*umin + d)*umin + f,
		 vmin, vmax);
  // ... and border at u = umax
  // extract coefficients of parabola (for reuse)
  c = c*umax + e;
  a = (a*umax + d)*umax + f;
  refineMinMax1D(minval, maxval, b, c, a, vmin, vmax);
  
  // look at corner at u = umax, v = vmin
  value = (b*vmin + c)*vmin + a;
  minval = qmin(minval, value);
  maxval = qmax(maxval, value);
  
  return;
}

// intersect a quadric with an axis aligned box
//
// returns 2 (VoxInside) if the box is contained in the quadric,
//         1 (VoxOutside) if the box is completely outside the quadric
//         0 (VoxSurface) if the box overlaps the quadric surface
VoxClass Quadric::intersectVoxel(const Vec3& Min, const Vec3& Max) const {

  // handle the coordinate minimum corner and the coordinate maximum corner
  double value = evaluatePoint(Min);
  double minval = evaluatePoint(Max);
  double maxval = qmax(minval, value);
  minval = qmin(minval, value);

  // process the faces and edges of the box
  // (each of the six face calls handles two of the 12 edges and
  // one of the 8 corners; the remaining two corners are already done)
  
  // extract face at z = zmin as 2 dimensional quadric and process it
  refineMinMax2D(minval, maxval,
    A, B, F, E*Min.z + G, D*Min.z + H, (C*Min.z + I)*Min.z + J,
    Min.x, Max.x, Min.y, Max.y);
  
  // z = zmax
  refineMinMax2D(minval, maxval,
    A, B, F, E*Max.z + G, D*Max.z + H, (C*Max.z + I)*Max.z + J,
    Min.x, Max.x, Min.y, Max.y);
  
  // y = ymin (switched u and v to q(v,ymin,u) for symmetry)
  refineMinMax2D(minval, maxval,
    C, A, E, D*Min.y + I, F*Min.y + G, (B*Min.y + H)*Min.y + J,
    Min.z, Max.z, Min.x, Max.x);
 
  // y = ymax
  refineMinMax2D(minval, maxval,
    C, A, E, D*Max.y + I, F*Max.y + G, (B*Max.y + H)*Max.y + J,
    Min.z, Max.z, Min.x, Max.x);
 
  // x = xmin
  refineMinMax2D(minval, maxval,
    B, C, D, F*Min.x + H, E*Min.x + I, (A*Min.x + G)*Min.x + J,
    Min.y, Max.y, Min.z, Max.z);
  
  // x = xmax
  refineMinMax2D(minval, maxval,
    B, C, D, F*Max.x + H, E*Max.x + I, (A*Max.x + G)*Max.x + J,
    Min.y, Max.y, Min.z, Max.z);
  
  // find the 3D apex point
  double denominator = 8.0*A*B*C + 2.0*(D*E*F - A*D*D - B*E*E - C*F*F);
  if (denominator != 0.0) {
    double px = (2.0*B*E*I + D*D*G + 2.0*C*F*H - 4.0*B*C*G - D*F*I - D*E*H);
    double py = (E*E*H + 2.0*A*D*I + 2.0*C*F*G - 4.0*A*C*H - D*E*G - E*F*I);
    double pz = (2.0*B*E*G + 2.0*A*D*H + F*F*I - 4.0*A*B*I - E*F*H - D*F*G);
    px /= denominator;
    py /= denominator;
    pz /= denominator;
    
    // clamp position to limits
    px = qmin(px, Max.x);
    px = qmax(px, Min.x);
    py = qmin(py, Max.y);
    py = qmax(py, Min.y);
    pz = qmin(pz, Max.z);
    pz = qmax(pz, Min.z);

    value = evaluatePoint(Vec3(px, py, pz));
    minval = qmin(minval, value);
    maxval = qmax(maxval, value);
  }
  
  return VoxClass(((maxval < 0.0) << 1) + (minval > 0.0));  
}

// brute force method, for debugging & test reference
VoxClass Quadric::intersectVoxelDBG(const Vec3& Min, const Vec3& Max) const {
  const int size = 100;
  int x,y,z;
  double min = evaluatePoint(Min);
  double max = min;
  double val;
  double fx, fy, fz;

  for (z = size; z >= 0; z--) {
    fz = Min.z + double(z)*(Max.z - Min.z)/double(size);
    for (y = size; y >= 0; y--) {
      fy = Min.y + double(y)*(Max.y - Min.y)/double(size);
      for (x = size; x >= 0; x--) {
	fx = Min.x + double(x)*(Max.x - Min.x)/double(size);
	
	val = evaluatePoint(Vec3(fx,fy,fz));
	min = qmin(min, val);
	max = qmax(max, val);
      }
    }
  }

  if (min > 0.0)
    return voxOutside;  // positive minimum value => voxel is outside
  
  if (max < 0.0)
    return voxInside;   // negative maximum value => voxel is inside
  
  return voxSurface;    // otherwise voxel must straddle quadric surface
}

Geri
Posts: 146
Joined: Fri Mar 02, 2012 7:01 pm

Re: Quadrics kinda sorta work for real time

Post by Geri » Tue Apr 17, 2012 12:57 pm

i had a quadrathic acceleration structure before, but sphere based is 2x faster. so i just commented out the quadratic tests. (but i can place back it easilly, i just should remove the //-s. ) it was faster when i used the both tests (first the sphere collision, becouse its faster, and if that successed, the quad test also. but this is also slower than the pure spheric structure, even if that mistakely makes a few extra jump)

also, your intersectVoxel function is too slow, due to it uses FDIV... you should eliminate it if you want usable performance.

px /= denominator;
py /= denominator;
pz /= denominator;

->

float 1perdenom=1.0f/denominator;
px*=1perdenom;
py*=1perdenom;
pz*=1perdenom;

but the best would if you would have no division in it at all. also you have too mutch comparison in it.
that code can be coded from 20 c++ line with just 7-8 if and 2-3 division, you should also implement some early jump techniques.
Csontos kezünkbe a nyomor
Ezer év rúnáit véste

hobold
Posts: 56
Joined: Wed Dec 21, 2011 6:08 pm

Re: Quadrics kinda sorta work for real time

Post by hobold » Tue Apr 17, 2012 4:33 pm

Geri wrote: your intersectVoxel function is too slow, due to it uses FDIV... you should eliminate it if you want usable performance.
Yes, divides are long latency and not pipelined. So far I pretend that the construction of the kD-Tree is not time critical, and I did not give it much attention. In this early proof of concept, it takes ~5 seconds on a single core for only 96 quadric surfaces of the test scene (runtime is currently worse than O(N^2) with a huge constant factor, but that is not a fundamental limit). The spatial subdivision structure is then reused for all frames of the camera flight.

I am still struggling to catch up with almost 20 years of improvements on the side of tracing rays through the voxel tree. Please have mercy. :)

Geri
Posts: 146
Joined: Fri Mar 02, 2012 7:01 pm

Re: Quadrics kinda sorta work for real time

Post by Geri » Tue Apr 17, 2012 8:03 pm

catch up with almost 20 years of improvements
dont worry. you just can ignore it, and optimise until its done. you will be suprised after you finish, the code will be almost identical with the existing ones (so this means that the industry arent really in ray tracing yet :P)
Csontos kezünkbe a nyomor
Ezer év rúnáit véste

hobold
Posts: 56
Joined: Wed Dec 21, 2011 6:08 pm

Re: Quadrics kinda sorta work for real time

Post by hobold » Thu Apr 19, 2012 10:25 am

I miscalculated my initial figure for clock cycles per ray ... apparently I divided the number of rays by the number of threads once again. After a few more tweaks for better load balancing and faster code paths for a few special cases, I am now getting between 1.55M and 1.75M rays per second per core. Not great, but respectable. This corresponds to 1900 to 2200 clock cycles per ray. The numbers vary with output resolution. At 1280 pixels wide, the rays are a bit more coherent and are computed faster; at 640 pixels wide things slow down a bit. (No ray packets, no specialized shadow ray optimizations, no SIMD utilization ... everything is plain and simple with a bit of multithreading.)

So the good news is that I am now a lot less confused about the program's high sensitivity to minor changes. But the bad news is that I'll have a much harder time squeezing out cycles than I would have had if a ray really took 25000 clocks.

Finally, I noticed that I handed you only half the magic you need to make your own experiments. The part about adaptively simplifying CSG composites has a few pitfalls. Pruned CSG subtrees must be created on the fly, and passed back to the voxel intersection routine of the parent. Here's another fragment from my proof of concept code (please kindly ignore the ugly memory leak):

Code: Select all

/*
  Intersecting CSG composites with an Axis Aligned Box and simplifying them
  (an illustrative code fragment)

  Feel free to send comments, improvements and questions to
  "hobold@vectorizer.org".

  This code fragment is Copyright 2012 by Holger Bettag, licensed under GPLv2.

  If the GPLv2 is too restrictive for you, contact me. I just want to learn
  about improvements you might find. I am sure we can agree on something that
  won't cost you a dime, and doesn't force all the obligations of the GPLv2 on
  you. The GPLv2 is merely the default.
*/

VoxClass Intersection::intersectVoxel(const GeObject*& Obj,
				      const Vec3& Min, const Vec3& Max) const {
  VoxClass status1, status2;

  const GeObject* obj1;
  const GeObject* obj2;
  
  status1 = child1->intersectVoxel(obj1, Min, Max);
  status2 = child2->intersectVoxel(obj2, Min, Max);

  Obj = this;  // default return object is this CSG intersection
  
  // outside of one means outside of intersection
  if ((status1 == voxOutside) || (status2 == voxOutside)) {
    return voxOutside;
  }

  // both surfaces means we need to test against CSG combination
  if ((status1 == voxSurface) && (status2 == voxSurface)) {
    // there might be an opportunity for pruning here, in case a child ...
    // ... could simplify itself
    if ((child1 != obj1) || (child2 != obj2)) {
      // TODO: fix memory leak here ...
      Obj = new Intersection((GeObject*)obj1, (GeObject*)obj2);
      if (Obj == NULL) {
	fprintf(stderr,
		"Intersection::intersectVoxel(): no memory for pruned CSG.\n");
	exit(0);
      }
    }
    return voxSurface;
  }

  // surface of only one means we can prune CSG tree
  if (status1 == voxSurface) {
    Obj = obj1;
    return voxSurface;
  }
  if (status2 == voxSurface) {
    Obj = obj2;
    return voxSurface;
  }

  // both inside means inside of intersection
  return voxInside;
}


VoxClass Union::intersectVoxel(const GeObject*& Obj,
			       const Vec3& Min, const Vec3& Max) const {
  VoxClass status1, status2;

  const GeObject* obj1;
  const GeObject* obj2;
  
  status1 = child1->intersectVoxel(obj1, Min, Max);
  status2 = child2->intersectVoxel(obj2, Min, Max);

  Obj = this;  // default return object is this CSG union
  
  // inside of one means inside union
  if ((status1 == voxInside) || (status2 == voxInside))
    return voxInside;

  // both surfaces means we need to test against CSG combination
  if ((status1 == voxSurface) && (status2 == voxSurface)) {
    // there might be an opportunity for pruning here, in case a child ...
    // ... could simplify itself
    if ((child1 != obj1) || (child2 != obj2)) {
      // TODO: memory leak here ...
      Obj = new Union((GeObject*)obj1, (GeObject*)obj2);
      if (Obj == NULL) {
	fprintf(stderr,
		"Union::intersectVoxel(): no memory for pruned CSG.\n");
	exit(0);
      }
    }
    return voxSurface;
  }

  // surface of only one means we can prune CSG tree
  if (status1 == voxSurface) {
    Obj = obj1;
    return voxSurface;
  }
  if (status2 == voxSurface) {
    Obj = obj2;
    return voxSurface;
  }

  // outside of both means outside of union
  return voxOutside;
}

hobold
Posts: 56
Joined: Wed Dec 21, 2011 6:08 pm

Re: Quadrics kinda sorta work for real time

Post by hobold » Wed Jul 25, 2012 2:00 pm

I am making slow progress on the impractically slow tree building. I finally understand the geometry and math behind a sweep type algorithm based on the original box-quadric-intersection test. To find all relevant candidate split positions (at most four for a general quadric primitive), it is sufficient to examine a scaffolding of four edges (parallel to the axis in question), plus up to five oblique "struts" (one on each parallel face, and possibly one inside the box). The edges and the struts are all straight lines, so ultimately it boils down to tracing a few carefully chosen rays.

It's not particularly deep or involved, but tedious with many details where devils can hide. I will eventually present a reference implementation, but don't hold your breath ...

On another note, the unsolved modeling problem keeps haunting me. What use is a renderer for objects that no one can conveniently model? In an attempt to motivate myself, I experimented a bit with surfaces of revolution:

Image

In this particular example, circa 280 quadric surfaces are stitched together to approximate an arbitrary, smooth, generating function to +/- 0.0005 tolerance. A similar number of invisible separator quadrics have been used, but I think with some more effort their number could be drastically reduced.


I will continue to sporadically post in this thread, but primarily I will document my (lack of/slow) progress on my website:
http://vectorizer.org/boar/
Comments and criticism are welcome, either here or by email.

graphicsMan
Posts: 164
Joined: Mon Nov 28, 2011 7:28 pm

Re: Quadrics kinda sorta work for real time

Post by graphicsMan » Wed Jul 25, 2012 4:40 pm

Seems like a fun project :)

hobold
Posts: 56
Joined: Wed Dec 21, 2011 6:08 pm

Re: Quadrics kinda sorta work for real time

Post by hobold » Thu Jul 26, 2012 5:32 pm

graphicsMan wrote:Seems like a fun project :)
In a way, the fun is the only remaining reason behind the project (the hopefully humorous tone of the project home page is a consequence of that). I had much grander plans ... imagine if realtime raytracing was capable of not only doing more realistic lighting with less hassle than rasterizers, but could efficiently display curved surfaces. That would be a very visible advantage over rasterization. Maybe even enough of an advantage to make game design studios want to employ this new aesthetic for a game "like none you have seen before". CPUs and GPUs keep converging further as we speak, so the hardware is pretty much already there if you want to do software defined realtime graphics.

However, these plans depend much more critically on the ability to easily model such curved objects than they depend on an efficient renderer (both are needed, but the renderer "only" takes hard work, not so much ingenuity, thanks to all the prior art that you folks out there have already done). I had hoped to come up with a construction that is somewhat compatible with the established tools and design workflow. But I didn't. I spent a year badly reinventing old wheels, or exploring old and new dead ends.

Without some sort of quadric retrofitting voodoo (abbreviated "qRv"; guess how that should be pronounced :-)) controlled by polygon meshes, even a perfect realtime raytracer for quadrics is utterly useless. Still, there are many interesting problems to be solved when you replace straight geometry with conics.

Interesting, and somewhat challenging, problems was exactly what I needed to break free from an employer that had changed beyond recognition. They used to be a creative chaos much like Valve (http://blogs.valvesoftware.com/abrash/v ... m-doing-2/), but became a victim of their success. In the end, it was a bureaucratic nightmare; I was depressed because I felt I couldn't make myself useful anymore.

At the same time I couldn't just jump ship. On the one hand I still felt loyalty to my coworkers, and on the other hand I pretty much knew that I would never ever find another group of bright minds, another place ablaze with ideas like this company used to be in their early days, no matter where I went. It just wasn't as easy as going out the door and heading for "greener pastures".

So I decided to grant myself a sort of sabbatical. To see if I could still solve nontrivial problems. The "grander plans" mentioned above were a distant hope, a justification, a motivation, an excuse for the foolish decision to go into voluntary unemployment in the middle of a financial crisis.

That was probably way more information than anybody wanted. Then again, maybe I get lucky and one of you finds the "grand plans" as fascinating as I do. I'd gladly fill a research position trying to make them as real as I possibly can. *cough* :)

graphicsMan
Posts: 164
Joined: Mon Nov 28, 2011 7:28 pm

Re: Quadrics kinda sorta work for real time

Post by graphicsMan » Thu Jul 26, 2012 7:04 pm

As you mention, modeling with quadrics is not really an easy task. An interesting possibility would be to make it easy to use triangles, but make them smooth. Shinji Ogaki published a paper last year about how to ray trace Phong tessellations. This is really cool, but unfortunately, the surfaces are not G^2 continuous, so you still have to employ shading normals to avoid artifacts. Maybe you could think about extending that in some fashion by imposing extra constraints?

Also, ray tracing of tensor product spline surfaces and Sub-Ds have been investigated, but maybe there's some room for improvement? I understand your desire to use quadrics, since you can use closed-form solutions instead of tessellation or iteration... perhaps you could figure out an approach to fit quadric pieces to a surface in a way that gets C^1 (or better, G^2) continuity across shared faces of a higher-order surface? You will get smoother surfaces than triangles without super-high tessellation, but lower continuity than the higher-order surface. This might be good enough?

Sometime in the past, I fit cubic b-spline tensor product surfaces to Catmull-Clark Sub-Ds, and 99.9% of the time, there was no visual difference across the surface, but computation was much faster/cheaper. Maybe a similar idea could work?

hobold
Posts: 56
Joined: Wed Dec 21, 2011 6:08 pm

Re: Quadrics kinda sorta work for real time

Post by hobold » Fri Jul 27, 2012 1:25 pm

graphicsMan wrote:Shinji Ogaki published a paper last year about how to ray trace Phong tessellations.
I had skimmed it. As far as I understand, he basically renders Bezier triangles of degree two. This works out to an implicit degree four surface (a quartic). There is also a good paper by James Blinn et al on how to render implicit surfaces up to that degree on the GPU, with heavy emphasis on efficient and robust numerics. Both are very worth reading, but the fact remains that cubic and quartic surfaces require substantially more computational work (an order of magnitude or more), and their mathematical complexity makes it very challenging to find clever optimizations. For starters, try to find an exact and efficient "surface in voxel" test for them ...
I understand your desire to use quadrics, since you can use closed-form solutions instead of tessellation or iteration...
There are a number of reasons why I am so convinced that it has to be either quadrics or nothing:
- quadrics are mathematically simple. No innocent brain cells need to be sacrificed for working out the formulas.
- quadrics are numerically simple. The worst operations needed are divide and square root; modern hardware does these with very good throughput and accuracy (unlike the old days when iterative algorithms stalled the pipeline for dozens of cycles). One can hope that a quadric roughly takes only double the memory and double the computation of a triangle.
- quadrics are geometrically just about sufficient to locally approximate any smooth surface
perhaps you could figure out an approach to fit quadric pieces to a surface in a way that gets C^1 (or better, G^2) continuity across shared faces of a higher-order surface? You will get smoother surfaces than triangles without super-high tessellation, but lower continuity than the higher-order surface. This might be good enough?
Actually, the approaches I tried didn't even enforce C^1. The nice thing about second order approximation is that the error at a tangent discontinuity shrinks quadratically as you tesselate more finely. The edges between patches become invisible long before patches shrink to pixel size. (For example, the surface of revolution above does not enforce tangent continuity between patches, but the resulting artifacts are too small to be noticed.)

This was my secret sauce, so to speak. The reason for my initial hubris in thinking I could outdo all those who already painstakingly worked out intricate and useless C^1 constructions. In fact, enforcing C^1 causes undesirable side effects: non-locality and overshoot.

The problems I ran into were even more basic than that. When you are hell-bent on requiring no more than one quadric patch per input triangle, then it will sometimes be impossible to even only get straight C^0 continuity ... triangle meshes can be surprisingly complex, topologically speaking (and quadrics can be frustratingly constrained, geometrically speaking). One can still try constructions that subdivide the input mesh in some way, but right now I don't know enough about processing of meshes to make guesses how that ought to be done.

Post Reply