I'm not a QMC guru in any way, but after working hard for quite a while I did manage to implement a Sobol sampler that improves convergence by quite a lot.
My implementation might not be the most academically correct one, but it works. I've put a lot of hours into measuring convergence against other samplers, and it beats Mersenne Twister (MT), Halton and Faure (although they're pretty close). And all this with correlation patterns during rendering that aren't too objectionable, and with no hashing or scrambling whatsoever: just the pure output of the Sobol sample generator.
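For readers who haven't written one, the "pure output" of a Sobol generator is just an XOR of direction numbers selected by the bits of the sample index. Below is a minimal Python sketch of that idea, not the poster's code. The names `JOE_KUO_ROWS`, `direction_numbers` and `sobol` are hypothetical, and the table holds only a handful of rows in the Joe & Kuo (s, a, m_i) format, so a real renderer would load the full file (and should double-check these rows against it). The `scramble` argument shows what XOR-style scrambling would change; leaving it at 0 gives the unscrambled sequence.

```python
# A handful of rows in the Joe & Kuo (s, a, m_1..m_s) format, for Joe & Kuo
# dimensions 2..6. Dimension 1 (index 0 here) is van der Corput and needs no
# row. A real renderer would parse the full 21201-dimension file instead.
JOE_KUO_ROWS = [
    (1, 0, [1]),
    (2, 1, [1, 3]),
    (3, 1, [1, 3, 1]),
    (3, 2, [1, 1, 1]),
    (4, 1, [1, 1, 3, 3]),
]

BITS = 32


def direction_numbers(dim):
    """32 direction numbers V[0..31] for 0-based dimension `dim`,
    built with Joe & Kuo's recurrence over the primitive polynomial (s, a)."""
    if dim == 0:
        return [1 << (BITS - 1 - i) for i in range(BITS)]
    s, a, m = JOE_KUO_ROWS[dim - 1]
    v = [0] * BITS
    for i in range(s):
        v[i] = m[i] << (BITS - 1 - i)       # initial values from the table
    for i in range(s, BITS):
        v[i] = v[i - s] ^ (v[i - s] >> s)   # recurrence for the remaining bits
        for k in range(1, s):
            if (a >> (s - 1 - k)) & 1:      # polynomial coefficient a_k
                v[i] ^= v[i - k]
    return v


_V = [direction_numbers(d) for d in range(len(JOE_KUO_ROWS) + 1)]


def sobol(index, dim, scramble=0):
    """Coordinate `dim` of Sobol point `index`, as a float in [0, 1).

    scramble=0 is the plain sequence; a nonzero 32-bit mask XORs every
    point in this dimension (random digit scrambling)."""
    if not 0 <= index < (1 << BITS):
        raise ValueError("index must fit in 32 bits")
    v = _V[dim]
    x = 0
    bit = 0
    while index:
        if index & 1:
            x ^= v[bit]   # each set bit of the index toggles one direction number
        index >>= 1
        bit += 1
    return (x ^ (scramble & 0xFFFFFFFF)) / float(1 << BITS)
```

This evaluates each point directly from its index rather than in Gray-code order, so the first 2^m points form the same set a Gray-code generator would produce, possibly in a different order. For example, the first four 2D points from dimensions 0 and 1 are (0, 0), (0.5, 0.5), (0.25, 0.75) and (0.75, 0.25).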
The 1-thread scenario is simple. (Oh, and I don't generate my samples ahead of time; I just draw them as long as the path continues. I'm using the Joe & Kuo direction numbers, so I can go up to dimension 21201.)
1. Move to the next Sobol sample index.
2. (Image plane sampling) dimensions 0 & 1 for choosing the pixel
(basically: pixelX = (int)(width * dim0); pixelY = (int)(height * dim1);)
3. dimensions 2 & 3 for jittering the sample within the pixel
4. dimensions 4 & 5 for lens sampling (I haven't implemented motion blur, so I don't sample time at the moment)
5. (Direct light sampling) dimension 6 for choosing a light source, dimensions 7 & 8 for sampling the chosen light source
6. (BSDF sampling) dimensions 9 & 10 for the direction and dimension 11 for choosing a BSDF component to sample
7. dimension 12 for Russian roulette (RR)
8. Go back to #5 and repeat until max_path_length (incrementing the dimension all the time, obviously), then go back to #1.
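Steps 5 to 8 amount to a fixed dimension layout: 6 camera dimensions followed by a block of 7 dimensions per path vertex, so vertex v uses dimension 6 + 7·v + slot. The Python sketch below spells out that bookkeeping under this reading of the list (it assumes every vertex consumes all 7 dimensions, even when, say, RR is not applied); the names `Slot`, `dimension`, `max_path_length` and `camera_sample` are hypothetical.

```python
from enum import IntEnum


# Hypothetical names for the per-vertex slots of steps 5 to 7, in the order the
# list uses them (light pick, light point, BSDF direction, BSDF component, RR).
class Slot(IntEnum):
    LIGHT_SELECT = 0
    LIGHT_U = 1
    LIGHT_V = 2
    BSDF_U = 3
    BSDF_V = 4
    BSDF_COMPONENT = 5
    RUSSIAN_ROULETTE = 6


CAMERA_DIMS = 6              # steps 2-4: pixel (0, 1), jitter (2, 3), lens (4, 5)
DIMS_PER_VERTEX = len(Slot)  # steps 5-7 use 7 dimensions per path vertex
JOE_KUO_MAX_DIMS = 21201     # dimensions available in the Joe & Kuo table


def dimension(vertex, slot):
    """Sobol dimension for a path vertex (0 = first hit) and a slot."""
    return CAMERA_DIMS + vertex * DIMS_PER_VERTEX + int(slot)


def max_path_length(total_dims=JOE_KUO_MAX_DIMS):
    """Longest path whose dimensions all fit in the table."""
    return (total_dims - CAMERA_DIMS) // DIMS_PER_VERTEX


def camera_sample(u, width, height):
    """Steps 2 and 3: u[0], u[1] pick the pixel; u[2], u[3] jitter inside it."""
    px = min(int(u[0] * width), width - 1)   # defensive clamp in case u == 1.0
    py = min(int(u[1] * height), height - 1)
    return px + u[2], py + u[3]
```

With this layout the full 21201-dimension table covers paths of up to 3027 vertices, so the dimension budget is unlikely to be what limits max_path_length in practice.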
As I said, the 1-thread scenario is simple, as you just walk along the sequence, incrementing the Sobol sample index for every new pixel sample. My biggest problem was doing it across many threads. I tried scrambling (using a unique scramble value for every thread), Cranley-Patterson (C-P) rotation, and a global index counter shared by all threads (for #1 in my description above). I found that scrambling reduced convergence somewhat and also produced objectionable correlation patterns; C-P rotation gave fewer correlation patterns, but they were still objectionable. Using a global Sobol index counter shared by all threads gave me the nicest result and also the best convergence. I know this is a lousy way to do it, since every thread has to synchronize on that one counter, but speed was not my objective. Gruenschloss published a paper about exactly this (http://gruenschloss.org/parqmc/parqmc.pdf), but to be honest, the math in section 3.3 is a bit too dense for me, so it'll be a while before I'm able to implement it.
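The shared-counter approach boils down to making step 1 global: every thread claims the next free index from one counter, so all threads together consume a single contiguous prefix of the sequence. Here is a minimal Python sketch under that assumption; `SharedSobolIndex`, `run_workers` and `cranley_patterson` are hypothetical names, and the lock stands in for what would typically be a `std::atomic<uint64_t>` with `fetch_add` in a C++ renderer. The C-P rotation helper is included only for contrast with the per-thread alternative.

```python
import threading


class SharedSobolIndex:
    """Step 1 made global: one sample-index counter shared by every thread."""

    def __init__(self, start=0):
        self._next = start
        self._lock = threading.Lock()

    def claim(self):
        # The lock makes read-and-increment atomic; this is the contention point.
        with self._lock:
            index = self._next
            self._next += 1
            return index


def run_workers(num_threads, samples_per_thread):
    """Simulate render threads that each claim indices from the shared counter.

    Returns the indices each thread claimed. A real renderer would trace one
    path per index (dims 0 and 1 then decide which pixel it lands on, so
    threads are not tied to tiles) and accumulate into a shared framebuffer.
    """
    counter = SharedSobolIndex()
    claimed = [[] for _ in range(num_threads)]

    def worker(tid):
        for _ in range(samples_per_thread):
            claimed[tid].append(counter.claim())

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed


def cranley_patterson(u, shift):
    """Per-thread alternative: shift a coordinate by a thread-specific offset,
    wrapping around [0, 1)."""
    return (u + shift) % 1.0
```

However the scheduler interleaves the threads, the union of claimed indices is exactly 0..N-1 with no gaps or duplicates, which is the same set of points the single-threaded loop would visit; only the assignment of indices to threads is nondeterministic.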