@xoft. I've just run a quick benchmark with clang 3.4 on my i5-2450M. If I remove the repeated value detection from CalcFloorFrac, generation speeds up by a factor of 4 to 6. It looks like clang can vectorise the loop when it's simple.
gcc 4.8 is giving similar improvements.
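
To illustrate what I mean, here is a rough sketch of the two loop shapes. This is not the actual CalcFloorFrac code; the function names, parameters and the run-length output are made up for the example. The first loop has fully independent iterations, which is the kind of loop a compiler may be able to auto-vectorise. The second carries state from one iteration to the next (comparing against the previous floor value and bumping a run counter), which is likely what gets in the vectoriser's way.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical simplified variant: split each input into its integer floor
// and fractional part. Every iteration is independent of the others.
void CalcFloorFracSimple(const float * a_In, int * a_Floor, float * a_Frac, std::size_t a_Count)
{
	for (std::size_t i = 0; i < a_Count; i++)
	{
		float f = std::floor(a_In[i]);
		a_Floor[i] = static_cast<int>(f);
		a_Frac[i] = a_In[i] - f;
	}
}

// Hypothetical variant with repeated value detection: additionally records
// the length of each run of consecutive equal floor values.
// a_RunLength must have room for a_Count entries. Returns the number of runs.
std::size_t CalcFloorFracWithRuns(const float * a_In, int * a_Floor, float * a_Frac, int * a_RunLength, std::size_t a_Count)
{
	std::size_t numRuns = 0;
	for (std::size_t i = 0; i < a_Count; i++)
	{
		float f = std::floor(a_In[i]);
		int cur = static_cast<int>(f);
		a_Floor[i] = cur;
		a_Frac[i] = a_In[i] - f;
		if ((i > 0) && (cur == a_Floor[i - 1]))
		{
			// Same floor as the previous element: extend the current run.
			// This depends on the previous iteration's result.
			a_RunLength[numRuns - 1]++;
		}
		else
		{
			// Floor changed (or first element): start a new run.
			a_RunLength[numRuns++] = 1;
		}
	}
	return numRuns;
}
```

If the run information is still needed, one option would be to keep the simple loop and compute the runs in a separate pass afterwards. I haven't measured that.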