Silly me, I've been comparing performance in Debug mode again. Of course it needs to be done in Release.
So here's a Release mode comparison of the same algorithms:
Code:
[21:40:01] cCubicNoise generating 500 * 256x256 values took 670 ticks (0.67 sec)
[21:40:05] cNoise generating 500 * 256x256 values took 3678 ticks (3.68 sec)
So the speedup is "only" 5.5x.
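In case anyone wants to reproduce the measurement, a harness along these lines should give comparable output. This is only a sketch under the assumption that the "ticks" above come straight from clock(); GenerateNoiseGrid() is a dummy stand-in for the real generator, not the actual cCubicNoise / cNoise interface.

Code:
#include <cstdio>
#include <ctime>

// Dummy stand-in for the real noise generator, just to have something to time.
static void GenerateNoiseGrid(float * a_Values, int a_SizeX, int a_SizeY)
{
	for (int y = 0; y < a_SizeY; y++)
	{
		for (int x = 0; x < a_SizeX; x++)
		{
			a_Values[x + a_SizeX * y] = static_cast<float>(x * 0.001 + y * 0.002);
		}
	}
}

int main(void)
{
	const int NumIterations = 500;
	static float Values[256 * 256];

	// Time the whole batch of iterations and report it in clock() ticks.
	clock_t Start = clock();
	for (int i = 0; i < NumIterations; i++)
	{
		GenerateNoiseGrid(Values, 256, 256);
	}
	clock_t Ticks = clock() - Start;

	printf("Generating %d * 256x256 values took %d ticks (%.2f sec)\n",
		NumIterations, static_cast<int>(Ticks), static_cast<double>(Ticks) / CLOCKS_PER_SEC);
	return 0;
}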
However, after two more rounds of code optimizing, I was able to squeeze out some more:
Code:
[21:52:49] cCubicNoise generating 1000 * 256x256 values took 1138 ticks (1.14 sec)
[21:52:56] cNoise generating 1000 * 256x256 values took 7403 ticks (7.40 sec)
[21:52:56] New method is 6.51x faster
Oh yeah!
(Just changed the datatype from double to float.)
Code:
[22:31:47] cCubicNoise generating 1000 * 256x256 values took 701 ticks (0.70 sec)
[22:31:54] cNoise generating 1000 * 256x256 values took 7404 ticks (7.40 sec)
[22:31:54] New method is 10.56x faster
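By the way, the double-to-float experiment above is painless when the noise value type lives behind a single typedef. Just a sketch of that pattern; the NOISE_DATATYPE name and LinearInterpolate() helper are made up for illustration, not necessarily what the real code looks like.

Code:
// Hypothetical central typedef for the noise value type; switching between
// float and double is then a one-line change plus a rebuild.
typedef float NOISE_DATATYPE;  // was: typedef double NOISE_DATATYPE;

// All noise code uses the typedef instead of a hard-coded float / double:
NOISE_DATATYPE LinearInterpolate(NOISE_DATATYPE a_A, NOISE_DATATYPE a_B, NOISE_DATATYPE a_Pct)
{
	return a_A + (a_B - a_A) * a_Pct;
}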
I think I've hit the sweet spot now:
Changed the datatype back to doubles, but re-implemented the CubicInterpolate() method for doubles, thus eliminating the conversion back and forth between doubles and floats.
Code:
[22:38:19] cCubicNoise generating 1000 * 256x256 values took 686 ticks (0.69 sec)
[22:38:27] cNoise generating 1000 * 256x256 values took 7403 ticks (7.40 sec)
[22:38:27] New method is 10.79x faster
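For anyone wondering what a CubicInterpolate() for doubles buys: with the interpolation written once over a generic value type (or overloaded for double), double-based callers never have to narrow to float and widen back for every sample. This is only a sketch using the standard 4-point cubic form; the actual CubicInterpolate() implementation may differ.

Code:
// Sketch of a 4-point cubic interpolation usable for both float and double.
// At a_Pct = 0 it returns a_B, at a_Pct = 1 it returns a_C.
template <typename T>
T CubicInterpolate(T a_A, T a_B, T a_C, T a_D, T a_Pct)
{
	T P = (a_D - a_C) - (a_A - a_B);
	T Q = (a_A - a_B) - P;
	T R = a_C - a_A;
	T S = a_B;
	return ((P * a_Pct + Q) * a_Pct + R) * a_Pct + S;
}

// Example: interpolating between B and C at 25%, staying in doubles throughout.
double Example()
{
	return CubicInterpolate<double>(1.0, 2.0, 4.0, 3.0, 0.25);
}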