Brainstorming: Noise optimization - Cuberite Forum (https://forum.cuberite.org), Development
Brainstorming: Noise optimization - xoft - 03-28-2013

As I've already sketched in FS #337 (http://www.mc-server.org/support/index.php?do=details&task_id=337), the noise generator could be optimized by generating an array of values at once instead of a single value. I'd like to give it a whack.

At the same time, I don't think the code in Noise.cpp that uses SSE is any good; it hasn't even been in use for as long as I've been on this project. So how about I get rid of it in favor of the new array-handling noise?

As for the arrays, I was thinking about creating several new classes. cCubicNoise would generate the same noise that cNoise does now, but for arrays. Then cPerlinNoise would combine several cCubicNoise instances to produce a Perlin noise. If found useful, a cRidgedMulti class would be written to combine two cPerlinNoise instances to produce a ridged multifractal noise.

Each of the noise classes will have functions Generate1D, Generate2D and Generate3D that take 1D, 2D and 3D arrays of doubles, plus coords for the array boundaries in the noise-space. Because cPerlinNoise and cRidgedMulti need an extra array for workspace, I'm thinking about allowing this workspace array to be passed as an optional parameter, so that each call to GenerateND() doesn't result in a memory allocation and free; usually the callers will be able to cache and reuse these workspaces.

So these will be the main interface:

Code:
class cCubicNoise

Does anyone have any thoughts about this? I'm especially interested in any reasons for keeping the SSE code in.

RE: Brainstorming: Noise optimization - bearbin - 03-28-2013

Seems quite good, especially if it will make performance improvements.

RE: Brainstorming: Noise optimization - xoft - 03-28-2013

I believe it will make a quite substantial performance improvement.
Current noise has to do all of these for each point queried:
1, Floor all coords to integral values
2, Calculate the underlying noise value at the 4x4x4 integral neighbors
3, Cubic-interpolate each layer (4x4x4 -> 4x4), then cubic-interpolate each column (4x4 -> 4), then finally cubic-interpolate the final value (4 -> 1)

With the new system, points 1 and 2 will be done only occasionally (rough expectation: about 5 % of the time), and I think even the interpolation could be tweaked somehow to save a few operations.

RE: Brainstorming: Noise optimization - ThuGie - 01-15-2014

Hey, just wondering if this would be something: http://en.wikipedia.org/wiki/Simplex_noise

RE: Brainstorming: Noise optimization - xoft - 01-15-2014

There's not much info on the noise generation itself. Anyway, MCS already uses as little noise as possible, so speeding it up won't matter too much.

RE: Brainstorming: Noise optimization - worktycho - 01-16-2014

The only thing about noise is that it's something extremely suited to vectorization. However, it isn't vectorized automatically, because the noise generation spans several functions, and combining several operations is less common than loop vectorization. I tried some experiments with the clang and gcc vector extensions, but they did not seem to generate SSE instructions (other than scalar floating point). It might be worthwhile to use macros to rewrite the code to use SSE/AVX or NEON if available, if you're looking at parallelizing, but that would reduce readability. For example, SSE2, which is present on all x64 machines, can perform 4 calculations simultaneously, so it might be worth thinking about generating vectors rather than arrays.

RE: Brainstorming: Noise optimization - FakeTruth - 01-16-2014

I tried using SSE for the noise once. It gave the same results, but it was not faster at all. It required so many functions to get around the "***intrin" functions that it didn't pay off at all.
The tiny function to generate a pseudo-random number became huge.

RE: Brainstorming: Noise optimization - worktycho - 01-16-2014

What sort of functions: conversions, or handling non-SSE platforms? If we're generating 4 elements at a time, it seems obvious to use a vector add rather than 4 separate adds. Also, if we're doing parallel generation, then we can just keep the existing code but make it work on vectors instead.

RE: Brainstorming: Noise optimization - FakeTruth - 01-16-2014

This function

Code:
float cNoise::IntNoise( int a_X )
{
    int x = ((a_X*m_Seed)<<13) ^ a_X;
    return ( 1.0f - ( (x * (x * x * 15731 + 789221) + 1376312589) & 0x7fffffff) / 1073741824.0f);
}

turned into this monster

Code:
__m128 SSE_IntNoise( const __m128i & a_X4 )
{
    __m128i X4 = _mm_xor_si128( _mm_slli_epi32( a_X4, 13 ), a_X4 );
    //_mm_sub_ps( _mm_set_ps1( 1.0f )  // 1.f -
    __m128 result = _mm_sub_ps( _mm_set_ps1( 1.0f ) ,
        _mm_div_ps(  // ( ( (x * ((x*x)*15731 + 789221)) + 1376312589 ) & 0x7fffffff ) / 1073741824.0f
            _mm_cvtepi32_ps(  // (float) -> converts to float
                _mm_and_si128(  // ( (x * ((x*x)*15731 + 789221)) + 1376312589 ) & 0x7fffffff
                    _mm_set1_epi32( 0x7fffffff )  // 0x7fffffff
                    , _mm_add_epi32(  // (x * ((x*x)*15731 + 789221)) + 1376312589
                        _mm_set1_epi32( 1376312589 )  // 1376312589
                        , _mm_mul_epu32(  // x * ((x*x)*15731 + 789221)
                            X4
                            , _mm_add_epi32(  // ((x*x)*15731 + 789221)
                                _mm_set1_epi32( 789221 )  // 789221
                                , _mm_mul_epu32(  // ((x*x)*15731)
                                    _mm_mul_epu32( X4, X4 )  // x*x
                                    , _mm_set1_epi32( 15731 )  // 15731
                                )
                            )
                        )
                    )
                )
            )
            , _mm_set_ps1( 1073741824.0f )  // 1073741824.0f
        )
    );
    return result;
}

RE: Brainstorming: Noise optimization - xoft - 01-16-2014

I think this was the wrong approach: we were getting a value for a single set of coords. Now we have the opportunity to optimize in a truly vector fashion; with the cCubicNoise class, each of the Generate() functions operates on an entire array of neighboring noise values. I believe that *could* be optimized with the vector instructions, but I don't have the guts to do it properly.
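
For illustration, the array-based idea discussed in this thread can be sketched in one dimension. This is only a rough sketch under stated assumptions, not the actual cCubicNoise code: the names IntNoise1D, NoiseAtPoint, GenerateCubic1D and the g_LatticeEvals counter are hypothetical, the seed is passed as a parameter instead of being a member, the integer hash uses unsigned arithmetic so that overflow wraps, and the cubic interpolation formula is an assumed one that passes through the two middle lattice values. The point it tries to show is that the floor and the lattice lookups are redone only when a sample enters a new integer cell, while the per-point version redoes them on every call.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts lattice evaluations so the saving can be observed (illustration only).
static int g_LatticeEvals = 0;

// Same formula as the IntNoise() quoted in the thread, but with the seed passed in
// and unsigned arithmetic so that overflow wraps instead of being undefined.
float IntNoise1D(int a_X, int a_Seed)
{
    g_LatticeEvals++;
    const uint32_t ux = static_cast<uint32_t>(a_X);
    const uint32_t x = ((ux * static_cast<uint32_t>(a_Seed)) << 13) ^ ux;
    const uint32_t v = (x * (x * x * 15731u + 789221u) + 1376312589u) & 0x7fffffffu;
    return 1.0f - static_cast<float>(v) / 1073741824.0f;
}

// Assumed cubic interpolation: returns a_B at a_Pct = 0 and a_C at a_Pct = 1;
// a_A and a_D are the outer neighbors.
float CubicInterpolate(float a_A, float a_B, float a_C, float a_D, float a_Pct)
{
    const float P = (a_D - a_C) - (a_A - a_B);
    const float Q = (a_A - a_B) - P;
    const float R = a_C - a_A;
    return ((P * a_Pct + Q) * a_Pct + R) * a_Pct + a_B;
}

// Per-point query: floors the coord and evaluates 4 lattice values on every call.
float NoiseAtPoint(float a_X, int a_Seed)
{
    const int base = static_cast<int>(std::floor(a_X));
    const float A = IntNoise1D(base - 1, a_Seed);
    const float B = IntNoise1D(base,     a_Seed);
    const float C = IntNoise1D(base + 1, a_Seed);
    const float D = IntNoise1D(base + 2, a_Seed);
    return CubicInterpolate(A, B, C, D, a_X - static_cast<float>(base));
}

// Array query: sample i is taken at a_Start + (a_End - a_Start) * i / N (a_End excluded).
// The 4 lattice values are refreshed only when the sample enters a new integer cell.
void GenerateCubic1D(std::vector<float> & a_Out, float a_Start, float a_End, int a_Seed)
{
    assert(a_End >= a_Start);
    const float count = static_cast<float>(a_Out.size());
    int cell = 0;
    bool hasCell = false;
    float A = 0, B = 0, C = 0, D = 0;
    for (size_t i = 0; i < a_Out.size(); i++)
    {
        const float x = a_Start + (a_End - a_Start) * static_cast<float>(i) / count;
        const int base = static_cast<int>(std::floor(x));
        if (!hasCell || (base != cell))
        {
            A = IntNoise1D(base - 1, a_Seed);
            B = IntNoise1D(base,     a_Seed);
            C = IntNoise1D(base + 1, a_Seed);
            D = IntNoise1D(base + 2, a_Seed);
            cell = base;
            hasCell = true;
        }
        a_Out[i] = CubicInterpolate(A, B, C, D, x - static_cast<float>(base));
    }
}
```

With 64 samples spread over the range [0, 4), GenerateCubic1D crosses 4 cells and so performs 16 lattice evaluations, where 64 calls to NoiseAtPoint perform 256. The inner loop is then a plain pass over an array with no hashing in it, which is the kind of loop a compiler or hand-written SSE has a better chance of vectorizing. A further refinement, left out here, would be to shift A, B, C down by one and compute only D when moving to the adjacent cell.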