08-09-2012, 01:57 AM
Just for fun I ran a profiled build of MCServer on the RasPi. Quite unexpectedly, the function eating up the most time was the lighting thread's PrepareSkyLight(), at almost 18 % of the run time. Who would've thought?
My guess is that the ARM architecture is good at number crunching, but not so good at memory access. PrepareSkyLight jumps all over a block of memory larger than 600 KiB, reading and writing it byte-wise. Generators usually take a few numeric values and chew on them over and over until a final number comes out, and that's where a platform with many registers wins.
Here's the list of top time-eaters. Notice how the generators are actually doing rather well.
Code:
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 17.84     10.22    10.22      398     0.03     0.03  cLightingThread::PrepareSkyLight
 12.50     17.38     7.16      882     0.01     0.02  deflate_slow
 10.77     23.55     6.17  1217115     0.00     0.00  cCaveTunnel::ProcessChunk
 10.10     29.34     5.79    15968     0.00     0.00  fill_window
  8.38     34.14     4.80  1345982     0.00     0.00  longest_match
  8.25     38.87     4.73      422     0.01     0.04  cLightingThread::LightChunk
  5.85     42.22     3.35   370359     0.00     0.00  cHeiGenBiomal::GetHeightAt
  5.69     45.48     3.26     8901     0.00     0.00  cLightingThread::CalcLightStep
  4.08     47.82     2.34      484     0.00     0.00  cChunk::CreateBlockEntities
  2.34     49.16     1.34      478     0.00     0.02  cStructGenTrees::GenStructures
  2.09     50.36     1.20  1775634     0.00     0.00  cNoise::CubicNoise2D
  1.75     51.36     1.00     6123     0.00     0.00  adler32
  1.66     52.31     0.95   726731     0.00     0.00  cBioGenVoronoi::VoronoiBiome
  1.43     53.13     0.82   426000     0.00     0.00  cNoise::CubicNoise3D
  1.20     53.82     0.69     1814     0.00     0.00  cIniFile::FindValue
  0.98     54.38     0.56    34071     0.00     0.00  cStructGenRavines::cRavine::ProcessChunk
  0.59     54.72     0.34      484     0.00     0.01  cStructGenWormNestCaves::GenStructures
  0.54     55.03     0.31      866     0.00     0.00  compress_block
I'd like to run a few more tests on this, to see whether there's any kind of RAM cache somewhere in there, and whether accessing the memory consecutively is any faster than accessing it randomly.
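A rough sketch of what such a test could look like, not actual MCServer code: it touches every byte of a buffer once per pass, either in consecutive order or in a shuffled order, and times the result. The names MakeOrder and TimeAccess are made up for illustration, and the 600 KiB buffer size suggested below is only meant to mirror the block PrepareSkyLight works on.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: builds the order in which the buffer bytes get visited.
// a_Shuffle == false gives 0, 1, 2, ...; true gives a fixed-seed random permutation.
std::vector<std::uint32_t> MakeOrder(std::size_t a_Size, bool a_Shuffle)
{
    std::vector<std::uint32_t> order(a_Size);
    std::iota(order.begin(), order.end(), 0u);
    if (a_Shuffle)
    {
        std::mt19937 rng(12345);  // fixed seed so that runs are repeatable
        std::shuffle(order.begin(), order.end(), rng);
    }
    return order;
}

// Hypothetical helper: reads and writes each byte once per pass, byte-wise,
// in the given order. Returns the elapsed wall-clock time in milliseconds.
double TimeAccess(
    std::vector<unsigned char> & a_Buffer,
    const std::vector<std::uint32_t> & a_Order,
    int a_Passes
)
{
    auto start = std::chrono::steady_clock::now();
    for (int pass = 0; pass < a_Passes; ++pass)
    {
        for (std::uint32_t idx : a_Order)
        {
            a_Buffer[idx] = static_cast<unsigned char>(a_Buffer[idx] + 1);
        }
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

To use it, allocate a std::vector<unsigned char> of 600 * 1024 bytes, build one order with MakeOrder(size, false) and one with MakeOrder(size, true), and compare the two TimeAccess() results over the same number of passes. Both runs read the index array consecutively, so they pay the same overhead for it, and any difference should come from the access pattern on the buffer itself. If the two times come out about equal, there's probably no effective cache in play for a block of this size.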