Weird Thought on World Saving

SamJBarney · 06-20-2014, 12:46 PM

Ok, so here is a thought that just came right out of the blue for me. I don't know if it's viable or not; let me know what you think:

The thing that takes up the most space on the hard drive for Minecraft is the world files, which contain all of the chunk data. However, it is a proven fact that we can generate the same terrain over and over again from the same seed without getting a single block out of place, so storing data we can reproduce is kind of redundant, other than to save on processor time.

What if instead of saving the entire chunk we only stored block changes? When a chunk was needed, we could generate it and then apply the changes necessary to transform it into what it was when the chunk was saved last. And we could still store data about entities and such no problem. This could allow for extremely large worlds with small data files.
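
A minimal sketch of the idea, assuming a deterministic generator. Every name here (`generate_chunk`, `load_chunk`, the 16 x 256 x 16 dimensions, the toy random terrain) is hypothetical and only for illustration; it is not the server's actual code.

```python
import random

WIDTH, HEIGHT = 16, 256                 # assumed chunk dimensions (16 x 256 x 16)
NUM_BLOCKS = WIDTH * HEIGHT * WIDTH


def generate_chunk(seed, cx, cz):
    """Toy deterministic generator: the same (seed, cx, cz) gives the same blocks."""
    rng = random.Random(f"{seed}:{cx}:{cz}")
    return bytearray(rng.randrange(4) for _ in range(NUM_BLOCKS))


def load_chunk(seed, cx, cz, changes):
    """Regenerate the chunk, then replay the stored {index: block_type} changes."""
    blocks = generate_chunk(seed, cx, cz)
    for index, block_type in changes.items():
        blocks[index] = block_type
    return blocks


# Only the edits are stored on disk, not the whole chunk.
saved_changes = {10: 42, 5000: 7}
chunk = load_chunk(1234, 0, 0, saved_changes)
```

Entities and similar data would still be saved separately, as described above.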

Chunk generation could be farmed out to multiple threads to decrease load times.

Let me know what y'all think.

bearbin · 06-20-2014, 04:24 PM

That would be really cool, but probably a bit buggy. I'd still go with it as an option, as it could work and would really reduce the disk usage. (It shouldn't be the default because it makes it incompatible with vanilla.)

xoft · 06-20-2014, 04:43 PM

This would make the server very CPU-heavy, and it already is.
The storage is not only for the block types, but also for the lighting. I did some profiling just a few days ago and found that generating uses about 20% CPU and lighting uses about 60% CPU. By removing the storage for these, you'd be making the server generate and light chunks an order of magnitude more often.

If we went with this scheme, then for each chunk save we'd need to generate the original chunk data, compare it with the current content, and save the differences. To load, we'd need to generate the original chunk data, load the differences and apply them. That is 2 more generations just for a slight saving in the savefile size.

This gets even worse when the chunk needs lighting, because for lighting you need the blocktypes of all the 3x3 neighbors. So you need to load 9 chunks (including generating them) and then light one chunk. Sure, when loading neighboring chunks the load gets distributed among the chunks so that there's an almost 1:1 ratio of generating to lighting, but typical loading patterns show that chunks aren't loaded together with their neighbors that often.

Also, consider that there's already compression at work while saving. You might not believe it, but the compression actually does help *a lot*. And there's no telling whether the differences would compress as well as the original data; I have a feeling they won't.
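
One way to settle the compression question would be to measure it. The sketch below compares the zlib-compressed size of a full block array against a packed list of changes. The record layout (a 16-bit index plus an 8-bit block type) and the helper names are assumptions for illustration, not the actual save format, and it makes no claim about which side wins on real worlds.

```python
import struct
import zlib


def pack_changes(changes):
    """Serialize {index: block_type} as little-endian (uint16 index, uint8 type) records."""
    return b"".join(struct.pack("<HB", i, t) for i, t in sorted(changes.items()))


def compressed_sizes(blocks, changes):
    """Return (compressed size of the full chunk, compressed size of the change list)."""
    full = len(zlib.compress(bytes(blocks)))
    diff = len(zlib.compress(pack_changes(changes)))
    return full, diff


# Placeholder data: an all-zero chunk and two edits. Real chunk data is needed
# for a meaningful comparison.
full_size, diff_size = compressed_sizes(bytes(16 * 256 * 16), {1: 1, 2: 2})
```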

worktycho · 06-20-2014, 09:17 PM

For the CPU load, I've got an algorithm for doing lighting on the GPU, which could seriously reduce the CPU load for lighting. As for generating the differences, this is easy on a GPU.

So for systems without GPU support it's not worth it, but it might be worth trying as an option on builds with GPU offloading.

worktycho · 06-21-2014, 02:00 AM

One possibility, if we can separate the generator, would be to implement this as a separate tool that could be used to compress chunks that aren't used. Then it's only a matter of generating on load.

tigerw · 06-21-2014, 03:24 AM

Save only chunks that someone changed. Unsaved ones will regenerate. For lighting, assign everything a light of 15. If the client doesn't calculate it itself, nobody will really notice anyway. If they do, they'll just attribute it to their brightness.

SamJBarney · 06-21-2014, 03:37 AM

Technically, you don't need to regen the chunk to compare differences. You could just keep a running tally of block changes. For instance, in the chunk when a block is changed we flag it with a bool to say it was changed, and then on save we just save the ones that have the change flag. Or we just store the index of the changed block somewhere and just grab the block info when we save.
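
A rough sketch of the second variant, storing the indices of changed blocks and reading the block info only at save time. The class and method names are made up for illustration and don't correspond to anything in the server.

```python
class TrackedChunk:
    """Hypothetical chunk wrapper that remembers which block indices were modified."""

    def __init__(self, blocks):
        self.blocks = bytearray(blocks)
        self.changed = set()            # indices touched since the chunk was generated

    def set_block(self, index, block_type):
        # Writes that don't alter the block are ignored, so they never reach disk.
        if self.blocks[index] != block_type:
            self.blocks[index] = block_type
            self.changed.add(index)

    def changes_to_save(self):
        """Grab the current block type for every changed index."""
        return {i: self.blocks[i] for i in sorted(self.changed)}


chunk = TrackedChunk(bytes(16 * 256 * 16))
chunk.set_block(100, 1)
chunk.set_block(200, 2)
chunk.set_block(300, 0)                 # same as the existing block, not recorded
to_save = chunk.changes_to_save()
```

One caveat: a block that is changed and later changed back still counts as changed here. That costs a few bytes in the save but doesn't affect correctness.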

As for compression, I don't know how the current compression works, but how much further can you compress the block index plus the block type? Because that would be all you would need to store.

worktycho · 06-21-2014, 04:30 AM

A bool per block is 8k of RAM per chunk.

SamJBarney · 06-21-2014, 05:31 AM

Or 4k if we used a bitfield.
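
The exact figures depend on which chunk size is meant and how a flag is stored. A quick back-of-the-envelope check, assuming one byte per bool, one bit per block in a bitfield, and either a full 16 x 256 x 16 chunk column or a single 16 x 16 x 16 section (the function name is just for illustration):

```python
def flag_memory(num_blocks):
    """Return (bytes with one bool per block, bytes with a one-bit-per-block bitfield)."""
    bool_bytes = num_blocks                 # assumes sizeof(bool) == 1
    bitfield_bytes = (num_blocks + 7) // 8  # round up to whole bytes
    return bool_bytes, bitfield_bytes


column = flag_memory(16 * 256 * 16)   # full chunk column: (65536, 8192)
section = flag_memory(16 * 16 * 16)   # one 16-high section: (4096, 512)
```

Under these assumptions a bitfield over a full column comes to 8 KiB and a bool per block over one section comes to 4 KiB, so the numbers above depend on which combination is being counted.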