GPU code

bearbin · 06-15-2014, 06:36 AM

I'd go for option 4 also.

worktycho · 06-15-2014, 06:36 AM

I'll code up the Voronoi BioGen for option 4.

xoft · 06-15-2014, 06:42 AM

The Voronoi generators will be more difficult than all the others; they aren't exactly suited for value-array calculation.

worktycho · 06-15-2014, 06:55 AM

How about HeightGenClassic? It's very simple: looped noise over the entire chunk.
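To illustrate what "looped noise over the entire chunk" can look like, here is a minimal sketch. It is not the actual HeightGenClassic code: the names `LatticeValue`, `ValueNoise` and `GenHeightMap`, the hash constants, the three octaves and the 40..100 height range are all assumptions made up for this example. The point is only that every column is computed independently from its coordinates, which is what makes this kind of generator a natural fit for one GPU work-item per column.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Assumed chunk footprint: 16 x 16 columns.
constexpr int kChunkWidth = 16;
using HeightMap = std::array<std::uint8_t, kChunkWidth * kChunkWidth>;

// Hypothetical stand-in for the server's noise class:
// hashes integer lattice coordinates to a value in [0, 1).
inline float LatticeValue(int x, int z, std::uint32_t seed)
{
    std::uint32_t h = seed;
    h ^= static_cast<std::uint32_t>(x) * 0x27d4eb2dU;
    h ^= static_cast<std::uint32_t>(z) * 0x165667b1U;
    h ^= h >> 15;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return static_cast<float>(h >> 8) / 16777216.0f;  // top 24 bits scaled by 2^-24
}

// Bilinear interpolation between the four surrounding lattice values.
inline float ValueNoise(float x, float z, std::uint32_t seed)
{
    const float fx = std::floor(x);
    const float fz = std::floor(z);
    const int ix = static_cast<int>(fx);
    const int iz = static_cast<int>(fz);
    const float tx = x - fx;
    const float tz = z - fz;
    const float a = LatticeValue(ix,     iz,     seed);
    const float b = LatticeValue(ix + 1, iz,     seed);
    const float c = LatticeValue(ix,     iz + 1, seed);
    const float d = LatticeValue(ix + 1, iz + 1, seed);
    const float near = a + (b - a) * tx;
    const float far  = c + (d - c) * tx;
    return near + (far - near) * tz;
}

// Loops over every column of the chunk and sums a few noise octaves.
// No column depends on any other, so the two outer loops could become
// a 16 x 16 grid of GPU work-items.
inline HeightMap GenHeightMap(int chunkX, int chunkZ, std::uint32_t seed)
{
    HeightMap heights{};
    for (int z = 0; z < kChunkWidth; ++z)
    {
        for (int x = 0; x < kChunkWidth; ++x)
        {
            const float wx = static_cast<float>(chunkX * kChunkWidth + x);
            const float wz = static_cast<float>(chunkZ * kChunkWidth + z);
            float sum = 0.0f, total = 0.0f;
            float amplitude = 1.0f, frequency = 1.0f / 64.0f;
            for (int octave = 0; octave < 3; ++octave)
            {
                sum += amplitude * ValueNoise(wx * frequency, wz * frequency,
                                              seed + static_cast<std::uint32_t>(octave));
                total += amplitude;
                amplitude *= 0.5f;
                frequency *= 2.0f;
            }
            // Map the normalised noise onto an assumed height range of 40..100.
            const float height = 40.0f + 60.0f * (sum / total);
            heights[z * kChunkWidth + x] = static_cast<std::uint8_t>(height);
        }
    }
    return heights;
}
```

Calling `GenHeightMap(0, 0, 1234)` returns the 256 heights for chunk (0, 0); the same arguments always give the same map.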

xoft · 06-15-2014, 11:08 PM

Perfect candidate.

worktycho · (This post was last modified: 06-18-2014, 01:06 AM by worktycho.)

Got HeightGenClassic to run on my machine: GPU runtime 30 microseconds, data transfer 2 microseconds. Total execution time from enqueuing the kernel to finishing the read-back: 303 microseconds. Something tells me that if we want to use this, we need to have several chunks in flight at once.

For comparison, still using the OpenCL API on the CPU: runtime 58 microseconds, total runtime 98 microseconds.

So there are significant performance increases if we can deal with the latency by doing stuff asynchronously and batching.
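As a rough back-of-the-envelope check on the batching idea, the numbers above can be put into a simple cost model. The model itself is an assumption, not something measured: it treats everything in the 303 microseconds that is not kernel time or transfer (303 - 30 - 2 = 271) as a fixed overhead paid once per batch, and it assumes the CPU path costs the full 98 microseconds for every chunk with nothing amortised. All names here (`Timing`, `GpuBatchTimeUs`, `BreakEvenBatchSize`, and so on) are made up for the sketch.

```cpp
// Simplified cost model for one batch of chunks, in microseconds.
struct Timing
{
    double fixedOverheadUs;  // paid once per batch (assumption)
    double perChunkUs;       // paid for every chunk in the batch
};

// GPU: 303 total, of which 30 is kernel time and 2 is data transfer.
// The remaining 271 is assumed to be per-batch latency.
constexpr Timing kGpu{303.0 - 30.0 - 2.0, 30.0 + 2.0};

// CPU through the OpenCL API: 98 total per chunk, assumed not to amortise.
constexpr double kCpuPerChunkUs = 98.0;

constexpr double GpuBatchTimeUs(int chunks)
{
    return kGpu.fixedOverheadUs + chunks * kGpu.perChunkUs;
}

constexpr double CpuBatchTimeUs(int chunks)
{
    return chunks * kCpuPerChunkUs;
}

// Smallest batch size at which the modelled GPU time beats the CPU time.
inline int BreakEvenBatchSize()
{
    int chunks = 1;
    while (GpuBatchTimeUs(chunks) >= CpuBatchTimeUs(chunks))
    {
        ++chunks;
    }
    return chunks;
}
```

Under these assumptions the GPU loses at 4 chunks per batch (399 vs. 392 microseconds) and wins from 5 chunks onward (431 vs. 490). If the CPU path can also be batched, the real break-even point will be higher.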

worktycho · 06-18-2014, 01:12 AM

Code's now on the GPUcode branch.

NiLSPACE · 06-18-2014, 01:17 AM

I'm going to try it now :)

Do I have to use some cmake magic to make it compile properly?

worktycho · (This post was last modified: 06-18-2014, 01:27 AM by worktycho.)

No, but you do need an OpenCL-compatible SDK and driver. If you've got an ATI card, that means a recent version of Catalyst and the AMD APP SDK. You may also need to make sure the AMD OpenCL.dll is in your DLL search path. CMake might ask you to set OPENCL_INCLUDE_DIR and OPENCL_LIBRARY_DIR, though.

Just got total runtime down to 125 microseconds.

NiLSPACE · 06-18-2014, 01:30 AM

I got an error. It can't find "CL/cl.hpp".