GPU code
#31
I'd go for option 4 also.
#32
I'll code up the Voronoi BioGen for option 4.
#33
The Voronoi generators will be more difficult than all the others; they aren't exactly suited for value-array calculation.
#34
How about HeightGenClassic? It's very simple: looped noise over the entire chunk.
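To make "looped noise over the entire chunk" concrete, here is a minimal C++ sketch of that shape of generator. It is not the actual HeightGenClassic code: `HashNoise`, `GenHeightMap`, `kBaseHeight` and `kHeightRange` are hypothetical names and values chosen for illustration, and the hash-based noise is only a stand-in for whatever noise the real generator uses. The point is that every column's height depends only on its own coordinates and the seed, which is what would let each column map onto one GPU work-item.

```cpp
#include <array>
#include <cstdint>

constexpr int kChunkWidth  = 16;  // columns per chunk side
constexpr int kBaseHeight  = 64;  // hypothetical lowest terrain height
constexpr int kHeightRange = 32;  // hypothetical height variation

using HeightMap = std::array<int, kChunkWidth * kChunkWidth>;

// Hypothetical stand-in noise: hashes the world coordinates and the seed
// into a value in [0, 1). Not the noise function used by the real generator.
inline float HashNoise(int x, int z, std::uint32_t seed)
{
    std::uint32_t h = seed;
    h ^= static_cast<std::uint32_t>(x) * 0x9E3779B1u;
    h ^= static_cast<std::uint32_t>(z) * 0x85EBCA77u;
    h ^= h >> 15;
    h *= 0x2C1B3C6Du;
    h ^= h >> 12;
    return static_cast<float>(h >> 8) / 16777216.0f;  // top 24 bits -> [0, 1)
}

// Loops over every column of the chunk and derives its height from the noise.
// No column reads another column's result, so the loop body is independent
// per (x, z) pair.
HeightMap GenHeightMap(int chunkX, int chunkZ, std::uint32_t seed)
{
    HeightMap heights{};
    for (int z = 0; z < kChunkWidth; ++z)
    {
        for (int x = 0; x < kChunkWidth; ++x)
        {
            const int worldX = chunkX * kChunkWidth + x;
            const int worldZ = chunkZ * kChunkWidth + z;
            const float n = HashNoise(worldX, worldZ, seed);
            heights[x + kChunkWidth * z] = kBaseHeight + static_cast<int>(n * kHeightRange);
        }
    }
    return heights;
}
```

Calling `GenHeightMap(3, -2, 42u)` twice returns the same 256 heights, since the sketch keeps no state between columns or calls.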
#35
Perfect candidate.
#36
Got HeightGenClassic to run on my machine: GPU runtime 30 microseconds, data transfer 2 microseconds. Total execution time from enqueueing the kernel to finishing the read-back: 303 microseconds. Something tells me that if we want to use this, we need to have several chunks in flight at once.

For comparison, still using the OpenCL API: CPU runtime 58 microseconds, total runtime 98 microseconds.

So, significant performance increases if we can deal with the latency by doing stuff asynchronously and batching.
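As a rough back-of-the-envelope check on the batching idea, the C++ sketch below amortizes the latency over a batch using only the figures quoted above. It rests on one loud assumption that has not been measured: that everything in the 303 microseconds other than kernel and transfer time (271 microseconds) is a fixed cost paid once per batch, while kernel and transfer time scale linearly with the number of chunks. The function names are made up for this example.

```cpp
// Figures quoted above, in microseconds.
constexpr double kGpuKernelUs   = 30.0;   // GPU runtime for one chunk
constexpr double kGpuTransferUs = 2.0;    // data transfer for one chunk
constexpr double kGpuTotalUs    = 303.0;  // enqueue to finished read-back, one chunk
constexpr double kCpuTotalUs    = 98.0;   // total runtime on the CPU via OpenCL

// ASSUMPTION (not measured): the remainder of the GPU total is fixed
// overhead that is paid once per batch, however many chunks it holds.
constexpr double kFixedOverheadUs = kGpuTotalUs - kGpuKernelUs - kGpuTransferUs;

// Estimated GPU cost per chunk when batchSize chunks share one round trip.
double GpuCostPerChunkUs(int batchSize)
{
    return kFixedOverheadUs / batchSize + kGpuKernelUs + kGpuTransferUs;
}

// Smallest batch size at which the estimated GPU cost per chunk drops
// below the measured CPU total.
int BreakEvenBatchSize()
{
    int n = 1;
    while (GpuCostPerChunkUs(n) >= kCpuTotalUs)
    {
        ++n;
    }
    return n;
}
```

Under that assumption the break-even point is 5 chunks per batch; if part of the overhead actually grows with batch size, the real figure would be higher.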
#37
The code's now on the GPUcode branch.
#38
I'm going to try it now :) Do I have to use some CMake magic to make it compile properly?
#39
No, but you do need an OpenCL-compatible SDK and driver. If you've got an ATI card, that means a recent version of Catalyst and the AMD APP SDK. You may also need to make sure the AMD OpenCL.dll is in your DLL search path. CMake might ask you to set OPENCL_INCLUDE_DIR and OPENCL_LIBRARY_DIR, though.

Just got total runtime down to 125 microseconds.
#40
I got an error. It can't find "CL/cl.hpp"