Which directories do I have to link to "OPENCL_INCLUDE_DIR" and "OPENCL_LIBRARY_PATH"?
GPU code
06-19-2014, 07:28 AM
OPENCL_INCLUDE_DIR is the location of the OpenCL headers. It will be somewhere in the CUDA Toolkit program files and will contain a folder named CL, which holds various header files, at a minimum cl.hpp. Look for folders named include. OPENCL_LIBRARY_PATH is the location of the OpenCL.lib file. Again, it should be somewhere in the CUDA Toolkit program files, probably in a folder called lib.
06-19-2014, 07:51 AM
Okay, I've added a copy of the cpp wrapper to the repo. You still need to set the parent of that directory as the include directory for the actual APIs though.
Thanks given by: LO1ZB
Tested with ChunkWorx:
618 ch/s without GPGPU, 678 ch/s with GPGPU. I changed the values of QUEUE_WARNING_LIMIT and QUEUE_SKIP_LIMIT to 50000.
06-19-2014, 04:33 PM
worktycho, I have a feeling you're going about this the wrong way.
Most of the generators do their calculations on arrays of values. Each has a different calculation, but the basic principle is similar: get an array of values, fill it with noise, do some basic calculations on all those values, add another noise, etc., and finally upscale the array to the chunk's size. I think this is the abstraction you should be aiming for. Rather than implementing each generator in separate GPGPU code, provide cValueArray2D and cValueArray3D classes (templates, actually, based on the upscaling factor) that implement those calculations in GPGPU (lazily: on evaluation, rather than on calling the operation). That primitive can then be used for all generators, with only minimal code being duplicated.
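To make the "upscale the array to the chunk's size" step concrete, here is a minimal CPU-only sketch of a helper templated on the upscaling factor. The name UpscaleLinear and the choice of 1D linear interpolation are assumptions for illustration; the post does not say which interpolation the generators use, and this is not code from the repo.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: linearly upscales a low-resolution array by a
// compile-time factor. N input values produce (N - 1) * Factor + 1 outputs.
template <std::size_t Factor>
std::vector<float> UpscaleLinear(const std::vector<float> & a_Low)
{
    static_assert(Factor > 0, "The upscaling factor must be positive");
    assert(!a_Low.empty());

    std::vector<float> high;
    high.reserve((a_Low.size() - 1) * Factor + 1);
    for (std::size_t i = 0; i + 1 < a_Low.size(); i++)
    {
        for (std::size_t j = 0; j < Factor; j++)
        {
            // Fraction of the way from sample i to sample i + 1
            const float t = static_cast<float>(j) / Factor;
            high.push_back(a_Low[i] + (a_Low[i + 1] - a_Low[i]) * t);
        }
    }
    high.push_back(a_Low.back());  // The last sample is copied unchanged
    return high;
}
```

Calling UpscaleLinear<2> on the three values 0, 2, 4 yields the five values 0, 1, 2, 3, 4. A 2D or 3D version would apply the same interpolation along each axis.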
Doing that for the GPU is relatively easy, as it's runtime-compiled. The problem with doing that is building the CPU side. For the CPU the biggest problem is memory bandwidth, so the one thing you don't want to do is generate code like this:
Code:
for(int i = 0; i < chunkSize; i++)
Code:
Array.Map([](int a, int b, int c){ return a + b * c; }, "a + b * c");
Though I agree that duplication is not the way to go about it. The problem is that, short of implementing a C-to-C compiler or rewriting the generator in a custom DSL, I can't see a portable way of avoiding duplication.
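The loop snippet above is cut off, so the following is only a guess at the contrast being drawn: one loop per operation, with an intermediate array written out and read back in, versus a single loop that computes a + b * c per element. The function names MultiPass and Fused are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: one pass over memory per operation. The intermediate
// array is written in pass 1 and read back in pass 2.
std::vector<float> MultiPass(
    const std::vector<float> & a, const std::vector<float> & b, const std::vector<float> & c
)
{
    assert((a.size() == b.size()) && (b.size() == c.size()));
    std::vector<float> tmp(a.size());
    for (std::size_t i = 0; i < a.size(); i++)
    {
        tmp[i] = b[i] * c[i];  // Pass 1: multiply
    }
    std::vector<float> res(a.size());
    for (std::size_t i = 0; i < a.size(); i++)
    {
        res[i] = a[i] + tmp[i];  // Pass 2: add
    }
    return res;
}

// Hypothetical: the same calculation fused into a single pass, so each
// input is read once and no intermediate array is needed.
std::vector<float> Fused(
    const std::vector<float> & a, const std::vector<float> & b, const std::vector<float> & c
)
{
    assert((a.size() == b.size()) && (b.size() == c.size()));
    std::vector<float> res(a.size());
    for (std::size_t i = 0; i < a.size(); i++)
    {
        res[i] = a[i] + b[i] * c[i];
    }
    return res;
}
```

Both functions compute the same thing. The fused one skips the write and re-read of the intermediate array, which is the memory traffic the post is worried about on the CPU.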
06-20-2014, 12:43 AM
Is there a way for the GPU to leave the results in memory so that another operation can use them?
So that when we have calls like
Code:
cValueArray<...> Values(...);
Values.GeneratePerlin(...);
Values.AddConstant(1);
Values.Multiply(2);
Values.Evaluate();
the Evaluate function can use a buffer it uploads to the GPU and then call virtual functions of the operations, such as GeneratePerlin(), that operate on that GPU-side buffer, and finally pull the buffer back once all the operations are complete? Yes, we're still accessing the memory in the wrong pattern, which could be improved upon, but we're not losing our modularity.
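A rough CPU-only sketch of that lazy interface might look like the following. Everything here is an assumption for illustration: the class name LazyValueArray is made up, Fill() stands in for a noise generator such as GeneratePerlin(), and a plain std::vector stands in for the GPU-side buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the cValueArray idea; not code from the repo.
// Operations are only recorded when called and run later in Evaluate().
class LazyValueArray
{
public:
    explicit LazyValueArray(std::size_t a_Size) : m_Size(a_Size) {}

    // Placeholder for a noise generator: sets every value to a_Value
    LazyValueArray & Fill(float a_Value)
    {
        m_Ops.push_back([a_Value](std::vector<float> & a_Buffer)
        {
            for (auto & v : a_Buffer) { v = a_Value; }
        });
        return *this;
    }

    LazyValueArray & AddConstant(float a_Value)
    {
        m_Ops.push_back([a_Value](std::vector<float> & a_Buffer)
        {
            for (auto & v : a_Buffer) { v += a_Value; }
        });
        return *this;
    }

    LazyValueArray & Multiply(float a_Factor)
    {
        m_Ops.push_back([a_Factor](std::vector<float> & a_Buffer)
        {
            for (auto & v : a_Buffer) { v *= a_Factor; }
        });
        return *this;
    }

    // Allocates one buffer (the stand-in for the GPU-side buffer), runs
    // every recorded operation on it in order, then returns the result.
    std::vector<float> Evaluate() const
    {
        std::vector<float> buffer(m_Size, 0.0f);
        for (const auto & op : m_Ops)
        {
            op(buffer);
        }
        return buffer;
    }

private:
    using Operation = std::function<void(std::vector<float> &)>;

    std::size_t m_Size;
    std::vector<Operation> m_Ops;
};
```

With this sketch, Fill(3), AddConstant(1), Multiply(2) and then Evaluate() give 8 in every element. Each operation still makes its own pass over the buffer, which is the "wrong pattern" memory access mentioned above, but the buffer is only created and returned once.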
Again, the GPU side is easy. Just use a cache object and a callback; keeping memory on the GPU is not difficult either. GPU-wise, I could write an API with minimal additional cost. Because you build up an AST for the codegen, you can optimise the memory access pattern. If the operations are pure, the optimisation is easy. The problem is executing on platforms without GPUs, like cheap servers. I experimented with this sort of interface a while back and it cost up to a 20-30x performance loss on the CPU.
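To illustrate the AST point, here is a minimal sketch of an expression tree that emits one fused per-element statement as source text, the kind of string that could be pasted into a runtime-compiled kernel. All names (Expr, Var, Add, Mul, EmitBody) and the out[i] convention are assumptions for this sketch, not an existing API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical AST node: every node can emit itself as source text.
struct Expr
{
    virtual ~Expr() = default;
    virtual std::string Emit() const = 0;
};
using ExprPtr = std::shared_ptr<const Expr>;

// Leaf node: reads element i of a named input array
struct Input : Expr
{
    std::string m_Name;
    explicit Input(std::string a_Name) : m_Name(std::move(a_Name)) {}
    std::string Emit() const override { return m_Name + "[i]"; }
};

// Inner node: a pure binary operation on two sub-expressions
struct Binary : Expr
{
    char m_Op;
    ExprPtr m_Lhs, m_Rhs;
    Binary(char a_Op, ExprPtr a_Lhs, ExprPtr a_Rhs) :
        m_Op(a_Op), m_Lhs(std::move(a_Lhs)), m_Rhs(std::move(a_Rhs)) {}
    std::string Emit() const override
    {
        return "(" + m_Lhs->Emit() + " " + m_Op + " " + m_Rhs->Emit() + ")";
    }
};

ExprPtr Var(const std::string & a_Name) { return std::make_shared<Input>(a_Name); }
ExprPtr Add(ExprPtr a_Lhs, ExprPtr a_Rhs) { return std::make_shared<Binary>('+', std::move(a_Lhs), std::move(a_Rhs)); }
ExprPtr Mul(ExprPtr a_Lhs, ExprPtr a_Rhs) { return std::make_shared<Binary>('*', std::move(a_Lhs), std::move(a_Rhs)); }

// Emits the whole tree as one statement, so the chain of operations
// becomes a single pass over the data instead of one pass per operation.
std::string EmitBody(const ExprPtr & a_Root)
{
    return "out[i] = " + a_Root->Emit() + ";";
}
```

EmitBody(Add(Var("a"), Mul(Var("b"), Var("c")))) returns the string out[i] = (a[i] + (b[i] * c[i]));. Because the operations are pure, the whole tree collapses into one expression per element. Doing the same for the CPU without a runtime compiler is the hard part described above.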
06-20-2014, 01:06 AM
20 - 30x performance loss compared to what exactly? To the GPU version? Not a fair comparison.