Not sure, it's looking a lot harder than I expected. Writing vectorized code is not too difficult, but writing efficient cross-platform vector code is much harder. At the moment I'm looking at five options:
- Maintain four copies of every vector function (single, double, triple and quad width) for the different vector widths. (Given that one of the functions I'm working on is in the ender generator, I don't think this is an option.)
- Do something horrendous with macros or template metaprogramming (rewriting C++ syntax bad)
- Write a small functional language compiler in Lua that can do stream fusion
- Rewrite it in C and use OpenCL
- Give up
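
For what it's worth, the template option might look roughly like the sketch below: one definition of a function, parameterized on the vector width, instead of four hand-maintained copies. This is only an illustration under assumptions. `Vec<N>` and `mul_add` are hypothetical names, not anything from the actual code, and the plain per-lane loops stand in for real SIMD intrinsics. Whether each compiler turns them into efficient vector instructions on every platform is exactly the hard part.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-width vector type; N is the lane count (1 to 4).
template <std::size_t N>
struct Vec {
    std::array<float, N> lane;
};

// Lane-wise addition, written once for every width.
template <std::size_t N>
Vec<N> operator+(const Vec<N>& a, const Vec<N>& b) {
    Vec<N> r{};
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

// Lane-wise multiplication, written once for every width.
template <std::size_t N>
Vec<N> operator*(const Vec<N>& a, const Vec<N>& b) {
    Vec<N> r{};
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// Hypothetical example function: a single definition that the compiler
// instantiates separately for Vec<1>, Vec<2>, Vec<3> and Vec<4>.
template <std::size_t N>
Vec<N> mul_add(const Vec<N>& a, const Vec<N>& b, const Vec<N>& c) {
    return a * b + c;
}
```

Calling `mul_add` with `Vec<2>` or `Vec<4>` arguments picks the width from the argument types, so the four copies exist only in the compiled output, not in the source.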