CUDA only: similar to the above, but not quite the same.
OpenCL only.
OpenCL only: a hint that the consumer should vectorise the function with type T.
See Source File