Archive for 'Case Studies'

Load Balancing Multiple OpenCL™ Devices

OpenCL™ is often criticized for not being “performance portable”. While it is true that optimal performance is only achieved by tailoring host code and kernels directly to the target platform, significant improvements can be made just by tweaking some of the knobs that the OpenCL framework already provides. The point is that OpenCL is a programming model specifically conceived to target different types of hardware in a portable fashion. This discussion by ...

Continue Reading →


Matrix Multiplication using OpenCL Images

We consider matrix multiplication the “Hello World” example of multi-core computing. However, unlike the traditional “Hello World”, matrix multiplication is actually useful, as it is a basic building block for many algorithms.

In this article we will look at how a matrix multiplication can be accelerated on the GPU by using an OpenCL™ implementation. While matrix multiplication is shown as a use case, the purpose is not to show yet another clever implementation of this algorithm. Instead we want ...

Continue Reading →