OpenCL

OpenCL™ (Open Computing Language) is the first open, royalty-free standard for general-purpose parallel programming of heterogeneous systems. OpenCL provides a uniform programming environment for software developers to write efficient, portable code for client computer systems, high-performance computing servers, and handheld devices using a diverse mix of multi-core CPUs and other parallel processors.

OpenCL is functionally portable: the same OpenCL code will run on Intel, NVIDIA, and other OpenCL platforms. However,
performance is not guaranteed to carry over, so developers usually optimize their OpenCL code for a particular platform.

(Ref: http://software.intel.com/en-us/articles/vcsource-tools-opencl-sdk/)

Gowonda GPU nodes

Platform Name = NVIDIA CUDA, Platform Version = OpenCL 1.0 CUDA 4.0.1, SDK Revision = 7027912,
NumDevs = 2, Device = Tesla C2070, Device = Tesla C2070

n020
n021
n022
n023

Please refer to the following documentation as well:

http://confluence.rcs.griffith.edu.au:8080/display/GHPC/cuda

NVIDIA implementation

http://developer.nvidia.com/opencl

http://confluence.rcs.griffith.edu.au:8080/download/attachments/29098154/NVIDIA_OpenCL_BestPracticesGuide.pdf

OpenCL 1.1 support is included in the publicly available NVIDIA driver version 280.13. However, the Gowonda GPU nodes currently run an older driver (270.40), so only OpenCL 1.0 is available on Gowonda.
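
The driver requirement above can be checked on a node with a short shell sketch. It assumes bash, GNU `sort -V`, and the NVRM line format printed by /proc/driver/nvidia/version on this page; the function names `nvrm_version` and `supports_opencl_11` are hypothetical helpers, not part of any NVIDIA tool.

```shell
#!/bin/bash
# Extract the driver version (e.g. 270.40) from NVRM version text on stdin.
nvrm_version() {
    # Take the NVRM line, then the first dotted number on it.
    grep -m1 'NVRM version' | grep -oE '[0-9]+(\.[0-9]+)+' | head -n1
}

# Succeed if driver version $1 is at least 280.13 (first driver with OpenCL 1.1).
supports_opencl_11() {
    local required=280.13
    # sort -V orders version strings; the smaller of the two must be $required.
    [ "$(printf '%s\n' "$required" "$1" | sort -V | head -n1)" = "$required" ]
}

# Only run the check where the NVIDIA kernel module is loaded.
if [ -r /proc/driver/nvidia/version ]; then
    ver=$(nvrm_version < /proc/driver/nvidia/version)
    if supports_opencl_11 "$ver"; then
        echo "Driver $ver is 280.13 or newer"
    else
        echo "Driver $ver predates 280.13"
    fi
fi
```

On a node without the NVIDIA module the script prints nothing, so it is safe to run on login nodes as well.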

GPU Computing SDK (version 1.1…) comes with 34 OpenCL samples, including GL and D3D interop examples and a multi-GPU demo.

cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  270.40  Sat Mar 26 13:00:34 PDT 2011
GCC version:  gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC)

http://confluence.rcs.griffith.edu.au:8080/display/GHPC/cuda#cuda-CUDAenabledDeviceDriver

Sample compilation

module load cuda
cd /usr/local/cuda/NVIDIA_GPU_Computing_SDK/OpenCL

(Examine the Makefile here)
make 2>&1 |tee make.output.txt

cd /usr/local/cuda/NVIDIA_GPU_Computing_SDK/OpenCL/bin/linux/release
./oclDeviceQuery

[oclDeviceQuery] starting...
./oclDeviceQuery Starting...

OpenCL SW Info:

 CL_PLATFORM_NAME:      NVIDIA CUDA
 CL_PLATFORM_VERSION:   OpenCL 1.0 CUDA 4.0.1
 OpenCL SDK Revision:   7027912


OpenCL Device Info:

 2 devices found supporting OpenCL:

 ---------------------------------
 Device Tesla C2070
 ---------------------------------
  CL_DEVICE_NAME:                       Tesla C2070
  CL_DEVICE_VENDOR:                     NVIDIA Corporation
  CL_DRIVER_VERSION:                    270.40
  CL_DEVICE_VERSION:                    OpenCL 1.0 CUDA
  CL_DEVICE_TYPE:                       CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS:          14
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:   3
  CL_DEVICE_MAX_WORK_ITEM_SIZES:        1024 / 1024 / 64
  CL_DEVICE_MAX_WORK_GROUP_SIZE:        1024
  CL_DEVICE_MAX_CLOCK_FREQUENCY:        1147 MHz
  CL_DEVICE_ADDRESS_BITS:               32
  CL_DEVICE_MAX_MEM_ALLOC_SIZE:         1343 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE:            5375 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT:   yes
  CL_DEVICE_LOCAL_MEM_TYPE:             local
  CL_DEVICE_LOCAL_MEM_SIZE:             48 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:   64 KByte
  CL_DEVICE_QUEUE_PROPERTIES:           CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
  CL_DEVICE_QUEUE_PROPERTIES:           CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT:              1
  CL_DEVICE_MAX_READ_IMAGE_ARGS:        128
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS:       8
  CL_DEVICE_SINGLE_FP_CONFIG:           denorms INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

  CL_DEVICE_IMAGE <dim>                 2D_MAX_WIDTH     4096
                                        2D_MAX_HEIGHT    32768
                                        3D_MAX_WIDTH     2048
                                        3D_MAX_HEIGHT    2048
                                        3D_MAX_DEPTH     2048

  CL_DEVICE_EXTENSIONS:                 cl_khr_byte_addressable_store
                                        cl_khr_icd
                                        cl_khr_gl_sharing
                                        cl_nv_compiler_options
                                        cl_nv_device_attribute_query
                                        cl_nv_pragma_unroll
                                        cl_khr_global_int32_base_atomics
                                        cl_khr_global_int32_extended_atomics
                                        cl_khr_local_int32_base_atomics
                                        cl_khr_local_int32_extended_atomics
                                        cl_khr_fp64


  CL_DEVICE_COMPUTE_CAPABILITY_NV:      2.0
  NUMBER OF MULTIPROCESSORS:            14
  NUMBER OF CUDA CORES:                 448
  CL_DEVICE_REGISTERS_PER_BLOCK_NV:     32768
  CL_DEVICE_WARP_SIZE_NV:               32
  CL_DEVICE_GPU_OVERLAP_NV:             CL_TRUE
  CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV:     CL_FALSE
  CL_DEVICE_INTEGRATED_MEMORY_NV:       CL_FALSE
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>  CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1


 ---------------------------------
 Device Tesla C2070
 ---------------------------------
  CL_DEVICE_NAME:                       Tesla C2070
  CL_DEVICE_VENDOR:                     NVIDIA Corporation
  CL_DRIVER_VERSION:                    270.40
  CL_DEVICE_VERSION:                    OpenCL 1.0 CUDA
  CL_DEVICE_TYPE:                       CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS:          14
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:   3
  CL_DEVICE_MAX_WORK_ITEM_SIZES:        1024 / 1024 / 64
  CL_DEVICE_MAX_WORK_GROUP_SIZE:        1024
  CL_DEVICE_MAX_CLOCK_FREQUENCY:        1147 MHz
  CL_DEVICE_ADDRESS_BITS:               32
  CL_DEVICE_MAX_MEM_ALLOC_SIZE:         1343 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE:            5375 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT:   yes
  CL_DEVICE_LOCAL_MEM_TYPE:             local
  CL_DEVICE_LOCAL_MEM_SIZE:             48 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:   64 KByte
  CL_DEVICE_QUEUE_PROPERTIES:           CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
  CL_DEVICE_QUEUE_PROPERTIES:           CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT:              1
  CL_DEVICE_MAX_READ_IMAGE_ARGS:        128
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS:       8
  CL_DEVICE_SINGLE_FP_CONFIG:           denorms INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

  CL_DEVICE_IMAGE <dim>                 2D_MAX_WIDTH     4096
                                        2D_MAX_HEIGHT    32768
                                        3D_MAX_WIDTH     2048
                                        3D_MAX_HEIGHT    2048
                                        3D_MAX_DEPTH     2048

  CL_DEVICE_EXTENSIONS:                 cl_khr_byte_addressable_store
                                        cl_khr_icd
                                        cl_khr_gl_sharing
                                        cl_nv_compiler_options
                                        cl_nv_device_attribute_query
                                        cl_nv_pragma_unroll
                                        cl_khr_global_int32_base_atomics
                                        cl_khr_global_int32_extended_atomics
                                        cl_khr_local_int32_base_atomics
                                        cl_khr_local_int32_extended_atomics
                                        cl_khr_fp64


  CL_DEVICE_COMPUTE_CAPABILITY_NV:      2.0
  NUMBER OF MULTIPROCESSORS:            14
  NUMBER OF CUDA CORES:                 448
  CL_DEVICE_REGISTERS_PER_BLOCK_NV:     32768
  CL_DEVICE_WARP_SIZE_NV:               32
  CL_DEVICE_GPU_OVERLAP_NV:             CL_TRUE
  CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV:     CL_FALSE
  CL_DEVICE_INTEGRATED_MEMORY_NV:       CL_FALSE
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>  CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1


  ---------------------------------
  2D Image Formats Supported (71)
  ---------------------------------
  #     Channel Order   Channel Type

  1     CL_R            CL_FLOAT
  2     CL_R            CL_HALF_FLOAT
  3     CL_R            CL_UNORM_INT8
  4     CL_R            CL_UNORM_INT16
  5     CL_R            CL_SNORM_INT16
  6     CL_R            CL_SIGNED_INT8
  7     CL_R            CL_SIGNED_INT16
  8     CL_R            CL_SIGNED_INT32
  9     CL_R            CL_UNSIGNED_INT8
  10    CL_R            CL_UNSIGNED_INT16
  11    CL_R            CL_UNSIGNED_INT32
  12    CL_A            CL_FLOAT
  13    CL_A            CL_HALF_FLOAT
  14    CL_A            CL_UNORM_INT8
  15    CL_A            CL_UNORM_INT16
  16    CL_A            CL_SNORM_INT16
  17    CL_A            CL_SIGNED_INT8
  18    CL_A            CL_SIGNED_INT16
  19    CL_A            CL_SIGNED_INT32
  20    CL_A            CL_UNSIGNED_INT8
  21    CL_A            CL_UNSIGNED_INT16
  22    CL_A            CL_UNSIGNED_INT32
  23    CL_RG           CL_FLOAT
  24    CL_RG           CL_HALF_FLOAT
  25    CL_RG           CL_UNORM_INT8
  26    CL_RG           CL_UNORM_INT16
  27    CL_RG           CL_SNORM_INT16
  28    CL_RG           CL_SIGNED_INT8
  29    CL_RG           CL_SIGNED_INT16
  30    CL_RG           CL_SIGNED_INT32
  31    CL_RG           CL_UNSIGNED_INT8
  32    CL_RG           CL_UNSIGNED_INT16
  33    CL_RG           CL_UNSIGNED_INT32
  34    CL_RA           CL_FLOAT
  35    CL_RA           CL_HALF_FLOAT
  36    CL_RA           CL_UNORM_INT8
  37    CL_RA           CL_UNORM_INT16
  38    CL_RA           CL_SNORM_INT16
  39    CL_RA           CL_SIGNED_INT8
  40    CL_RA           CL_SIGNED_INT16
  41    CL_RA           CL_SIGNED_INT32
  42    CL_RA           CL_UNSIGNED_INT8
  43    CL_RA           CL_UNSIGNED_INT16
  44    CL_RA           CL_UNSIGNED_INT32
  45    CL_RGBA         CL_FLOAT
  46    CL_RGBA         CL_HALF_FLOAT
  47    CL_RGBA         CL_UNORM_INT8
  48    CL_RGBA         CL_UNORM_INT16
  49    CL_RGBA         CL_SNORM_INT16
  50    CL_RGBA         CL_SIGNED_INT8
  51    CL_RGBA         CL_SIGNED_INT16
  52    CL_RGBA         CL_SIGNED_INT32
  53    CL_RGBA         CL_UNSIGNED_INT8
  54    CL_RGBA         CL_UNSIGNED_INT16
  55    CL_RGBA         CL_UNSIGNED_INT32
  56    CL_BGRA         CL_UNORM_INT8
  57    CL_BGRA         CL_SIGNED_INT8
  58    CL_BGRA         CL_UNSIGNED_INT8
  59    CL_ARGB         CL_UNORM_INT8
  60    CL_ARGB         CL_SIGNED_INT8
  61    CL_ARGB         CL_UNSIGNED_INT8
  62    CL_INTENSITY    CL_FLOAT
  63    CL_INTENSITY    CL_HALF_FLOAT
  64    CL_INTENSITY    CL_UNORM_INT8
  65    CL_INTENSITY    CL_UNORM_INT16
  66    CL_INTENSITY    CL_SNORM_INT16
  67    CL_LUMINANCE    CL_FLOAT
  68    CL_LUMINANCE    CL_HALF_FLOAT
  69    CL_LUMINANCE    CL_UNORM_INT8
  70    CL_LUMINANCE    CL_UNORM_INT16
  71    CL_LUMINANCE    CL_SNORM_INT16

  ---------------------------------
  3D Image Formats Supported (71)
  ---------------------------------
  #     Channel Order   Channel Type

  1     CL_R            CL_FLOAT
  2     CL_R            CL_HALF_FLOAT
  3     CL_R            CL_UNORM_INT8
  4     CL_R            CL_UNORM_INT16
  5     CL_R            CL_SNORM_INT16
  6     CL_R            CL_SIGNED_INT8
  7     CL_R            CL_SIGNED_INT16
  8     CL_R            CL_SIGNED_INT32
  9     CL_R            CL_UNSIGNED_INT8
  10    CL_R            CL_UNSIGNED_INT16
  11    CL_R            CL_UNSIGNED_INT32
  12    CL_A            CL_FLOAT
  13    CL_A            CL_HALF_FLOAT
  14    CL_A            CL_UNORM_INT8
  15    CL_A            CL_UNORM_INT16
  16    CL_A            CL_SNORM_INT16
  17    CL_A            CL_SIGNED_INT8
  18    CL_A            CL_SIGNED_INT16
  19    CL_A            CL_SIGNED_INT32
  20    CL_A            CL_UNSIGNED_INT8
  21    CL_A            CL_UNSIGNED_INT16
  22    CL_A            CL_UNSIGNED_INT32
  23    CL_RG           CL_FLOAT
  24    CL_RG           CL_HALF_FLOAT
  25    CL_RG           CL_UNORM_INT8
  26    CL_RG           CL_UNORM_INT16
  27    CL_RG           CL_SNORM_INT16
  28    CL_RG           CL_SIGNED_INT8
  29    CL_RG           CL_SIGNED_INT16
  30    CL_RG           CL_SIGNED_INT32
  31    CL_RG           CL_UNSIGNED_INT8
  32    CL_RG           CL_UNSIGNED_INT16
  33    CL_RG           CL_UNSIGNED_INT32
  34    CL_RA           CL_FLOAT
  35    CL_RA           CL_HALF_FLOAT
  36    CL_RA           CL_UNORM_INT8
  37    CL_RA           CL_UNORM_INT16
  38    CL_RA           CL_SNORM_INT16
  39    CL_RA           CL_SIGNED_INT8
  40    CL_RA           CL_SIGNED_INT16
  41    CL_RA           CL_SIGNED_INT32
  42    CL_RA           CL_UNSIGNED_INT8
  43    CL_RA           CL_UNSIGNED_INT16
  44    CL_RA           CL_UNSIGNED_INT32
  45    CL_RGBA         CL_FLOAT
  46    CL_RGBA         CL_HALF_FLOAT
  47    CL_RGBA         CL_UNORM_INT8
  48    CL_RGBA         CL_UNORM_INT16
  49    CL_RGBA         CL_SNORM_INT16
  50    CL_RGBA         CL_SIGNED_INT8
  51    CL_RGBA         CL_SIGNED_INT16
  52    CL_RGBA         CL_SIGNED_INT32
  53    CL_RGBA         CL_UNSIGNED_INT8
  54    CL_RGBA         CL_UNSIGNED_INT16
  55    CL_RGBA         CL_UNSIGNED_INT32
  56    CL_BGRA         CL_UNORM_INT8
  57    CL_BGRA         CL_SIGNED_INT8
  58    CL_BGRA         CL_UNSIGNED_INT8
  59    CL_ARGB         CL_UNORM_INT8
  60    CL_ARGB         CL_SIGNED_INT8
  61    CL_ARGB         CL_UNSIGNED_INT8
  62    CL_INTENSITY    CL_FLOAT
  63    CL_INTENSITY    CL_HALF_FLOAT
  64    CL_INTENSITY    CL_UNORM_INT8
  65    CL_INTENSITY    CL_UNORM_INT16
  66    CL_INTENSITY    CL_SNORM_INT16
  67    CL_LUMINANCE    CL_FLOAT
  68    CL_LUMINANCE    CL_HALF_FLOAT
  69    CL_LUMINANCE    CL_UNORM_INT8
  70    CL_LUMINANCE    CL_UNORM_INT16
  71    CL_LUMINANCE    CL_SNORM_INT16

oclDeviceQuery, Platform Name = NVIDIA CUDA, Platform Version = OpenCL 1.0 CUDA 4.0.1, SDK Revision = 7027912, NumDevs = 2, Device = Tesla C2070, Device = Tesla C2070

System Info:

 Local Time/Date =  08:33:31, 11/17/2011
 CPU Name: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
 # of CPU processors: 24
 Linux version 2.6.32-131.0.15.el6.x86_64 (mockbuild@x86-007.build.bos.redhat.com) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #1 SMP Tue May 10 15:42:40 EDT 2011


[oclDeviceQuery] test results...
PASSED
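
The full oclDeviceQuery listing is long. A minimal sketch for reducing it to a few fields per device follows; the field names are taken from the output above, while the function name `summarize_devices` and the choice of fields are illustrative assumptions.

```shell
#!/bin/bash
# Keep only a few device fields from oclDeviceQuery output read on stdin.
summarize_devices() {
    grep -E '^ *(CL_DEVICE_NAME|CL_DEVICE_VERSION|CL_DEVICE_MAX_COMPUTE_UNITS|CL_DEVICE_GLOBAL_MEM_SIZE):' \
        | sed -E 's/^ +//; s/: +/: /'   # drop indentation, collapse column padding
}

# Run only if the sample binary has been built in the current directory.
if [ -x ./oclDeviceQuery ]; then
    ./oclDeviceQuery | summarize_devices
fi
```

Run from the bin/linux/release directory, this should print four lines for each of the two devices.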



Installation Directory:

/usr/local/cuda/include/CL/opencl.h
/usr/local/cuda/NVIDIA_GPU_Computing_SDK/OpenCL/common/inc/CL/opencl.h

(Build the necessary CUDA SDK libraries by running make in /usr/local/cuda/NVIDIA_GPU_Computing_SDK/OpenCL.)
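
For a program of your own, a compile line along the following lines may work. This is a sketch under assumptions: the header directory comes from the listing above, libOpenCL.so is assumed to be on the default linker search path, and the helper name `ocl_compile` and the file name myprog.c are hypothetical.

```shell
#!/bin/bash
# Header directory from the "Installation Directory" listing; override if needed.
CUDA_INC=${CUDA_INC:-/usr/local/cuda/include}

# Compile and link one C source file against the OpenCL headers and libOpenCL.so.
# $1 = source file, $2 = output name (defaults to the source name without .c).
ocl_compile() {
    local src=$1 out=${2:-${1%.c}}
    gcc -I"$CUDA_INC" "$src" -o "$out" -lOpenCL
}

# Usage (hypothetical file name):
#   module load cuda
#   ocl_compile myprog.c
```

If the linker cannot find libOpenCL.so, add the appropriate -L directory for the node; that path is not documented on this page.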

Sample PBS script

cat pbs.01

#!/bin/bash
#PBS -m abe
#PBS -M YOUREMAIL@griffith.edu.au
#PBS -N openCL_GPU_job
#PBS -l ngpus=1
#PBS -l walltime=100:00:00
#PBS -q gpu
module load cuda/4.0
echo "Hello from $HOSTNAME: date = $(date)"
cd /export/home/s123456/pbs/opencl/LennardJones || exit 1
./LennardJones pop256dd.xyz
echo "Finished at $(date)"

Usage: qsub pbs.01
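
Submission and a first status check can be combined in a small wrapper. The sketch below relies only on qsub printing the job identifier and on `qstat -f` showing the full status of a job; the function name `submit_and_show` is a hypothetical helper.

```shell
#!/bin/bash
# Submit a PBS script and print the full status of the resulting job.
# $1 = path to the PBS script (e.g. pbs.01)
submit_and_show() {
    local script=$1 jobid
    # qsub prints the job identifier on success.
    jobid=$(qsub "$script") || return 1
    echo "Submitted $jobid"
    qstat -f "$jobid"
}

# Usage:
#   submit_and_show pbs.01
```

If qsub fails, the function returns a non-zero status without calling qstat.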

Intel Implementation

The Intel implementation has been uninstalled; the installation notes below are kept for future use.

Installation

http://software.intel.com/en-us/articles/installation-notes-opencl-sdk/

Installation on the nodes
========================
rpm -ivh intel_ocl_sdk_1.5_x64.rpm
Preparing...                ########################################### [100%]
   1:intel-ocl-sdk          ########################################### [100%]


Installation on the image:
==========================
mount --bind /proc/ /compute/proc/
mount --bind /dev /compute/dev
rpm --root=/compute/ -ivh intel_ocl_sdk_1.5_x64.rpm
umount /compute/dev
umount /compute/proc

Package Listing
===============

rpm -qlp intel_ocl_sdk_1.5_x64.rpm
/etc/OpenCL/vendors/intelocl64.icd
/usr/bin/ioc
/usr/bin/iocgui.sh
/usr/include/CL/cl.h
/usr/include/CL/cl_d3d9.h
/usr/include/CL/cl_ext.h
/usr/include/CL/cl_gl.h
/usr/include/CL/cl_gl_ext.h
/usr/include/CL/cl_platform.h
/usr/include/CL/opencl.h
/usr/lib64/OpenCL/vendors/intel/__ocl_svml_e9.so
/usr/lib64/OpenCL/vendors/intel/__ocl_svml_h8.so
/usr/lib64/OpenCL/vendors/intel/__ocl_svml_u8.so
/usr/lib64/OpenCL/vendors/intel/__ocl_svml_y8.so
/usr/lib64/OpenCL/vendors/intel/clbltfne9.rtl
/usr/lib64/OpenCL/vendors/intel/clbltfnh8.rtl
/usr/lib64/OpenCL/vendors/intel/clbltfnu8.rtl
/usr/lib64/OpenCL/vendors/intel/clbltfny8.rtl
/usr/lib64/OpenCL/vendors/intel/docs/apache_license.txt
/usr/lib64/OpenCL/vendors/intel/docs/boost_license.txt
/usr/lib64/OpenCL/vendors/intel/docs/llvm_release_license.txt
/usr/lib64/OpenCL/vendors/intel/ioc.jar
/usr/lib64/OpenCL/vendors/intel/ioc64
/usr/lib64/OpenCL/vendors/intel/iocgui64.sh
/usr/lib64/OpenCL/vendors/intel/libOclCpuBackEnd.so
/usr/lib64/OpenCL/vendors/intel/libboost_filesystem.so
/usr/lib64/OpenCL/vendors/intel/libboost_filesystem.so.1.46.1
/usr/lib64/OpenCL/vendors/intel/libboost_system.so
/usr/lib64/OpenCL/vendors/intel/libboost_system.so.1.46.1
/usr/lib64/OpenCL/vendors/intel/libcl_logger.so
/usr/lib64/OpenCL/vendors/intel/libclang_compiler.so
/usr/lib64/OpenCL/vendors/intel/libclbltfne9.so
/usr/lib64/OpenCL/vendors/intel/libclbltfnh8.so
/usr/lib64/OpenCL/vendors/intel/libclbltfnu8.so
/usr/lib64/OpenCL/vendors/intel/libclbltfny8.so
/usr/lib64/OpenCL/vendors/intel/libcpu_device.so
/usr/lib64/OpenCL/vendors/intel/libintelocl.so
/usr/lib64/OpenCL/vendors/intel/libtask_executor.so
/usr/lib64/OpenCL/vendors/intel/libtbb.so
/usr/lib64/OpenCL/vendors/intel/libtbb.so.2
/usr/lib64/OpenCL/vendors/intel/libtbbmalloc.so
/usr/lib64/OpenCL/vendors/intel/libtbbmalloc.so.2
/usr/lib64/OpenCL/vendors/intel/libtbbmalloc_proxy.so
/usr/lib64/OpenCL/vendors/intel/libtbbmalloc_proxy.so.2
/usr/lib64/OpenCL/vendors/intel/llc
/usr/lib64/OpenCL/vendors/intel/opencl_.pch
/usr/lib64/OpenCL/vendors/intel/version.txt
/usr/lib64/libOpenCL.so


Usage

Intel® OpenCL SDK 1.5 binaries are installed to the following directory:
/usr/lib64/OpenCL/vendors/intel
To work with the OpenCL runtime, an application should link against the OpenCL Installable Client Driver (ICD) library, libOpenCL.so, which is installed in /usr/lib64.
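
The ICD loader discovers vendor implementations through the .icd files under /etc/OpenCL/vendors (the package listing above installs intelocl64.icd there); each file names the vendor library to load. A minimal sketch for listing what is registered on a node follows; the function name `list_icd_vendors` is a hypothetical helper.

```shell
#!/bin/bash
# Print each registered ICD file and the vendor library it points to.
# $1 = registry directory (defaults to /etc/OpenCL/vendors)
list_icd_vendors() {
    local dir=${1:-/etc/OpenCL/vendors} f
    for f in "$dir"/*.icd; do
        # With no .icd files the glob stays literal, so skip it.
        [ -e "$f" ] || continue
        printf '%s -> %s\n' "$(basename "$f")" "$(head -n1 "$f")"
    done
}

# Usage:
#   list_icd_vendors
```

An empty result suggests that no OpenCL implementation is registered with the loader on that node.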

 ioc
No input parameters
usage: ioc64 <ARGUMENT> [OPTIONS]
Intel(R) OpenCL(TM) Offline Compiler Command-Line Client, version 1.0.2
(C) Intel Corporation 2011. All rights reserved

ARGUMENTS:
    -input=<input_file_name>       - Build the OpenCL Code given in <input_file_name>
    -version                       - show version
    -help                          - show list of available commands
OPTIONS:
    -simd=<instruction_set_arch>   - target instruction set architecture
                                     'sse41' for streaming SIMD extension 4.1
                                     'sse42' for streaming SIMD extension 4.2
                                     'avx' for advanced vector extensions
    -output=<output_file_name>     - write the build log to <output_file_name>
    -asm[=<file_name>]             - Generate assembly code
    -llvm[=<file_name>]            - Generate llvm code
    -ir[=<file_name>]              - Generate intermediate binary file
    -bo[="<build_options>"]        - Add build options
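
Going by the usage text above, an offline build of a single kernel file might look like the sketch below. The wrapper name `ioc_build` and the file name kernel.cl are hypothetical; `sse42` is chosen because the Xeon X5650 CPUs in these nodes support SSE4.2 but not AVX.

```shell
#!/bin/bash
# Build one OpenCL source file with the Intel offline compiler.
# Writes the assembly to <name>.s and the build log to <name>.log.
# $1 = OpenCL source file ending in .cl
ioc_build() {
    local src=$1
    local base=${src%.cl}
    ioc -input="$src" -simd=sse42 -asm="$base.s" -output="$base.log"
}

# Usage (hypothetical file name):
#   ioc_build kernel.cl
```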

Ref:

1. http://www.codeproject.com/KB/GPU-Programming/IntroToOpenCL.aspx
2. http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/
3. http://www.khronos.org/files/opencl-1-2-quick-reference-card.pdf
4. http://www.khronos.org/opencl/resources
5. http://software.intel.com/en-us/articles/vcsource-tools-opencl-sdk/
6. http://www.youtube.com/watch?v=-ROYgRg3x8E
7. http://www.khronos.org/registry/cl/
8. OpenCL tutorials: http://opencl.codeplex.com/wikipage?title=OpenCL%20Tutorials%20-%201