OpenCL
OpenCL™ (Open Computing Language) is the first open, royalty-free standard for general-purpose parallel programming of heterogeneous systems. OpenCL provides a uniform programming environment for software developers to write efficient, portable code for client computer systems, high-performance computing servers, and handheld devices using a diverse mix of multi-core CPUs and other parallel processors.
OpenCL is functionally portable, i.e. the same OpenCL code will run on Intel, NVIDIA, and other OpenCL platforms. Performance, however, is not guaranteed to carry over from one platform to another, so developers usually optimize their OpenCL code for a particular platform.
(Ref: http://software.intel.com/en-us/articles/vcsource-tools-opencl-sdk/)
Gowonda GPU nodes
Platform Name = NVIDIA CUDA, Platform Version = OpenCL 1.0 CUDA 4.0.1, SDK Revision = 7027912,
NumDevs = 2, Device = Tesla C2070, Device = Tesla C2070
n020
n021
n022
n023
Please refer to the following documentation as well:
http://confluence.rcs.griffith.edu.au:8080/display/GHPC/cuda
NVIDIA implementation
http://developer.nvidia.com/opencl
OpenCL v1.1 support is included in the publicly available NVIDIA driver version 280.13. However, the gowonda GPU nodes do not currently have this driver version installed, so only OpenCL 1.0 is available on gowonda.
The GPU Computing SDK (version 1.1…) comes with 34 OpenCL samples, including GL and D3D interop examples and even a multi-GPU demo.
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 270.40 Sat Mar 26 13:00:34 PDT 2011
GCC version: gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC)
http://confluence.rcs.griffith.edu.au:8080/display/GHPC/cuda#cuda-CUDAenabledDeviceDriver
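Whether a node's driver has reached the 280.13 release mentioned above can be checked from the same version file. The following is a minimal sketch, assuming the file keeps the "Kernel Module <version>" layout shown above and that GNU sort (with -V) is available. The function names nvidia_driver_version and version_at_least are hypothetical helpers, not part of any NVIDIA tool.

```shell
#!/bin/bash
# Hypothetical helper: read text in the format of /proc/driver/nvidia/version
# on stdin and print the kernel module version (e.g. 270.40).
nvidia_driver_version() {
    sed -n 's/.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed when version $1 >= version $2.
# Relies on GNU sort -V (version sort).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

installed=""
if [ -r /proc/driver/nvidia/version ]; then
    installed=$(nvidia_driver_version < /proc/driver/nvidia/version)
fi

if [ -z "$installed" ]; then
    echo "no NVIDIA driver version found on this host"
elif version_at_least "$installed" 280.13; then
    echo "driver $installed is at or above 280.13"
else
    echo "driver $installed is older than 280.13"
fi
```

On a node running the 270.40 driver shown above, this would take the last branch and report the driver as older than 280.13.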
Sample compilation
module load cuda
cd /usr/local/cuda/NVIDIA_GPU_Computing_SDK/OpenCL
(Examine the Makefile here)
make 2>&1 | tee make.output.txt
cd /usr/local/cuda/NVIDIA_GPU_Computing_SDK/OpenCL/bin/linux/release
./oclDeviceQuery

[oclDeviceQuery] starting...
./oclDeviceQuery Starting...

OpenCL SW Info:

 CL_PLATFORM_NAME: NVIDIA CUDA
 CL_PLATFORM_VERSION: OpenCL 1.0 CUDA 4.0.1
 OpenCL SDK Revision: 7027912

OpenCL Device Info:

 2 devices found supporting OpenCL:

 ---------------------------------
 Device Tesla C2070
 ---------------------------------
  CL_DEVICE_NAME: Tesla C2070
  CL_DEVICE_VENDOR: NVIDIA Corporation
  CL_DRIVER_VERSION: 270.40
  CL_DEVICE_VERSION: OpenCL 1.0 CUDA
  CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS: 14
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
  CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 / 1024 / 64
  CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
  CL_DEVICE_MAX_CLOCK_FREQUENCY: 1147 MHz
  CL_DEVICE_ADDRESS_BITS: 32
  CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1343 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE: 5375 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT: yes
  CL_DEVICE_LOCAL_MEM_TYPE: local
  CL_DEVICE_LOCAL_MEM_SIZE: 48 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
  CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
  CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT: 1
  CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
  CL_DEVICE_SINGLE_FP_CONFIG: denorms INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma
  CL_DEVICE_IMAGE <dim>
      2D_MAX_WIDTH 4096
      2D_MAX_HEIGHT 32768
      3D_MAX_WIDTH 2048
      3D_MAX_HEIGHT 2048
      3D_MAX_DEPTH 2048
  CL_DEVICE_EXTENSIONS:
      cl_khr_byte_addressable_store
      cl_khr_icd
      cl_khr_gl_sharing
      cl_nv_compiler_options
      cl_nv_device_attribute_query
      cl_nv_pragma_unroll
      cl_khr_global_int32_base_atomics
      cl_khr_global_int32_extended_atomics
      cl_khr_local_int32_base_atomics
      cl_khr_local_int32_extended_atomics
      cl_khr_fp64
  CL_DEVICE_COMPUTE_CAPABILITY_NV: 2.0
  NUMBER OF MULTIPROCESSORS: 14
  NUMBER OF CUDA CORES: 448
  CL_DEVICE_REGISTERS_PER_BLOCK_NV: 32768
  CL_DEVICE_WARP_SIZE_NV: 32
  CL_DEVICE_GPU_OVERLAP_NV: CL_TRUE
  CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_FALSE
  CL_DEVICE_INTEGRATED_MEMORY_NV: CL_FALSE
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1

 ---------------------------------
 Device Tesla C2070
 ---------------------------------
  CL_DEVICE_NAME: Tesla C2070
  CL_DEVICE_VENDOR: NVIDIA Corporation
  CL_DRIVER_VERSION: 270.40
  CL_DEVICE_VERSION: OpenCL 1.0 CUDA
  CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
  CL_DEVICE_MAX_COMPUTE_UNITS: 14
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
  CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 / 1024 / 64
  CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
  CL_DEVICE_MAX_CLOCK_FREQUENCY: 1147 MHz
  CL_DEVICE_ADDRESS_BITS: 32
  CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1343 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE: 5375 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT: yes
  CL_DEVICE_LOCAL_MEM_TYPE: local
  CL_DEVICE_LOCAL_MEM_SIZE: 48 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
  CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
  CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT: 1
  CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
  CL_DEVICE_SINGLE_FP_CONFIG: denorms INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma
  CL_DEVICE_IMAGE <dim>
      2D_MAX_WIDTH 4096
      2D_MAX_HEIGHT 32768
      3D_MAX_WIDTH 2048
      3D_MAX_HEIGHT 2048
      3D_MAX_DEPTH 2048
  CL_DEVICE_EXTENSIONS:
      cl_khr_byte_addressable_store
      cl_khr_icd
      cl_khr_gl_sharing
      cl_nv_compiler_options
      cl_nv_device_attribute_query
      cl_nv_pragma_unroll
      cl_khr_global_int32_base_atomics
      cl_khr_global_int32_extended_atomics
      cl_khr_local_int32_base_atomics
      cl_khr_local_int32_extended_atomics
      cl_khr_fp64
  CL_DEVICE_COMPUTE_CAPABILITY_NV: 2.0
  NUMBER OF MULTIPROCESSORS: 14
  NUMBER OF CUDA CORES: 448
  CL_DEVICE_REGISTERS_PER_BLOCK_NV: 32768
  CL_DEVICE_WARP_SIZE_NV: 32
  CL_DEVICE_GPU_OVERLAP_NV: CL_TRUE
  CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_FALSE
  CL_DEVICE_INTEGRATED_MEMORY_NV: CL_FALSE
  CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1

 ---------------------------------
 2D Image Formats Supported (71)
 ---------------------------------
  #   Channel Order   Channel Type
  1   CL_R   CL_FLOAT
  2   CL_R   CL_HALF_FLOAT
  3   CL_R   CL_UNORM_INT8
  4   CL_R   CL_UNORM_INT16
  5   CL_R   CL_SNORM_INT16
  6   CL_R   CL_SIGNED_INT8
  7   CL_R   CL_SIGNED_INT16
  8   CL_R   CL_SIGNED_INT32
  9   CL_R   CL_UNSIGNED_INT8
  10  CL_R   CL_UNSIGNED_INT16
  11  CL_R   CL_UNSIGNED_INT32
  12  CL_A   CL_FLOAT
  13  CL_A   CL_HALF_FLOAT
  14  CL_A   CL_UNORM_INT8
  15  CL_A   CL_UNORM_INT16
  16  CL_A   CL_SNORM_INT16
  17  CL_A   CL_SIGNED_INT8
  18  CL_A   CL_SIGNED_INT16
  19  CL_A   CL_SIGNED_INT32
  20  CL_A   CL_UNSIGNED_INT8
  21  CL_A   CL_UNSIGNED_INT16
  22  CL_A   CL_UNSIGNED_INT32
  23  CL_RG   CL_FLOAT
  24  CL_RG   CL_HALF_FLOAT
  25  CL_RG   CL_UNORM_INT8
  26  CL_RG   CL_UNORM_INT16
  27  CL_RG   CL_SNORM_INT16
  28  CL_RG   CL_SIGNED_INT8
  29  CL_RG   CL_SIGNED_INT16
  30  CL_RG   CL_SIGNED_INT32
  31  CL_RG   CL_UNSIGNED_INT8
  32  CL_RG   CL_UNSIGNED_INT16
  33  CL_RG   CL_UNSIGNED_INT32
  34  CL_RA   CL_FLOAT
  35  CL_RA   CL_HALF_FLOAT
  36  CL_RA   CL_UNORM_INT8
  37  CL_RA   CL_UNORM_INT16
  38  CL_RA   CL_SNORM_INT16
  39  CL_RA   CL_SIGNED_INT8
  40  CL_RA   CL_SIGNED_INT16
  41  CL_RA   CL_SIGNED_INT32
  42  CL_RA   CL_UNSIGNED_INT8
  43  CL_RA   CL_UNSIGNED_INT16
  44  CL_RA   CL_UNSIGNED_INT32
  45  CL_RGBA   CL_FLOAT
  46  CL_RGBA   CL_HALF_FLOAT
  47  CL_RGBA   CL_UNORM_INT8
  48  CL_RGBA   CL_UNORM_INT16
  49  CL_RGBA   CL_SNORM_INT16
  50  CL_RGBA   CL_SIGNED_INT8
  51  CL_RGBA   CL_SIGNED_INT16
  52  CL_RGBA   CL_SIGNED_INT32
  53  CL_RGBA   CL_UNSIGNED_INT8
  54  CL_RGBA   CL_UNSIGNED_INT16
  55  CL_RGBA   CL_UNSIGNED_INT32
  56  CL_BGRA   CL_UNORM_INT8
  57  CL_BGRA   CL_SIGNED_INT8
  58  CL_BGRA   CL_UNSIGNED_INT8
  59  CL_ARGB   CL_UNORM_INT8
  60  CL_ARGB   CL_SIGNED_INT8
  61  CL_ARGB   CL_UNSIGNED_INT8
  62  CL_INTENSITY   CL_FLOAT
  63  CL_INTENSITY   CL_HALF_FLOAT
  64  CL_INTENSITY   CL_UNORM_INT8
  65  CL_INTENSITY   CL_UNORM_INT16
  66  CL_INTENSITY   CL_SNORM_INT16
  67  CL_LUMINANCE   CL_FLOAT
  68  CL_LUMINANCE   CL_HALF_FLOAT
  69  CL_LUMINANCE   CL_UNORM_INT8
  70  CL_LUMINANCE   CL_UNORM_INT16
  71  CL_LUMINANCE   CL_SNORM_INT16

 ---------------------------------
 3D Image Formats Supported (71)
 ---------------------------------
  #   Channel Order   Channel Type
  1   CL_R   CL_FLOAT
  2   CL_R   CL_HALF_FLOAT
  3   CL_R   CL_UNORM_INT8
  4   CL_R   CL_UNORM_INT16
  5   CL_R   CL_SNORM_INT16
  6   CL_R   CL_SIGNED_INT8
  7   CL_R   CL_SIGNED_INT16
  8   CL_R   CL_SIGNED_INT32
  9   CL_R   CL_UNSIGNED_INT8
  10  CL_R   CL_UNSIGNED_INT16
  11  CL_R   CL_UNSIGNED_INT32
  12  CL_A   CL_FLOAT
  13  CL_A   CL_HALF_FLOAT
  14  CL_A   CL_UNORM_INT8
  15  CL_A   CL_UNORM_INT16
  16  CL_A   CL_SNORM_INT16
  17  CL_A   CL_SIGNED_INT8
  18  CL_A   CL_SIGNED_INT16
  19  CL_A   CL_SIGNED_INT32
  20  CL_A   CL_UNSIGNED_INT8
  21  CL_A   CL_UNSIGNED_INT16
  22  CL_A   CL_UNSIGNED_INT32
  23  CL_RG   CL_FLOAT
  24  CL_RG   CL_HALF_FLOAT
  25  CL_RG   CL_UNORM_INT8
  26  CL_RG   CL_UNORM_INT16
  27  CL_RG   CL_SNORM_INT16
  28  CL_RG   CL_SIGNED_INT8
  29  CL_RG   CL_SIGNED_INT16
  30  CL_RG   CL_SIGNED_INT32
  31  CL_RG   CL_UNSIGNED_INT8
  32  CL_RG   CL_UNSIGNED_INT16
  33  CL_RG   CL_UNSIGNED_INT32
  34  CL_RA   CL_FLOAT
  35  CL_RA   CL_HALF_FLOAT
  36  CL_RA   CL_UNORM_INT8
  37  CL_RA   CL_UNORM_INT16
  38  CL_RA   CL_SNORM_INT16
  39  CL_RA   CL_SIGNED_INT8
  40  CL_RA   CL_SIGNED_INT16
  41  CL_RA   CL_SIGNED_INT32
  42  CL_RA   CL_UNSIGNED_INT8
  43  CL_RA   CL_UNSIGNED_INT16
  44  CL_RA   CL_UNSIGNED_INT32
  45  CL_RGBA   CL_FLOAT
  46  CL_RGBA   CL_HALF_FLOAT
  47  CL_RGBA   CL_UNORM_INT8
  48  CL_RGBA   CL_UNORM_INT16
  49  CL_RGBA   CL_SNORM_INT16
  50  CL_RGBA   CL_SIGNED_INT8
  51  CL_RGBA   CL_SIGNED_INT16
  52  CL_RGBA   CL_SIGNED_INT32
  53  CL_RGBA   CL_UNSIGNED_INT8
  54  CL_RGBA   CL_UNSIGNED_INT16
  55  CL_RGBA   CL_UNSIGNED_INT32
  56  CL_BGRA   CL_UNORM_INT8
  57  CL_BGRA   CL_SIGNED_INT8
  58  CL_BGRA   CL_UNSIGNED_INT8
  59  CL_ARGB   CL_UNORM_INT8
  60  CL_ARGB   CL_SIGNED_INT8
  61  CL_ARGB   CL_UNSIGNED_INT8
  62  CL_INTENSITY   CL_FLOAT
  63  CL_INTENSITY   CL_HALF_FLOAT
  64  CL_INTENSITY   CL_UNORM_INT8
  65  CL_INTENSITY   CL_UNORM_INT16
  66  CL_INTENSITY   CL_SNORM_INT16
  67  CL_LUMINANCE   CL_FLOAT
  68  CL_LUMINANCE   CL_HALF_FLOAT
  69  CL_LUMINANCE   CL_UNORM_INT8
  70  CL_LUMINANCE   CL_UNORM_INT16
  71  CL_LUMINANCE   CL_SNORM_INT16

oclDeviceQuery, Platform Name = NVIDIA CUDA, Platform Version = OpenCL 1.0 CUDA 4.0.1, SDK Revision = 7027912, NumDevs = 2, Device = Tesla C2070, Device = Tesla C2070

System Info:

 Local Time/Date = 08:33:31, 11/17/2011
 CPU Name: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
 # of CPU processors: 24
 Linux version 2.6.32-131.0.15.el6.x86_64 (mockbuild@x86-007.build.bos.redhat.com) (gcc version 4.4.4 20100726 (Red Hat 4.4.4-13) (GCC) ) #1 SMP Tue May 10 15:42:40 EDT 2011

[oclDeviceQuery] test results...
PASSED
Installation Directory:
/usr/local/cuda/include/CL/opencl.h
/usr/local/cuda/NVIDIA_GPU_Computing_SDK/OpenCL/common/inc/CL/opencl.h
(Build the necessary CUDA SDK libraries by running make in /usr/local/cuda/NVIDIA_GPU_Computing_SDK/OpenCL.)
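For a standalone host program that uses only the OpenCL API (and none of the SDK utility libraries), compiling against the header directory listed above might look like the following. This is a sketch under assumptions: ocl_build, CL_INC, DRY_RUN, and hello.c are hypothetical names introduced for illustration, gcc is assumed to be the compiler, and libOpenCL.so is assumed to be on the default linker search path.

```shell
#!/bin/bash
# Header directory from the listing above; override by exporting CL_INC.
CL_INC=${CL_INC:-/usr/local/cuda/include}

# Hypothetical helper: compile a single-file OpenCL host program.
# With DRY_RUN=1 the gcc command is printed instead of executed.
ocl_build() {
    local src=$1
    local exe=${src%.c}   # hello.c -> hello
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo gcc -I"$CL_INC" -o "$exe" "$src" -lOpenCL
    else
        gcc -I"$CL_INC" -o "$exe" "$src" -lOpenCL
    fi
}
```

Running DRY_RUN=1 ocl_build hello.c shows the command that would be used. The -I flag points the compiler at CL/opencl.h, and -lOpenCL links the program against the OpenCL library.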
Sample PBS script
cat pbs.01
#!/bin/bash
#PBS -m abe
#PBS -M YOUREMAIL@griffith.edu.au
#PBS -N openCL_GPU_job
#PBS -l ngpus=1
#PBS -l walltime=100:00:00
#PBS -q gpu
module load cuda/4.0
echo "Hello from $HOSTNAME: date = `date`"
cd /export/home/s123456/pbs/opencl/LennardJones
./LennardJones pop256dd.xyz
echo "Finished at `date`"
Usage: qsub pbs.01
Intel Implementation
The Intel implementation has been uninstalled. The installation notes are kept here for future use.
Installation
http://software.intel.com/en-us/articles/installation-notes-opencl-sdk/
Installation on the nodes
========================
rpm -ivh intel_ocl_sdk_1.5_x64.rpm
Preparing... ########################################### [100%]
1:intel-ocl-sdk ########################################### [100%]

Installation on the image:
==========================
mount --bind /proc/ /compute/proc/
mount --bind /dev /compute/dev
rpm --root=/compute/ -ivh intel_ocl_sdk_1.5_x64.rpm
umount /compute/dev
umount /compute/proc

Package Listing
===============
rpm -qlp intel_ocl_sdk_1.5_x64.rpm
/etc/OpenCL/vendors/intelocl64.icd
/usr/bin/ioc
/usr/bin/iocgui.sh
/usr/include/CL/cl.h
/usr/include/CL/cl_d3d9.h
/usr/include/CL/cl_ext.h
/usr/include/CL/cl_gl.h
/usr/include/CL/cl_gl_ext.h
/usr/include/CL/cl_platform.h
/usr/include/CL/opencl.h
/usr/lib64/OpenCL/vendors/intel/__ocl_svml_e9.so
/usr/lib64/OpenCL/vendors/intel/__ocl_svml_h8.so
/usr/lib64/OpenCL/vendors/intel/__ocl_svml_u8.so
/usr/lib64/OpenCL/vendors/intel/__ocl_svml_y8.so
/usr/lib64/OpenCL/vendors/intel/clbltfne9.rtl
/usr/lib64/OpenCL/vendors/intel/clbltfnh8.rtl
/usr/lib64/OpenCL/vendors/intel/clbltfnu8.rtl
/usr/lib64/OpenCL/vendors/intel/clbltfny8.rtl
/usr/lib64/OpenCL/vendors/intel/docs/apache_license.txt
/usr/lib64/OpenCL/vendors/intel/docs/boost_license.txt
/usr/lib64/OpenCL/vendors/intel/docs/llvm_release_license.txt
/usr/lib64/OpenCL/vendors/intel/ioc.jar
/usr/lib64/OpenCL/vendors/intel/ioc64
/usr/lib64/OpenCL/vendors/intel/iocgui64.sh
/usr/lib64/OpenCL/vendors/intel/libOclCpuBackEnd.so
/usr/lib64/OpenCL/vendors/intel/libboost_filesystem.so
/usr/lib64/OpenCL/vendors/intel/libboost_filesystem.so.1.46.1
/usr/lib64/OpenCL/vendors/intel/libboost_system.so
/usr/lib64/OpenCL/vendors/intel/libboost_system.so.1.46.1
/usr/lib64/OpenCL/vendors/intel/libcl_logger.so
/usr/lib64/OpenCL/vendors/intel/libclang_compiler.so
/usr/lib64/OpenCL/vendors/intel/libclbltfne9.so
/usr/lib64/OpenCL/vendors/intel/libclbltfnh8.so
/usr/lib64/OpenCL/vendors/intel/libclbltfnu8.so
/usr/lib64/OpenCL/vendors/intel/libclbltfny8.so
/usr/lib64/OpenCL/vendors/intel/libcpu_device.so
/usr/lib64/OpenCL/vendors/intel/libintelocl.so
/usr/lib64/OpenCL/vendors/intel/libtask_executor.so
/usr/lib64/OpenCL/vendors/intel/libtbb.so
/usr/lib64/OpenCL/vendors/intel/libtbb.so.2
/usr/lib64/OpenCL/vendors/intel/libtbbmalloc.so
/usr/lib64/OpenCL/vendors/intel/libtbbmalloc.so.2
/usr/lib64/OpenCL/vendors/intel/libtbbmalloc_proxy.so
/usr/lib64/OpenCL/vendors/intel/libtbbmalloc_proxy.so.2
/usr/lib64/OpenCL/vendors/intel/llc
/usr/lib64/OpenCL/vendors/intel/opencl_.pch
/usr/lib64/OpenCL/vendors/intel/version.txt
/usr/lib64/libOpenCL.so
Usage
The Intel® OpenCL SDK 1.5 binaries are installed in the following directory:
/usr/lib64/OpenCL/vendors/intel
To work with the OpenCL runtime, an application should link against the OpenCL Installable Client Driver (ICD) library, libOpenCL.so, which is installed in /usr/lib64.
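On Linux, the ICD loader finds vendor implementations through the .icd files registered under /etc/OpenCL/vendors (the Intel package listed above installs intelocl64.icd there). As a rough sketch, the registered implementations on a node could be listed as follows. The function name list_icds is a hypothetical helper, and the script assumes each .icd file names its vendor library on the first line.

```shell
#!/bin/bash
# Hypothetical helper: print each registered ICD file and the vendor
# library it names. The directory defaults to /etc/OpenCL/vendors.
list_icds() {
    local dir=${1:-/etc/OpenCL/vendors}
    local f
    for f in "$dir"/*.icd; do
        [ -e "$f" ] || continue   # the glob matched nothing
        printf '%s: %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
    done
}

list_icds
```

An empty result means no vendor implementation is registered on that host, in which case a program linked against libOpenCL.so would find no platforms at run time.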
ioc
No input parameters
usage: ioc64 <ARGUMENT> [OPTIONS]
Intel(R) OpenCL(TM) Offline Compiler Command-Line Client, version 1.0.2
(C) Intel Corporation 2011. All rights reserved
ARGUMENTS:
  -input=<input_file_name> - Build the OpenCL Code given in <input_file_name>
  -version - show version
  -help - show list of available commands
OPTIONS:
  -simd=<instruction_set_arch> - target instruction set architecture
      'sse41' for streaming SIMD extension 4.1
      'sse42' for streaming SIMD extension 4.2
      'avx' for advanced vector extensions
  -output=<output_file_name> - write the build log to <output_file_name>
  -asm[=<file_name>] - Generate assembly code
  -llvm[=<file_name>] - Generate llvm code
  -ir[=<file_name>] - Generate intermediate binary file
  -bo[="<build_options>"] - Add build options
Ref:
1. http://www.codeproject.com/KB/GPU-Programming/IntroToOpenCL.aspx
2. http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/
3. http://www.khronos.org/files/opencl-1-2-quick-reference-card.pdf
4. http://www.khronos.org/opencl/resources
5. http://software.intel.com/en-us/articles/vcsource-tools-opencl-sdk/
6. http://www.youtube.com/watch?v=-ROYgRg3x8E
7. http://www.khronos.org/registry/cl/
8. OpenCL tutorials: http://opencl.codeplex.com/wikipage?title=OpenCL%20Tutorials%20-%201