OpenCL: .NET, C# and Resolver One integration -- the very beginnings

Posted on 18 March 2010 in GPU Computing, Python, Resolver One, TIL deep dives

Today I wrote the code required to call part of the OpenCL API from Resolver One; just one function so far, and all it does is get some information about your hardware setup, but it was great to get it working. There are already .NET bindings for OpenCL, but I felt that it was worthwhile reinventing the wheel -- largely as a way of making sure I understood every spoke, but also because I wanted the simplest possible API, with no extra code to make it more .NETty. It should also work as an example of how you can integrate a C library into a .NET/IronPython application like Resolver One.

I'll be documenting the whole thing when it's a bit more finished, but if you want to try out the work in progress, and are willing to build the interop code, here's how:

Make sure you have OpenCL installed -- here's the NVIDIA OpenCL download page, and here's the OpenCL page for ATI. I've only tested this with NVIDIA so far, so I'm keen to hear of any incompatibilities.
Clone the dot-net-opencl project from Resolver Systems' GitHub account.
Load up the DotNetOpenCL.sln project file in the root of the project using Visual C# 2008 (here's the free "Express" version if you don't have it already).
Build the project
To try it out from IronPython, run ipy test_clGetPlatformIDs.py
To try it in Resolver One, load test_clGetPlatformIDs.rsl

That should be it! If you want to look at the code, the only important bit is in DotNetOpenCL.cs -- and it's simply an external method definition... the tricky bit was in working out which OpenCL function to write an external definition for, and what that definition should look like.

I've put a slightly tidied version of the notes I kept as I implemented this below, for posterity's sake; if you're interested in finding out how the implementation went, read on...

My aim for the day was to successfully call at least one OpenCL function from a Resolver One spreadsheet. When I last looked at OpenCL, I'd managed to get the NVIDIA demos compiling and running on my machine, and so the obvious starting point was to port one of these over. The simplest appeared to be oclDeviceQuery, which just queries the OpenCL devices on the computer and prints out their details, so I started by looking at that and trying to work out how it worked.

In the C++ project that NVIDIA supply for this demo, there's just one file: oclDeviceQuery.cpp. It has no explicit library dependencies (or at least none I could find), and depends on a few header files in the directory structure above the project. It looked simple enough, so I decided to go through all of the code and make sure I understood it all.

Stripped of logging, here's what it does:

The first step is to call oclGetPlatformID. This is a slightly unpromising start, as this function doesn't appear in the OpenCL specification. However, it is defined in oclUtils.cpp, which is part of the demo code NVIDIA provide, and it appears to be an NVIDIA-specific helper function. Looking at its source, it:
- Asks OpenCL for how many "platforms" the current machine supports, using clGetPlatformIDs(0, NULL, &num_platforms)
- Creates an array to hold the appropriate number of cl_platform_ids
- Calls clGetPlatformIDs(num_platforms, clPlatformIDs, NULL) to fill the array with cl_platform_ids. (This "call a function once to get the number of foos and then call it with different parameters to get the list of foos" pattern is something I've seen in OpenGL too. It looks ugly to me, but perhaps I've been working with garbage-collected languages for too long.)
- Iterates over the platform IDs, and calls clGetPlatformInfo(clPlatformIDs[i], CL_PLATFORM_NAME, 1024, &chBuffer, NULL) to (presumably) get the platform name for each one; if the platform name is "NVIDIA", it puts that into the "out parameter" that it's returning.
- If, once the iteration is done, there's nothing in the out parameter, then it puts the zeroth platform ID into the out parameter.

So, basically, oclGetPlatformID is an NVIDIA utility function that looks at all of the platforms your machine has, returns the NVIDIA one if there is one, and returns whatever the zeroth one is if there isn't.
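The control flow of oclGetPlatformID can be sketched in plain Python. Everything here is a stand-in rather than the real API: fake_get_platform_ids and fake_get_platform_name are hypothetical substitutes for clGetPlatformIDs and clGetPlatformInfo, the platform table is made-up data, and matching "NVIDIA" as a substring of the name is an assumption about how the comparison is done.

```python
# Made-up data for illustration: (platform ID, platform name) pairs.
FAKE_PLATFORMS = [(12289, "ATI Stream"), (12290, "NVIDIA CUDA")]


def fake_get_platform_ids(num_entries, platforms):
    """Hypothetical stand-in that mimics clGetPlatformIDs' two-call convention.

    With platforms=None it only reports how many platforms exist; with a
    list it also fills in up to num_entries IDs.
    Returns (error_code, num_platforms).
    """
    if platforms is not None:
        for i in range(min(num_entries, len(FAKE_PLATFORMS))):
            platforms[i] = FAKE_PLATFORMS[i][0]
    return 0, len(FAKE_PLATFORMS)


def fake_get_platform_name(platform_id):
    """Hypothetical stand-in for clGetPlatformInfo with CL_PLATFORM_NAME."""
    return dict(FAKE_PLATFORMS)[platform_id]


def get_preferred_platform_id():
    # First call: no array, just ask how many platforms there are.
    error, count = fake_get_platform_ids(0, None)
    if error != 0 or count == 0:
        return None

    # Second call: pass a pre-sized list to be filled with the IDs.
    ids = [None] * count
    error, _ = fake_get_platform_ids(count, ids)
    if error != 0:
        return None

    # Prefer the NVIDIA platform; otherwise fall back to the zeroth one.
    for platform_id in ids:
        if "NVIDIA" in fake_get_platform_name(platform_id):
            return platform_id
    return ids[0]
```

With the table above, get_preferred_platform_id() picks the second entry; with no NVIDIA entry in the table it falls back to the first.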
Having worked all of this out, I decided that it was time to work out what OpenCL meant by a "platform"... In the spec, section 2, glossary, it says:

Platform: The host plus a collection of devices managed by the OpenCL framework that allow an application to share resources and execute kernels on devices in the platform.

Hmmm. Not enormously helpful. What's a host?

Host: The host interacts with the context using the OpenCL API.

That didn't really get me much further. How about a device?

Device: A device is a collection of compute units. A command-queue is used to queue commands to a device. Examples of commands include executing kernels, or reading and writing memory objects. OpenCL devices typically correspond to a GPU, a multi-core CPU, and other processors such as DSPs and the Cell/B.E. processor.

Right. So, to a first approximation (and understanding that I'll be revising this later) a device is an abstraction that ~= a graphics chip. It sounds like a platform ~= a graphics card, or perhaps two bonded graphics cards working together. I would expect to find just one of them on my machine, anyway. The important thing is that we need to get a handle to it as the starting point for running OpenCL code. This makes sense: if you want to run GPGPU code, then the first thing to do is to get a handle to the GPU.

Once I'd reached this point, it was pretty clear that my task for the day had been clarified to "get clGetPlatformIDs working from Resolver One". However, I decided to work through the rest of the oclDeviceQuery demo code now, as getting the rest of it working is obviously what I'm going to have to do next! If you want to skip that, scroll down for my notes on how I handled clGetPlatformIDs.
Once control comes back into the oclDeviceQuery demo from oclGetPlatformID, the next step is to print out the platform's name and version. It uses the same clGetPlatformInfo function as was used inside oclGetPlatformID to get each of them, passing in CL_PLATFORM_NAME and CL_PLATFORM_VERSION as field identifiers.
The next step is to get all of the devices attached to our platform. The code follows the same pattern that was used for getting the list of platforms: it calls clGetDeviceIDs(clSelectedPlatformID, CL_DEVICE_TYPE_ALL, 0, NULL, &ciDeviceCount) to get the total number of devices, allocates an array for them, and then calls clGetDeviceIDs(clSelectedPlatformID, CL_DEVICE_TYPE_ALL, ciDeviceCount, devices, &ciDeviceCount) to fill the array.
If the list of devices was correctly loaded, the next step is to iterate over them all, calling oclPrintDevInfo on each. oclPrintDevInfo is another NVIDIA utility function, and it simply calls a function clGetDeviceInfo with a variety of different parameters, getting and printing things like the device's number of compute units, or what extensions it supports.
Finally, the demo calls some #ifdefed OS-dependent code to print out system information; nothing terribly exciting there.
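As a rough illustration of the device-listing steps, here is a sketch using CPython's ctypes module. This is an assumption made for illustration only: the NVIDIA demo is C++, and this is not how the post's own code calls OpenCL. The function name list_device_names is hypothetical, the constant values are the ones from the OpenCL header cl.h, and the function simply returns an empty list if no OpenCL library can be loaded or any call fails.

```python
import ctypes
import ctypes.util
import sys

CL_SUCCESS = 0
CL_DEVICE_TYPE_ALL = 0xFFFFFFFF
CL_DEVICE_NAME = 0x102B


def list_device_names():
    """Return the names of all devices on the first platform ([] on failure)."""
    library_name = ctypes.util.find_library("OpenCL")
    if library_name is None:
        return []
    # OpenCL functions use the stdcall convention on Windows, hence WinDLL.
    loader = ctypes.WinDLL if sys.platform == "win32" else ctypes.CDLL
    try:
        cl = loader(library_name)
    except OSError:
        return []

    id_array = ctypes.POINTER(ctypes.c_void_p)
    count_ptr = ctypes.POINTER(ctypes.c_uint)
    cl.clGetPlatformIDs.argtypes = [ctypes.c_uint, id_array, count_ptr]
    # cl_device_type is a 64-bit bitfield.
    cl.clGetDeviceIDs.argtypes = [
        ctypes.c_void_p, ctypes.c_uint64, ctypes.c_uint, id_array, count_ptr]
    cl.clGetDeviceInfo.argtypes = [
        ctypes.c_void_p, ctypes.c_uint, ctypes.c_size_t,
        ctypes.c_void_p, ctypes.POINTER(ctypes.c_size_t)]

    # Just take the first platform here, rather than preferring a vendor.
    platform = (ctypes.c_void_p * 1)()
    if cl.clGetPlatformIDs(1, platform, None) != CL_SUCCESS:
        return []

    # First call gets the device count, second call fills the array.
    count = ctypes.c_uint(0)
    if cl.clGetDeviceIDs(platform[0], CL_DEVICE_TYPE_ALL, 0, None,
                         ctypes.byref(count)) != CL_SUCCESS:
        return []
    devices = (ctypes.c_void_p * count.value)()
    if cl.clGetDeviceIDs(platform[0], CL_DEVICE_TYPE_ALL, count.value,
                         devices, None) != CL_SUCCESS:
        return []

    names = []
    for device in devices:
        buf = ctypes.create_string_buffer(1024)
        if cl.clGetDeviceInfo(device, CL_DEVICE_NAME, len(buf), buf,
                              None) == CL_SUCCESS:
            names.append(buf.value.decode("ascii", "replace"))
    return names
```

Calling list_device_names() on a machine with a working OpenCL installation should give one name per device; querying other fields would mean passing different constants to clGetDeviceInfo, much as oclPrintDevInfo does.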

So, that's what oclDeviceQuery.cpp does. So now I knew that once clGetPlatformIDs is working in Resolver One, the next functions to add will be clGetPlatformInfo, clGetDeviceIDs, and clGetDeviceInfo.

On to getting the integration working for clGetPlatformIDs, then. In order to call this from Resolver One, I need to be able to call it from IronPython. In order to call it from IronPython, I need to call it from .NET, which is probably best done using P/Invoke, the .NET interface for calling "platform" (i.e. non-.NET) functions.

First step is to have somewhere to keep the code. I figured that an open source library would be reasonable, so created the dot-net-opencl project under the Resolver Systems GitHub account, then branched it under my own account to work on it.

Right, step 1: create a C# project to hold the P/Invoke code. I started up Visual C# 2008 Express and created a project, DotNetOpenCL, with an appropriate class name (CL for now -- so we'll have CL.clGetDeviceInfo, but that's no huge deal -- it can be tidied later).

Next step: Visual Studio projects generate an awful lot of junk files, and of course have build products -- DLLs and so on -- none of which you would want in the git repository. I added a .gitignore file as per this excellent Stack Overflow page.

The next step: add a first cut of the clGetPlatformIDs function, just with the ability to get the number of platforms. Following the example of the P/Invoke stuff in Resolver One's testing framework (which uses native functions to move the mouse etc) I wound up with:

[DllImport("OpenCL.dll")]
public static extern int clGetPlatformIDs(uint num_entries, IntPtr platforms, out uint num_platforms);

Note that this is a bit dodgy; platforms should be a cl_platform_id *, where cl_platform_id is a struct _cl_platform_id * -- in other words, instead of an IntPtr it's actually an IntPtrPtr, which type doesn't exist in .NET. However, if all we're getting is the number of platforms, that should be OK for now.
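To make the pointer-to-pointer point concrete, here is a small sketch using CPython's ctypes module (an assumption for illustration; the post itself uses P/Invoke, not ctypes). fill_ids is a hypothetical pure-Python stand-in for a C function that takes a cl_platform_id *, and the handle values it writes are made up. What such a function receives is the address of a block of pointer-sized slots, and it writes one opaque handle into each slot.

```python
import ctypes


def fill_ids(num_entries, platforms):
    """Hypothetical stand-in for a C function taking `cl_platform_id *`."""
    for i in range(num_entries):
        platforms[i] = 0x3001 + i  # made-up opaque handle values


# One pointer-sized slot per platform ID we expect to get back.
slots = (ctypes.c_void_p * 2)()

# Viewed from the callee's side, the array is a pointer to its first slot,
# and each slot is itself a pointer: a pointer to a pointer.
as_pointer = ctypes.cast(slots, ctypes.POINTER(ctypes.c_void_p))
fill_ids(2, as_pointer)

# Writes made through the pointer show up in the original array.
handles = list(slots)
```

Passing a single IntPtr, as in the first-cut declaration, only works because the function never writes through it when num_entries is zero.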

Building gets a "Build succeeded" message. In DotNetOpenCL\DotNetOpenCL\bin\Release, I got a DLL file.

So, the next step: write a little IronPython script to test it.

from os.path import abspath, dirname, join
dllPath = join(dirname(abspath(__file__)), "DotNetOpenCL", "DotNetOpenCL", "bin", "Release")

import sys
sys.path.append(dllPath)

import clr
clr.AddReference("DotNetOpenCL")

from DotNetOpenCL import CL
from System import IntPtr

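# IronPython hands back "out" parameters as extra return values, so the
# call returns a tuple of (return value, num_platforms).
errorCode, numPlatforms = CL.clGetPlatformIDs(0, IntPtr.Zero)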
print "Error code", errorCode
print "Number of platforms", numPlatforms

I ran it, and got:

giles@MRLEE /c/dev/dot-net-opencl
$ ipy test_clGetPlatformIDs.py
Error code 0
Number of platforms 1

This was really, really pleasing: the code had called through to OpenCL from IronPython! Next step: how to deal with getting the platform IDs themselves?

With help from this MSDN P/Invoke tutorial, it became obvious that you can pass a function a .NET array and have P/Invoke treat it as a pointer to a C-style array of pointers on the unmanaged side. You do this with the MarshalAs attribute, for which the unmanaged type is specified using the appropriately-named UnmanagedType enumeration. It was pretty clear that the correct P/Invoke specification was:

[DllImport("OpenCL.dll")]
public static extern int clGetPlatformIDs(
        uint num_entries,
        [MarshalAs(UnmanagedType.LPArray)] IntPtr[] platforms,
        out uint num_platforms);

(For future reading: I also found this very in-depth page about P/Invoke on the Mono site, which looks like it will be useful later.)
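For comparison, roughly the same declaration can be sketched with CPython's ctypes module instead of P/Invoke. This is not part of the dot-net-opencl project, and get_platform_ids is a hypothetical name; the argtypes list plays the role of the DllImport signature, with an array of c_void_p standing in for the IntPtr[]. The sketch returns an empty list if no OpenCL library can be found or loaded.

```python
import ctypes
import ctypes.util
import sys


def get_platform_ids():
    """Return the OpenCL platform IDs as a list ([] if OpenCL is unavailable)."""
    library_name = ctypes.util.find_library("OpenCL")
    if library_name is None:
        return []
    # OpenCL functions use the stdcall convention on Windows, hence WinDLL.
    loader = ctypes.WinDLL if sys.platform == "win32" else ctypes.CDLL
    try:
        cl = loader(library_name)
    except OSError:
        return []

    # Rough equivalent of the DllImport declaration:
    # cl_int clGetPlatformIDs(cl_uint, cl_platform_id *, cl_uint *)
    cl.clGetPlatformIDs.restype = ctypes.c_int
    cl.clGetPlatformIDs.argtypes = [
        ctypes.c_uint,
        ctypes.POINTER(ctypes.c_void_p),
        ctypes.POINTER(ctypes.c_uint),
    ]

    # First call: NULL array, just ask for the number of platforms.
    count = ctypes.c_uint(0)
    if cl.clGetPlatformIDs(0, None, ctypes.byref(count)) != 0:
        return []

    # Second call: pass an array with one pointer-sized slot per platform.
    ids = (ctypes.c_void_p * count.value)()
    if cl.clGetPlatformIDs(count.value, ids, None) != 0:
        return []
    return list(ids)
```

The returned values are opaque handles, so the only sensible things to do with them are to compare them or pass them back into other OpenCL calls.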

This compiled OK, so I modified my IronPython testing code (with a reminder of how to create C# arrays from IP from Haibo Luo):

from System import Array, IntPtr

errorCode, numPlatforms = CL.clGetPlatformIDs(0, None)
print "Error code", errorCode
print "Number of platforms", numPlatforms

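# Create a .NET IntPtr[] of the right length; P/Invoke marshals it as a
# C array for OpenCL to fill in.
platforms = Array.CreateInstance(IntPtr, numPlatforms)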
errorCode, numPlatformsReturned = CL.clGetPlatformIDs(numPlatforms, platforms)
print "Number of platforms returned", numPlatformsReturned
print "Platform array", platforms
for i in range(numPlatformsReturned):
    print "Platform #%s: %s" % (i, platforms[i])

I ran it, and got this:

giles@MRLEE /c/dev/dot-net-opencl
$ ipy test_clGetPlatformIDs.py
Error code 0
Number of platforms 1
Number of platforms returned 1
Platform array Array[IntPtr]((<System.IntPtr object at 0x000000000000002B [12289]>))
Platform #0: 12289

w00t!
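Both runs print an error code of 0, which is CL_SUCCESS in the OpenCL headers. A small helper along these lines could turn the raw integers into something more readable. It's only a sketch: describe_error and check are hypothetical names, and the table holds just a handful of codes from cl.h rather than the full list.

```python
# A few error codes and their names, as defined in the OpenCL header cl.h.
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -32: "CL_INVALID_PLATFORM",
}


def describe_error(code):
    """Return the symbolic name for an OpenCL error code, if we know it."""
    return CL_ERROR_NAMES.get(code, "unrecognised error code %d" % code)


def check(code):
    """Raise if an OpenCL call did not return CL_SUCCESS."""
    if code != 0:
        raise RuntimeError("OpenCL call failed: %s" % describe_error(code))
```

For example, clGetPlatformIDs returns CL_INVALID_VALUE if you pass it zero entries together with a non-NULL array, so check(-30) would raise with that name in the message.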

So by this stage I had an IronPython file that was successfully calling OpenCL and getting the list of platform IDs (which appear to be intended to be opaque types) into a .NET array type. The next step was to write a Resolver One spreadsheet that basically just cloned the functionality of the IronPython script. This was trivially easy, of course :-) So you can see the results in the GitHub repo.

That was it for today! Now that I have the set of functions I need clearly defined, and one of them working properly, the rest should be pretty simple. I'll look at completing the oclDeviceQuery demo in my next OpenCL post.