OpenCL: first investigations with an NVIDIA card
I'm taking a look at OpenCL at the moment, with the vague intention of hooking it up to Resolver One. In case you've not heard about it, OpenCL is a language that allows you to do non-graphical computing on your graphics card (GPU). Because GPUs have more raw computational power than even modern CPUs, in the form of a large number of relatively slow stream processors, this can speed up certain kinds of calculations -- in particular, those that are amenable to massive parallelisation.
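To make "amenable to massive parallelisation" a bit more concrete, here is a minimal sketch in plain C++ (not OpenCL). It multiplies a matrix by a vector; the point is that each output element depends on only one row of the matrix, so in principle every row could be handed to a different stream processor. The names dot_row and mat_vec_mul are made up for this illustration and are not part of OpenCL or any NVIDIA library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only.
// Computes one element of y = A * x, where A is stored row by row.
// It reads row `row` of A and the whole of x, and writes no shared state,
// so calls for different rows do not depend on each other.
float dot_row(const std::vector<float>& a, const std::vector<float>& x,
              std::size_t row, std::size_t cols) {
    float sum = 0.0f;
    for (std::size_t c = 0; c < cols; ++c) {
        sum += a[row * cols + c] * x[c];
    }
    return sum;
}

// Serial driver. On a GPU, each iteration of this loop is the kind of
// work that could be run in parallel, one row per processing element.
std::vector<float> mat_vec_mul(const std::vector<float>& a,
                               const std::vector<float>& x,
                               std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);  // matrix must be rows x cols
    assert(x.size() == cols);         // vector length must match
    std::vector<float> y(rows);
    for (std::size_t r = 0; r < rows; ++r) {
        y[r] = dot_row(a, x, r, cols);
    }
    return y;
}
```

For example, calling mat_vec_mul with the 2x3 matrix {1, 2, 3, 4, 5, 6} and the vector {1, 0, 2} gives {7, 16}. A calculation where each step needs the result of the previous one would not split up this way.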
Until recently, the two main graphics card manufacturers had their own languages for this kind of general-purpose GPU computing; NVIDIA had CUDA, and ATI/AMD had their Stream technology. OpenCL was created as a way of having one language that would work on all graphics cards. The tools for developing with it are not currently as good as those for CUDA (which has been around for a long time and has great support), but OpenCL looks to me like the better longer-term investment.
It took a little bit of work to get something up and running on my machine here at work, so it's probably worth documenting to help others who are trying the same.
I'm doing this using a machine with an NVIDIA GeForce 8600 GT graphics card; it's a bit old, but it can run CUDA, and (as you might expect) NVIDIA's OpenCL implementation is built on top of CUDA. So this description will probably only help people trying to get stuff working using NVIDIA cards. I have a laptop with an ATI card at home, and I'll try installing it all there some other time and write that up too.
Here's what I did, including mis-steps and error messages:
- Firstly, I obviously needed to download the appropriate drivers and libraries from NVIDIA. Here is their OpenCL download page. From there, I followed the "Click here to download OpenCL" link, gave them my details when asked, and then on the resulting page chose the 32-bit version of the "NVIDIA Drivers for WinVista and Win7 (190.89)".
- I installed the drivers. Windows warned that it couldn't verify they'd work, but I went ahead anyway. While installing, it did odd stuff including blanking the display and switching resolution a few times (unsurprisingly, given that it's basically a new graphics driver) but seemed to succeed. It wanted to reboot, so I let it do so.
- When the machine came back, I found some PhysX demos on the start menu. PhysX is a separate but related NVIDIA product that allows games developers to use the graphics card to handle parts of their calculations -- for example, simulating realistic cloth. This looked like a good way to check the install had worked, so I tried running it. Unfortunately when I tried, it told me that I needed DirectX 9.0c and only had 9.0 installed. I checked the machine (which used to be used by someone else) and discovered that it hadn't been updated for a long time -- it didn't even have Vista Service Pack 1, which is two years old!
- So, I let Windows Update install everything it wanted to install (which took a few hours) and tried again. Unfortunately, I got the same error.
- A bit of Googling found Microsoft's page for downloading the latest versions of DirectX, so I ran that and tried again. This time it worked, and I was able to look at the PhysX demos; here's a video from someone else showing what they look like.
- Right, time for some real OpenCL. I downloaded "GPU Computing SDK code samples and more" from the NVIDIA site where I originally got the drivers, and installed it.
- It put an icon titled "NVIDIA GPU Computing SDK Browser" on the desktop, so I double-clicked it. A dialog came up saying "The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log for more detail." I decided not to worry about this; errors like that can be a pain to track down (it's usually a missing or misplaced DLL) and given that the app in question is just a browser for demos, and the stuff that was just installed was mostly the source code for those demos, it looked like a good plan to go straight to the source code.
- In C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\OpenCL\src\, there were a number of directories, each including what appeared to be a Visual Studio project demonstrating some aspect of OpenCL. I took a closer look at the oclMatVecMul subdirectory, and saw that it was a C++ project. I didn't have a C++ compiler installed on the machine, but...
- Microsoft, in their infinite kindness, allow you to use the "Express" version of Visual C++ for free, so I downloaded it from here. For some reason it failed to install the first time I tried, but when I tried again (not doing anything in the meantime) it worked just fine. Hmm.
- Once it was installed, I opened oclMatVecMul_vc9.sln with it. From the Build menu, I chose Build Solution.
- Then from the Debug menu, the eccentrically-located Start Without Debugging option.
- The application failed, with a number of dialog boxes describing the problem. Once I'd quit it, I could see a log window which had all of the text that had been in the dialogs, all of which is listed below:

        'oclMatVecMul.exe': Loaded 'C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\OpenCL\bin\Win32\Debug\oclMatVecMul.exe', Symbols loaded.
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\ntdll.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\kernel32.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\OpenCL.dll', Binary was not built with debug information.
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\advapi32.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\rpcrt4.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\nvcuda.dll', Binary was not built with debug information.
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\user32.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\gdi32.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\imm32.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\msctf.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\msvcrt.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\lpk.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\usp10.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\avgrsstx.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\dwmapi.dll'
        'oclMatVecMul.exe': Unloaded 'C:\Windows\System32\dwmapi.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\nvapi.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\ole32.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\oleaut32.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\shlwapi.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\shell32.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\setupapi.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\System32\version.dll'
        'oclMatVecMul.exe': Loaded 'C:\Windows\winsxs\x86_microsoft.windows.common-controls_6595b64144ccf1df_6.0.6002.18005_none_5cb72f96088b0de0\comctl32.dll'
        Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.
        Run-Time Check Failure #0 - The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.
        First-chance exception at 0xcccccccc in oclMatVecMul.exe: 0xC0000005: Access violation reading location 0xcccccccc.
        Unhandled exception at 0xcccccccc in oclMatVecMul.exe: 0xC0000005: Access violation reading location 0xcccccccc.
        First-chance exception at 0xcccccccc in oclMatVecMul.exe: 0xC0000005: Access violation reading location 0xcccccccc.
        Unhandled exception at 0xcccccccc in oclMatVecMul.exe: 0xC0000005: Access violation reading location 0xcccccccc.
        First-chance exception at 0xcccccccc in oclMatVecMul.exe: 0xC0000005: Access violation reading location 0xcccccccc.
        Unhandled exception at 0xcccccccc in oclMatVecMul.exe: 0xC0000005: Access violation reading location 0xcccccccc.
        First-chance exception at 0xcccccccc in oclMatVecMul.exe: 0xC0000005: Access violation reading location 0xcccccccc.
        Unhandled exception at 0xcccccccc in oclMatVecMul.exe: 0xC0000005: Access violation reading location 0xcccccccc.
        First-chance exception at 0xcccccccc in oclMatVecMul.exe: 0xC0000005: Access violation reading location 0xcccccccc.
        Unhandled exception at 0xcccccccc in oclMatVecMul.exe: 0xC0000005: Access violation reading location 0xcccccccc.
        The program '[3984] oclMatVecMul.exe: Native' has exited with code 0 (0x0).

- Slightly dispirited, I tried a different sample, oclDeviceQuery, and got the same errors.
- Looking at the errors more closely, I figured that it looked like the C++ code and its associated header files were incompatible with the libraries that were being linked in at runtime; the "This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention" in particular seemed to point in that direction. Given that I'd done a large-scale Windows Update shortly after installing the OpenCL drivers right at the start of this process, it seemed plausible that they might have been overwritten by older, non-compatible drivers.
- The best way of testing this hypothesis seemed to be to reinstall the drivers, so I went back to the NVIDIA download page and once again downloaded the "NVIDIA Drivers for WinVista and Win7 (190.89)" 32-bit version, and reinstalled (exiting Visual Studio first).
- Once that was all done, I restarted Visual Studio, reloaded the oclDeviceQuery demo, rebuilt it, and ran it again... and it worked! The matrix multiplication one and the n-body physics simulation one also worked, so I think everything's sorted. Here's a video of the latter:
If it looks slow to you, that's probably because it is a bit slow... but remember, it's doing a lot of calculations, and I'm using a pretty old and crappy graphics card.
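Going back to the "Run-Time Check Failure #0" message in the log: for anyone puzzled by the calling-convention wording, here's a rough sketch of what it's talking about. On 32-bit Windows, __stdcall functions clean their own arguments off the stack, while with __cdecl the caller does it. If a program calls a function through a pointer declared with the wrong one of the two, the stack pointer (ESP) ends up in the wrong place afterwards, which is what the debug runtime is complaining about. The code below shows the matching, correct case. DEMO_API_CALL, demo_get_device_count and query_device_count are names I've made up for illustration; they're not real OpenCL or NVIDIA functions, and I'm not claiming this is exactly what went wrong inside the driver.

```cpp
#include <cassert>

// Hypothetical macro, for illustration only. On 32-bit MSVC builds it
// expands to __stdcall; on other compilers and platforms it expands to
// nothing, so the sketch still compiles there.
#if defined(_MSC_VER) && defined(_M_IX86)
#define DEMO_API_CALL __stdcall
#else
#define DEMO_API_CALL
#endif

// Stand-in for a function that would normally live in a driver DLL.
// Hypothetical: it just reports one device and returns 0 for success.
int DEMO_API_CALL demo_get_device_count(int* count) {
    *count = 1;
    return 0;
}

// Function pointer type declared with the SAME calling convention as the
// function it will point at. If the header describing this type and the
// binary providing the function disagreed about the convention, caller
// and callee would disagree about who cleans up the stack.
typedef int (DEMO_API_CALL *demo_get_device_count_fn)(int*);

// Calls the function through the pointer, as code using a DLL would.
int query_device_count() {
    demo_get_device_count_fn fn = &demo_get_device_count;
    int count = 0;
    int status = fn(&count);
    assert(status == 0);  // the stand-in always succeeds
    return count;
}
```

Here query_device_count() returns 1. The compiler normally stops you from mixing conventions in a single program like this one; the mismatch tends to show up when the function comes from a separately built DLL, which is why out-of-step headers and drivers looked like a plausible culprit.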
So, that's how I got it all installed and running. Next time I'll write about something a little more interesting, like how the programs are structured, or even how to call OpenCL from IronPython applications like Resolver One.