by Denis Robilliard, Virginie Marion-Poty, Cyril Fonlupt - Université du Littoral-Côte d'Opale, Calais, France.
This work was supported by the European Interreg IIIA project 182b.
This web page presents a package named GPURegression that implements a population-parallel scheme for Genetic Programming in which the evaluation of individuals is performed on an NVIDIA G80 graphics processing unit. This parallel scheme was presented at the 11th European Conference on Genetic Programming (Euro'GP 2008, Naples, Italy), and the paper can be downloaded here.
The GPURegression package is based on Sean Luke's ECJ library. It lets you benefit from the high computing power of modern parallel graphics hardware based on the NVIDIA G80 family of graphics processors (e.g. the GeForce 8x00 graphics card series). GP individuals are evaluated in parallel on the graphics card by an interpreter written in CUDA, while breeding and selection are done on the CPU in the standard ECJ framework.
On an 8800 GTX card, speedups of up to 60 times over ECJ running on an Intel Core 2 6600 @ 2.40 GHz can be observed for regression problems, which amounts to up to 770 million GP operations per second.
Announcement: a new version is scheduled for September 2008. It is more than 3 times faster (up to 2.8 billion GPop/s), comes with more tutorial problems and allows larger populations.
1. Download and install the ECJ library, version 18.
2. Download and install the CUDA Driver, Toolkit and SDK, as explained on the NVIDIA website (make sure that you get a recent driver).
3. Download the GPURegression package and decompress it in the ECJ application directory. This creates a directory named gpuregression.
4. Add this new application directory to the main ECJ Makefile in the ECJ root directory (typically, add the lines ec/app/gpuregression/*.java\ and ec/app/gpuregression/func/*.java\ under the DIRS = \ line in the Makefile), then execute the make command: the Java classes must be generated before step 7.
5. If you have more than one G80 card, edit the regression.cu file and change the argument of the cudaSetDevice() call to the number of the graphics card that you want to target.
6. Check the path to the CUDA SDK in the Makefile in the gpuregression directory, and adapt the "javah" invocation to your configuration (the default provided uses the Sun SDK "javah" syntax).
7. Run make in the gpuregression directory to compile the ".cu" files associated with the tutorial problem into a library.
8. Go to the ECJ root directory and run the tutorial problem via the .params file, specifying the library path for both the CUDA runtime and the problem-dependent library, e.g. with Sun SDK java:
   java -Djava.library.path=/usr/local/cuda/lib/:ec/app/gpuregression -cp ./ ec.Evolve -file ec/app/gpuregression/cudaregression.params
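If the run fails to load a native library, it can help to check which directory on java.library.path actually holds it. The following is a minimal diagnostic sketch, not part of the package. It relies only on the standard System.mapLibraryName and java.library.path mechanisms, and the library names "cudart" and "regression" are assumptions made for illustration (the name of the problem-dependent library is not stated on this page).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class LibraryPathCheckSketch {
    /** Splits a java.library.path value into its non-empty directory entries. */
    static List<String> entries(String libraryPath) {
        List<String> dirs = new ArrayList<>();
        for (String dir : libraryPath.split(Pattern.quote(File.pathSeparator))) {
            if (!dir.isEmpty()) {
                dirs.add(dir);
            }
        }
        return dirs;
    }

    /** Returns the first directory holding the platform file for libName, or null. */
    static String locate(String libraryPath, String libName) {
        // Maps e.g. "foo" to "libfoo.so" on Linux.
        String fileName = System.mapLibraryName(libName);
        for (String dir : entries(libraryPath)) {
            if (new File(dir, fileName).isFile()) {
                return dir;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String path = System.getProperty("java.library.path", "");
        // Assumed library names, for illustration only.
        for (String name : new String[] {"cudart", "regression"}) {
            String dir = locate(path, name);
            System.out.println(name + ": " + (dir == null ? "not found" : dir));
        }
    }
}
```

Run it with the same -Djava.library.path value as the command above. A "not found" line points to the directory entry that needs fixing.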
The interpreter is a stack-based postfix interpreter (postfix notation is also known as Reverse Polish Notation, or RPN): operands are read first and pushed on the stack, then the operator is read and interpreted, popping its arguments and pushing the result back on the stack.
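The stack discipline described above can be sketched in a few lines of Java. This is an illustration only, not the package's CUDA interpreter: the opcode values and the convention that a constant is stored in the slot following OP_CONST are assumptions chosen for this example.

```java
final class RpnInterpreterSketch {
    // Hypothetical opcodes, used by this example only.
    static final int OP_X = 0;     // push the input variable x
    static final int OP_CONST = 1; // push the literal stored in the next slot
    static final int OP_ADD = 2;
    static final int OP_SUB = 3;
    static final int OP_MUL = 4;

    /** Evaluates a postfix program for one fitness case x. */
    static float eval(float[] program, float x) {
        // The stack can never hold more values than the program has tokens.
        float[] stack = new float[program.length];
        int sp = 0;
        for (int pc = 0; pc < program.length; pc++) {
            switch ((int) program[pc]) {
                case OP_X:
                    stack[sp++] = x;
                    break;
                case OP_CONST:
                    stack[sp++] = program[++pc];
                    break;
                case OP_ADD: {
                    float b = stack[--sp];
                    float a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case OP_SUB: {
                    float b = stack[--sp];
                    float a = stack[--sp];
                    stack[sp++] = a - b;
                    break;
                }
                case OP_MUL: {
                    float b = stack[--sp];
                    float a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown opcode at " + pc);
            }
        }
        return stack[sp - 1]; // the result is the last value left on the stack
    }

    public static void main(String[] args) {
        // x * x + 1 written in postfix: x x * 1 +
        float[] program = {OP_X, OP_X, OP_MUL, OP_CONST, 1f, OP_ADD};
        System.out.println(eval(program, 3f)); // prints 10.0
    }
}
```

A postfix program needs no recursion and no tree pointers, only a flat array and a small stack, which is what makes this form convenient to run on a GPU.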
Breeding and the other evolutionary operators are performed as usual in ECJ; evaluation is then managed by the CudaEvaluator class. First, the GP individuals are parsed, translated into RPN and copied into a contiguous chunk of memory. Then the evalPopChunkGPU method of the problem class (e.g. CudaRegression.java) is called. This method passes the population, the fitness cases and related data to the host CUDA code (e.g. regression.cu) through the Java Native Interface (JNI). The host CUDA code performs the actual transfer to the GPU and calls the interpreter (e.g. regressionKernel.cu).
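The translation step can be pictured as a post-order walk of each tree, writing tokens one after another into a single array. The sketch below is an assumption-laden illustration, not the CudaEvaluator code: the Node class is a stand-in for ECJ's tree nodes, and the offsets array used to find where each individual starts is a hypothetical layout.

```java
import java.util.Arrays;

final class PostfixFlattenSketch {
    /** Hypothetical tree node: an opcode plus its children (not ECJ's GPNode). */
    static final class Node {
        final float opcode;
        final Node[] children;

        Node(float opcode, Node... children) {
            this.opcode = opcode;
            this.children = children;
        }
    }

    /** Number of nodes in the tree rooted at n. */
    static int size(Node n) {
        int count = 1;
        for (Node child : n.children) {
            count += size(child);
        }
        return count;
    }

    /** Post-order write: children first, then the node. Returns the next free index. */
    static int write(Node n, float[] out, int pos) {
        for (Node child : n.children) {
            pos = write(child, out, pos);
        }
        out[pos] = n.opcode;
        return pos + 1;
    }

    /**
     * Packs every tree into one contiguous array.
     * offsets must have population.length + 1 slots: offsets[i] is the start of
     * individual i, and the last slot receives the total length.
     */
    static float[] pack(Node[] population, int[] offsets) {
        int total = 0;
        for (int i = 0; i < population.length; i++) {
            offsets[i] = total;
            total += size(population[i]);
        }
        offsets[population.length] = total;
        float[] chunk = new float[total];
        int pos = 0;
        for (Node tree : population) {
            pos = write(tree, chunk, pos);
        }
        return chunk;
    }

    public static void main(String[] args) {
        // Hypothetical opcodes: 0 = x, 2 = add, 4 = mul.
        Node x = new Node(0f);
        Node sum = new Node(2f, x, new Node(4f, x, x)); // x + (x * x)
        int[] offsets = new int[3];
        float[] chunk = pack(new Node[] {sum, x}, offsets);
        System.out.println(Arrays.toString(chunk));
        System.out.println(Arrays.toString(offsets));
    }
}
```

A single flat array can be handed to native code and copied to the GPU in one transfer, instead of one transfer per individual.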
If you define your own functions (in the func subdirectory), each class must define a public float postFixed() method that returns the opcode associated with the function. The opcode value should be defined in GPUCommonDefs.java, and the opcode must also be handled by the CUDA interpreter in regressionKernel.cu.
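As a rough illustration of this contract, the sketch below shows the three pieces that have to agree: an opcode constant, a function class whose postFixed() returns it, and an interpreter branch that handles it. Everything here is hypothetical: the Opcodes class only stands in for GPUCommonDefs.java, the value 10 is arbitrary, a real function class would also extend the appropriate ECJ node class, and the real interpreter branch lives in CUDA code rather than Java.

```java
final class CustomFunctionSketch {
    /** Stand-in for the opcode table kept in GPUCommonDefs.java. */
    static final class Opcodes {
        static final float SIN = 10f; // arbitrary value chosen for this example
    }

    /** The contract described above: each function exposes its opcode. */
    interface PostfixFunction {
        float postFixed();
    }

    /** Hypothetical unary function; ECJ-specific parts are omitted. */
    static final class Sin implements PostfixFunction {
        @Override
        public float postFixed() {
            return Opcodes.SIN;
        }
    }

    /** Java mirror of the branch the interpreter needs for the new opcode. */
    static float applyUnary(float opcode, float arg) {
        if (opcode == Opcodes.SIN) {
            return (float) Math.sin(arg);
        }
        throw new IllegalArgumentException("unhandled opcode " + opcode);
    }

    public static void main(String[] args) {
        float opcode = new Sin().postFixed();
        System.out.println(applyUnary(opcode, 0f)); // prints 0.0
    }
}
```

If the opcode is defined on the Java side but has no matching branch in the interpreter, programs using the new function cannot be evaluated correctly, so the three places need to be updated together.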
The package works only with graphics cards in the NVIDIA G80 family (e.g. 8800 GTX, 8600 GT, ...).
The code uses the single-precision (float) type.
The size of the population is limited to 65535 individuals.
The size of the whole population array must be specified in the parameter file (parameter eval.problem.progssize), so it must be estimated before the run.
The package does not support multi-threading for the ECJ/Java part.
Only one subpopulation is allowed.
Old graphics driver versions may halt with a core dump, hang, or even hang the whole system if you run your experiments on the same graphics card that runs the system desktop. Recent drivers are much more stable.
If experiments are run on the same graphics card that runs the system desktop, calculations that last for more than 5 seconds will be killed by the desktop watchdog. However, unless the fitness function is very costly, this is unlikely to happen because each generation is computed as an independent GPU call, and thus usually lasts less than 5 seconds.
Memory requirements are increased, as we need to store a contiguous copy of the whole population before transferring it to the GPU memory. Thus you may need to increase the Java Virtual Machine available memory (e.g. -Xms1000M -Xmx1000M command line parameters for the Sun JVM).
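To pick a value for eval.problem.progssize and to judge the extra memory needed, a simple upper bound is often enough. The sketch below assumes that the population array holds one float token per tree node and that every individual could reach a maximum node count; neither assumption is confirmed on this page, and the helper names are hypothetical.

```java
final class ProgsSizeEstimateSketch {
    static final int MAX_POPULATION = 65535; // population limit stated above

    /** Upper bound on the number of tokens if every individual is at maximum size. */
    static long upperBoundTokens(int populationSize, int maxNodesPerIndividual) {
        if (populationSize < 1 || populationSize > MAX_POPULATION) {
            throw new IllegalArgumentException("population size out of range");
        }
        if (maxNodesPerIndividual < 1) {
            throw new IllegalArgumentException("node count must be positive");
        }
        return (long) populationSize * maxNodesPerIndividual;
    }

    /** Bytes needed to hold that many single-precision tokens. */
    static long bytesAsFloats(long tokens) {
        return tokens * Float.BYTES;
    }

    public static void main(String[] args) {
        long tokens = upperBoundTokens(10000, 200);
        System.out.println("tokens: " + tokens);               // prints tokens: 2000000
        System.out.println("bytes:  " + bytesAsFloats(tokens)); // prints bytes:  8000000
    }
}
```

With 10000 individuals of at most 200 nodes, the bound is 2 million tokens, or about 8 MB for the contiguous copy alone. The actual population is usually smaller than this worst case.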
This code is distributed on an "as is" basis, without any warranty of suitability for any problem. In particular, the code is not well documented.
Disclaimer: all registered brand and product names mentioned on this web page are the property of their respective owners.