Using the EMAN2 CUDA API
EMAN2 includes support for CUDA processing. To use CUDA in EMAN2, you must set the flag ENABLE_EMAN2_CUDA using ccmake and then recompile. This step defines the identifier ENABLE_EMAN2_CUDA, which causes the preprocessor to include the CUDA code in the compilation. Any new CUDA code should be enclosed by #ifdef ENABLE_EMAN2_CUDA and #endif directives so that it is only compiled when CUDA is enabled. Compiling with CUDA exposes additional methods and members of the class EMData. Below is a list of the additional EMData methods that have Python bindings.
bool EMData::copy_to_cuda() const, this copies EMData data from the CPU to the GPU global memory. Python method = copy_to_cuda()
bool EMData::copy_to_cudaro() const, this copies EMData data from the CPU to the GPU texture memory. Python method = copy_to_cudaro()
bool EMData::copy_rw_to_ro() const, this copies EMData data from global memory to texture memory. Python method = copy_rw_to_ro()
void EMData::switchoncuda(), this tells EMAN2 to use CUDA. You should almost never call this function directly anymore; use cuda_initialize() instead. Python method = switchoncuda()
void EMData::switchoffcuda(), this tells EMAN2 to stop using CUDA. Python method = switchoffcuda()
bool EMData::cuda_initialize(), this tells EMAN2 to initialize CUDA and start using CUDA. This method needs to be called before CUDA is used. Python method = cuda_initialize()
void EMData::cuda_cleanup(), this cleans up the CUDA cache and is called by an event handler in the EMAN2 module. You should never call this function unless you intend to shut down an EMAN2 program. Python method = cuda_cleanup()
const char* EMData::getcudalock(), this returns the name of the CUDA lock file. CUDA lock files are created to help the system keep track of which process is using which device; surprisingly, this functionality is not built into the CUDA API. CUDA lock files are stored in /tmp, so this mechanism will not work on Windows. Python method = getcudalock()
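The lock-file idea described above can be illustrated with a small POSIX sketch. This is not EMAN2's implementation: the file name pattern eman2cuda.<device>.lock, the helpers acquire_device_lock() and release_device_lock(), and the directory argument are all hypothetical. The point it shows is that open() with O_CREAT | O_EXCL fails if the file already exists, so only one process can claim a given device number.

```cpp
#include <fcntl.h>   // open, O_CREAT, O_EXCL, O_WRONLY
#include <unistd.h>  // write, close, unlink, getpid
#include <string>

// Hypothetical lock-file name for a device, e.g. "/tmp/eman2cuda.0.lock".
std::string lock_path(const std::string& dir, int device) {
    return dir + "/eman2cuda." + std::to_string(device) + ".lock";
}

// Try devices 0..num_devices-1 and claim the first one whose lock file
// does not exist yet. Returns the claimed device number, or -1 if every
// device is already locked by some process.
int acquire_device_lock(const std::string& dir, int num_devices) {
    for (int dev = 0; dev < num_devices; ++dev) {
        // O_EXCL makes creation atomic: it fails if the file already exists.
        int fd = open(lock_path(dir, dev).c_str(), O_CREAT | O_EXCL | O_WRONLY, 0644);
        if (fd >= 0) {
            // Record the owning process ID so stale locks can be inspected.
            std::string pid = std::to_string(getpid()) + "\n";
            ssize_t written = write(fd, pid.data(), pid.size());
            (void)written;
            close(fd);
            return dev;
        }
    }
    return -1;
}

// Release a device by deleting its lock file.
void release_device_lock(const std::string& dir, int device) {
    unlink(lock_path(dir, device).c_str());
}
```

A process would call acquire_device_lock("/tmp", n) at startup, use the returned device, and call release_device_lock() on shutdown; if a process crashes, its lock file is left behind, which is one reason the owning PID is written into it.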
If you are writing new C++ code, you will also have access to, and will want to use, these additional EMData CUDA methods:
float* EMData::getcudarwdata() const, returns a pointer to the data in GPU global memory. If the data is not in global memory, 0 (a null pointer) is returned.
float* EMData::getcudarodata() const, returns a pointer to the data in GPU texture memory. If the data is not in texture memory, 0 (a null pointer) is returned.
bool EMData::isrodataongpu() const, returns True if the data is in GPU texture memory. It also returns True if the data is in global memory AND it was successfully copied from global to texture memory. Otherwise, False is returned.
bool EMData::usecuda, this member acts as a flag to signal when CUDA is being used. You should enclose all CUDA code in a block of the form: if (EMData::usecuda == 1) { ... }
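Putting the compile-time guard (#ifdef ENABLE_EMAN2_CUDA) and the run-time guard (usecuda) together, a new CUDA-aware routine might be structured as in the sketch below. It is illustrative only: the global usecuda is a stand-in for EMData::usecuda, and scale_on_gpu() is a hypothetical function assumed to be implemented in a .cu file under libEM/cuda; neither is part of the real EMAN2 API. When ENABLE_EMAN2_CUDA is not defined, the preprocessor removes the GPU branch and only the CPU path is compiled.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the static flag EMData::usecuda (hypothetical here).
bool usecuda = false;

#ifdef ENABLE_EMAN2_CUDA
// Hypothetical GPU routine, assumed to live in a .cu file under
// libEM/cuda and to be compiled only when CUDA support is enabled.
void scale_on_gpu(float* data, std::size_t n, float factor);
#endif

// Multiply every voxel by 'factor', using the GPU only when CUDA is both
// compiled in (ENABLE_EMAN2_CUDA) and switched on at run time (usecuda).
void scale_data(std::vector<float>& data, float factor) {
#ifdef ENABLE_EMAN2_CUDA
    if (usecuda == 1) {
        scale_on_gpu(data.data(), data.size(), factor);
        return;
    }
#endif
    // CPU fallback: always compiled, used when CUDA is unavailable or off.
    for (float& v : data) {
        v *= factor;
    }
}
```

Keeping a CPU fallback outside the #ifdef means the same source builds with or without CUDA; building with -DENABLE_EMAN2_CUDA additionally requires linking a definition of the hypothetical scale_on_gpu().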
In addition to the above functions, the following methods are used internally to implement the CUDA memory management scheme, which is a least recently used (LRU) algorithm. When an EMData object's data array is copied to the GPU, the object is placed on top of a static linked list (there is only one linked list, and its beginning and end pointers are static). Additional items moved to the GPU also go on top of the list, and whenever an item is accessed it is moved back to the top. If GPU memory runs out, items are removed from the bottom of the list. Because objects that are not accessed gradually filter down to the bottom, these are the least recently used items. Both global and texture memory are managed this way (the concept of texture memory will be more familiar to OpenGL programmers). The following methods implement this memory management scheme.
bool EMData::rw_alloc() const, this method allocates GPU global memory sufficient to store the EMData object data. If allocation fails the method returns False, otherwise True
bool EMData::ro_alloc() const, this method allocates GPU texture memory sufficient to store the EMData object data. If allocation fails the method returns False, otherwise True
void EMData::bindcudaarrayA(const bool intp_mode) const, binds the current texture memory on the GPU device. After binding, the texture memory can be accessed through the texture object 'texA'. Only two textures, A and B, can be bound at any one time. The argument intp_mode specifies whether or not linear interpolation is desired. Graphics cards have hardware support for texture interpolation, so the speedup can be considerable.
void EMData::bindcudaarrayB(const bool intp_mode) const, binds the current texture memory on the GPU device. After binding, the texture memory can be accessed through the texture object 'texB'. Only two textures, A and B, can be bound at any one time. The argument intp_mode specifies whether or not linear interpolation is desired. Graphics cards have hardware support for texture interpolation, so the speedup can be considerable.
void EMData::unbindcudaarryA() const, unbind GPU texture 'A' from texture object 'texA'.
void EMData::unbindcudaarryB() const, unbind GPU texture 'B' from texture object 'texB'.
bool EMData::copy_from_device(const bool rocpy), copies data from the GPU to the CPU. The argument rocpy selects the source memory: if set to False (the default), data is copied from global memory; if set to True, data is copied from texture memory.
void EMData::rw_free() const, free GPU global memory. This removes the EMData object from the linked list provided texture memory is not in use.
void EMData::ro_free() const, free GPU texture memory. This removes the EMData object from the linked list provided global memory is not in use.
bool EMData::freeup_devicemem(const int& num_bytes) const, requests that 'num_bytes' of memory be freed on the GPU. If 'num_bytes' are already available, this method returns True. If not, items are removed from the bottom of the linked list until enough memory is available. If enough memory cannot be made available, the method returns False.
void EMData::setdirtybit() const, sets a flag denoting that the data on the GPU has changed relative to the CPU. This strategy is not currently in use; the default is to always copy data from the GPU to the CPU, irrespective of whether or not the GPU data has changed.
void EMData::elementaccessed() const, moves this EMData object to the top of the linked list.
void EMData::addtolist() const, adds this EMData object to the top of the linked list.
void EMData::removefromlist() const, removes this EMData object from the linked list.
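The list management described above behaves like an LRU cache. The following is a simplified, stand-alone C++ model of it, not EMAN2's code: GPU buffers are represented only by an integer ID and a byte count, a fixed capacity stands in for the device memory CUDA would report, and the class GpuLruModel and its methods are hypothetical analogues of addtolist(), elementaccessed(), removefromlist() and freeup_devicemem(). The allocate() method is an assumed analogue of rw_alloc(), and the model ignores the split between global and texture memory.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <vector>

// Simplified model of the GPU buffer list (hypothetical, not EMAN2 code).
// The front of 'order_' is the top of the list (most recently used);
// the back is the bottom (least recently used, evicted first).
class GpuLruModel {
public:
    explicit GpuLruModel(std::size_t capacity) : capacity_(capacity) {}

    // Analogue of addtolist(): a newly allocated buffer goes on top.
    // Re-adding a known ID just counts as an access.
    void add_to_list(int id, std::size_t bytes) {
        if (entries_.count(id) != 0) {
            element_accessed(id);
            return;
        }
        order_.push_front(id);
        entries_[id] = Entry{order_.begin(), bytes};
        used_ += bytes;
    }

    // Analogue of elementaccessed(): move a buffer to the top in O(1).
    void element_accessed(int id) {
        auto it = entries_.find(id);
        if (it != entries_.end()) {
            order_.splice(order_.begin(), order_, it->second.pos);
        }
    }

    // Analogue of removefromlist() combined with freeing the buffer.
    void remove_from_list(int id) {
        auto it = entries_.find(id);
        if (it == entries_.end()) {
            return;
        }
        used_ -= it->second.bytes;
        order_.erase(it->second.pos);
        entries_.erase(it);
    }

    // Analogue of freeup_devicemem(): evict from the bottom of the list
    // until at least 'num_bytes' are free; false if that is impossible.
    bool freeup(std::size_t num_bytes) {
        while (free_bytes() < num_bytes && !order_.empty()) {
            int victim = order_.back();  // least recently used buffer
            remove_from_list(victim);
            evicted_.push_back(victim);
        }
        return free_bytes() >= num_bytes;
    }

    // Assumed analogue of rw_alloc(): make room first, then track the buffer.
    bool allocate(int id, std::size_t bytes) {
        if (!freeup(bytes)) {
            return false;
        }
        add_to_list(id, bytes);
        return true;
    }

    std::size_t free_bytes() const {
        return used_ >= capacity_ ? 0 : capacity_ - used_;
    }

    const std::vector<int>& evicted() const { return evicted_; }

private:
    struct Entry {
        std::list<int>::iterator pos;  // position in the recency list
        std::size_t bytes;             // size of the modeled device buffer
    };

    std::size_t capacity_;
    std::size_t used_ = 0;
    std::list<int> order_;
    std::unordered_map<int, Entry> entries_;
    std::vector<int> evicted_;
};
```

A std::list is used here because splice() moves an element to the front in constant time without invalidating the iterators stored in the map, which matches moving an EMData object to the top of the list on every access.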
Writing CUDA for EMAN2 programs
CUDA code for EMAN2 is located in libEM/cuda; please review this code for examples of CUDA programming in EMAN2. If you are unfamiliar with CUDA, I also recommend reading Programming Massively Parallel Processors. As of now, only 3D processing is CUDA enabled. This includes 3D aligners, Fourier reconstruction algorithms, and projection algorithms.