QMCPACK
CUDAAllocator< T > Class Template Reference

Allocator for CUDA device memory.


Classes

struct  rebind
 

Public Types

using value_type = T
 
using size_type = size_t
 
using pointer = T *
 
using const_pointer = const T *
 

Public Member Functions

 CUDAAllocator ()=default
 
template<class U >
 CUDAAllocator (const CUDAAllocator< U > &)
 
T * allocate (std::size_t n)
 
void deallocate (T *p, std::size_t n)
 
void copyToDevice (T *device_ptr, T *host_ptr, size_t n)
 
void copyFromDevice (T *host_ptr, T *device_ptr, size_t n)
 
void copyDeviceToDevice (T *to_ptr, size_t n, T *from_ptr)
 

Static Public Member Functions

template<class U , class... Args>
static void construct (U *p, Args &&... args)
 Provide a construct for std::allocator_traits::construct to call.
 
template<class U >
static void destroy (U *p)
 Give std::allocator_traits something to call.
 

Detailed Description

template<typename T>
class qmcplusplus::CUDAAllocator< T >

allocator for CUDA device memory

Template Parameters
T    data type

Use caution, and write unit tests, before using this with anything other than Ohmms containers. It has not been tested beyond some unit tests that use std::vector with constant size. CUDAAllocator appears to meet all the non-optional requirements of a C++ Allocator.

Some of the default implementations in std::allocator_traits of optional Allocator requirements may cause runtime or compilation failures. They assume there is only one memory space and that the host has access to it.
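
As a rough illustration of what std::allocator_traits reads from the class and what it fills in by default, the sketch below uses a hypothetical host-only stand-in, MockDeviceAllocator, that mirrors the member types listed above but allocates with std::malloc instead of cudaMalloc. It is not the QMCPACK class; the name and the malloc backing are assumptions made so the example builds without CUDA.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <type_traits>

// Hypothetical host-only stand-in that mirrors CUDAAllocator's member types.
template<typename T>
struct MockDeviceAllocator
{
  using value_type    = T;
  using size_type     = std::size_t;
  using pointer       = T*;
  using const_pointer = const T*;

  MockDeviceAllocator() = default;
  template<class U>
  MockDeviceAllocator(const MockDeviceAllocator<U>&) {}

  template<class U>
  struct rebind
  {
    using other = MockDeviceAllocator<U>;
  };

  // std::malloc stands in for cudaMalloc so this runs on the host.
  T* allocate(std::size_t n) { return static_cast<T*>(std::malloc(n * sizeof(T))); }
  void deallocate(T* p, std::size_t) { std::free(p); }
};

using Traits = std::allocator_traits<MockDeviceAllocator<double>>;

// Taken from the allocator itself.
static_assert(std::is_same<Traits::pointer, double*>::value, "pointer comes from the allocator");
static_assert(std::is_same<Traits::rebind_alloc<int>, MockDeviceAllocator<int>>::value, "rebind is honoured");

// Filled in by std::allocator_traits because the allocator does not provide them.
static_assert(std::is_same<Traits::difference_type, std::ptrdiff_t>::value, "defaulted difference_type");
static_assert(!Traits::propagate_on_container_copy_assignment::value, "defaulted to false_type");

// Allocate and release through the traits, as a container would.
bool allocate_through_traits()
{
  MockDeviceAllocator<double> alloc;
  double* p     = Traits::allocate(alloc, 4);
  const bool ok = (p != nullptr);
  Traits::deallocate(alloc, p, 4);
  return ok;
}
```

The static_asserts only cover type-level defaults. The defaults that touch memory, such as construct and destroy, are the ones that matter for device pointers.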

Definition at line 95 of file CUDAallocator.hpp.


Class Documentation

◆ qmcplusplus::CUDAAllocator::rebind

struct qmcplusplus::CUDAAllocator::rebind

template<typename T>
template<class U>
struct qmcplusplus::CUDAAllocator< T >::rebind< U >

Definition at line 109 of file CUDAallocator.hpp.

Class Members
typedef CUDAAllocator< U > other

Member Typedef Documentation

◆ const_pointer

using const_pointer = const T*

Definition at line 101 of file CUDAallocator.hpp.

◆ pointer

using pointer = T*

Definition at line 100 of file CUDAallocator.hpp.

◆ size_type

using size_type = size_t

Definition at line 99 of file CUDAallocator.hpp.

◆ value_type

using value_type = T

Definition at line 98 of file CUDAallocator.hpp.

Constructor & Destructor Documentation

◆ CUDAAllocator() [1/2]

CUDAAllocator ( )
default

◆ CUDAAllocator() [2/2]

CUDAAllocator ( const CUDAAllocator< U > &  )
inline

Definition at line 105 of file CUDAallocator.hpp.

{}

Member Function Documentation

◆ allocate()

T* allocate ( std::size_t  n)
inline

Definition at line 114 of file CUDAallocator.hpp.

{
  void* pt;
  cudaErrorCheck(cudaMalloc(&pt, n * sizeof(T)), "Allocation failed in CUDAAllocator!");
  CUDAallocator_device_mem_allocated += n * sizeof(T);
  return static_cast<T*>(pt);
}
std::atomic< size_t > CUDAallocator_device_mem_allocated
#define cudaMalloc
Definition: cuda2hip.h:119

◆ construct()

static void construct ( U *  p,
Args &&...  args 
)
inline static

Provide a construct for std::allocator_traits::construct to call.

Don't do anything on construct, pointer p is on the device!

For example, std::vector calls this to default-initialize each element. You'll segfault if std::allocator_traits::construct tries doing that at p.

The standard is a bit confusing on this point. Implementing construct is an optional requirement of Allocator from C++11 on, and it is not slated to be removed.

It is deprecated for std::allocator in C++17 and removed in C++20, but we are not implementing std::allocator.

STL containers use Allocators only through std::allocator_traits, which handles the case where the Allocator has no construct method. If a construct method is present, std::allocator_traits calls it.

Definition at line 144 of file CUDAallocator.hpp.

{}
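
The dispatch described above can be observed on the host. The sketch below is an illustration under stated assumptions, not QMCPACK code: NoOpConstructAllocator, PlainAllocator, Tracked and count_constructor_calls are hypothetical names, and std::malloc stands in for cudaMalloc. It counts how often a constructor runs when std::allocator_traits::construct is called for an allocator with and without its own construct.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <utility>

int constructor_calls = 0;

// Element type that records every constructor call.
struct Tracked
{
  Tracked() { ++constructor_calls; }
};

// Hypothetical allocator with an empty construct, like CUDAAllocator's.
template<typename T>
struct NoOpConstructAllocator
{
  using value_type = T;
  T* allocate(std::size_t n) { return static_cast<T*>(std::malloc(n * sizeof(T))); }
  void deallocate(T* p, std::size_t) { std::free(p); }

  template<class U, class... Args>
  static void construct(U*, Args&&...)
  {} // deliberately does nothing
};

// Hypothetical allocator without construct: allocator_traits supplies the default.
template<typename T>
struct PlainAllocator
{
  using value_type = T;
  T* allocate(std::size_t n) { return static_cast<T*>(std::malloc(n * sizeof(T))); }
  void deallocate(T* p, std::size_t) { std::free(p); }
};

// Returns how many Tracked constructors ran during one traits-level construct.
template<class Alloc>
int count_constructor_calls()
{
  using Traits = std::allocator_traits<Alloc>;
  Alloc a;
  constructor_calls = 0;
  auto* p           = Traits::allocate(a, 1);
  Traits::construct(a, p); // calls a.construct(p) if it exists, else placement-new
  const int calls = constructor_calls;
  Traits::deallocate(a, p, 1);
  return calls;
}
```

With the no-op allocator the count stays at zero, and with the plain allocator the default path constructs the object at p. On a real device pointer that default path is the host-side write this class is avoiding.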

◆ copyDeviceToDevice()

void copyDeviceToDevice ( T *  to_ptr,
size_t  n,
T *  from_ptr 
)
inline

Definition at line 167 of file CUDAallocator.hpp.

{
  cudaErrorCheck(cudaMemcpy(to_ptr, from_ptr, sizeof(T) * n, cudaMemcpyDeviceToDevice),
                 "cudaMemcpy failed in copyDeviceToDevice");
}
#define cudaMemcpy
Definition: cuda2hip.h:135
#define cudaMemcpyDeviceToDevice
Definition: cuda2hip.h:137

◆ copyFromDevice()

void copyFromDevice ( T *  host_ptr,
T *  device_ptr,
size_t  n 
)
inline

Definition at line 161 of file CUDAallocator.hpp.

{
  cudaErrorCheck(cudaMemcpy(host_ptr, device_ptr, sizeof(T) * n, cudaMemcpyDeviceToHost),
                 "cudaMemcpy failed in copyFromDevice");
}
#define cudaMemcpy
Definition: cuda2hip.h:135
#define cudaMemcpyDeviceToHost
Definition: cuda2hip.h:138

◆ copyToDevice()

void copyToDevice ( T *  device_ptr,
T *  host_ptr,
size_t  n 
)
inline

Definition at line 155 of file CUDAallocator.hpp.

{
  cudaErrorCheck(cudaMemcpy(device_ptr, host_ptr, sizeof(T) * n, cudaMemcpyHostToDevice),
                 "cudaMemcpy failed in copyToDevice");
}
#define cudaMemcpy
Definition: cuda2hip.h:135
#define cudaMemcpyHostToDevice
Definition: cuda2hip.h:139
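
The three copy helpers do not share one argument order: copyToDevice and copyFromDevice take (destination, source, count), while copyDeviceToDevice takes (to_ptr, n, from_ptr) with the count in the middle. The sketch below mirrors those signatures with a hypothetical MockCopyAllocator that uses std::memcpy in place of cudaMemcpy, so host buffers stand in for device buffers. The mock and round_trip are assumptions for illustration only.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-only mock with the same copy signatures as CUDAAllocator.
template<typename T>
struct MockCopyAllocator
{
  void copyToDevice(T* device_ptr, T* host_ptr, std::size_t n) { std::memcpy(device_ptr, host_ptr, sizeof(T) * n); }
  void copyFromDevice(T* host_ptr, T* device_ptr, std::size_t n) { std::memcpy(host_ptr, device_ptr, sizeof(T) * n); }
  // Note the count sits between the two pointers here.
  void copyDeviceToDevice(T* to_ptr, std::size_t n, T* from_ptr) { std::memcpy(to_ptr, from_ptr, sizeof(T) * n); }
};

// Host -> "device" A -> "device" B -> host, for a non-empty input.
std::vector<int> round_trip(std::vector<int> host_in)
{
  MockCopyAllocator<int> alloc;
  const std::size_t n = host_in.size();
  std::vector<int> dev_a(n), dev_b(n), host_out(n);
  alloc.copyToDevice(dev_a.data(), host_in.data(), n);
  alloc.copyDeviceToDevice(dev_b.data(), n, dev_a.data());
  alloc.copyFromDevice(host_out.data(), dev_b.data(), n);
  return host_out;
}
```

Because every pointer parameter is a plain T*, swapping source and destination compiles without complaint, so call sites are worth a second look.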

◆ deallocate()

void deallocate ( T *  p,
std::size_t  n 
)
inline

Definition at line 121 of file CUDAallocator.hpp.

{
  cudaErrorCheck(cudaFree(p), "Deallocation failed in CUDAAllocator!");
  CUDAallocator_device_mem_allocated -= n * sizeof(T);
}
#define cudaFree
Definition: cuda2hip.h:99
std::atomic< size_t > CUDAallocator_device_mem_allocated
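
allocate() and deallocate() adjust a shared byte counter by n * sizeof(T), so the counter only returns to its previous value when deallocate receives the same n that was passed to allocate. A minimal host-only sketch of that bookkeeping follows. mock_device_mem_allocated and CountingAllocator are hypothetical stand-ins, and std::malloc/std::free replace cudaMalloc/cudaFree.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for CUDAallocator_device_mem_allocated.
std::atomic<std::size_t> mock_device_mem_allocated{0};

template<typename T>
struct CountingAllocator
{
  using value_type = T;

  T* allocate(std::size_t n)
  {
    void* pt = std::malloc(n * sizeof(T)); // cudaMalloc in the real class
    if (!pt)
      throw std::bad_alloc();
    mock_device_mem_allocated += n * sizeof(T);
    return static_cast<T*>(pt);
  }

  void deallocate(T* p, std::size_t n)
  {
    std::free(p); // cudaFree in the real class
    mock_device_mem_allocated -= n * sizeof(T);
  }
};

// Bytes reported by the counter while n elements of T are live.
template<typename T>
std::size_t bytes_while_live(std::size_t n)
{
  CountingAllocator<T> alloc;
  T* p                    = alloc.allocate(n);
  const std::size_t bytes = mock_device_mem_allocated.load();
  alloc.deallocate(p, n); // same n, so the counter is restored
  return bytes;
}
```

Standard containers pass the original n to deallocate. Code that calls the allocator by hand has to do the same, or the reported total drifts.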

◆ destroy()

static void destroy ( U *  p)
inline static

Give std::allocator_traits something to call.

If this isn't present, the default is to call p->~T(), which we can't do on device memory.

Definition at line 152 of file CUDAallocator.hpp.

{}
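
The effect of an empty destroy can be checked on the host in the same spirit. In this sketch, which is an assumption-laden illustration rather than QMCPACK code, Noisy, NoOpDestroyAllocator and destructor_calls_seen are hypothetical names and the memory comes from std::malloc, so constructing the object is safe here even though it would not be on a device pointer.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>

int destructor_calls = 0;

// Element type that records every destructor call.
struct Noisy
{
  ~Noisy() { ++destructor_calls; }
};

// Hypothetical allocator with an empty destroy, like CUDAAllocator's.
template<typename T>
struct NoOpDestroyAllocator
{
  using value_type = T;
  T* allocate(std::size_t n) { return static_cast<T*>(std::malloc(n * sizeof(T))); }
  void deallocate(T* p, std::size_t) { std::free(p); }

  template<class U>
  static void destroy(U*)
  {} // deliberately does nothing
};

// Returns how many Noisy destructors ran during one traits-level destroy.
int destructor_calls_seen()
{
  using Alloc  = NoOpDestroyAllocator<Noisy>;
  using Traits = std::allocator_traits<Alloc>;
  Alloc a;
  destructor_calls = 0;
  Noisy* p         = Traits::allocate(a, 1);
  Traits::construct(a, p); // no construct member, so the default placement-new runs
  Traits::destroy(a, p);   // dispatches to the empty destroy above
  const int calls = destructor_calls;
  Traits::deallocate(a, p, 1);
  return calls;
}
```

The count stays at zero, so element types that rely on their destructor for cleanup are a poor fit for an allocator written this way.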

The documentation for this class was generated from the following file: