QMCPACK
DelayedUpdateCUDA< T, T_FP > Class Template Reference
Implements delayed update on NVIDIA GPUs using cuBLAS and cuSOLVER (cusolverDn).
Public Member Functions | |
DelayedUpdateCUDA () | |
default constructor | |
void | resize (int norb, int delay) |
resize the internal storage | |
template<typename TREAL > | |
void | invert_transpose (const Matrix< T > &logdetT, Matrix< T > &Ainv, std::complex< TREAL > &log_value) |
compute the inverse of the transpose of matrix A and the logarithm of its determinant | |
void | initializeInv (const Matrix< T > &Ainv) |
initialize internal objects when Ainv is refreshed | |
int | getDelayCount () const |
template<typename VVT > | |
void | getInvRow (const Matrix< T > &Ainv, int rowchanged, VVT &invRow) |
compute the up-to-date row of Ainv | |
template<typename VVT , typename RATIOT > | |
void | acceptRow (Matrix< T > &Ainv, int rowchanged, const VVT &psiV, const RATIOT ratio_new) |
accept a move and delay the update of Ainv | |
void | updateInvMat (Matrix< T > &Ainv, bool transfer_to_host=true) |
update the full Ainv and reset delay_count | |
Private Member Functions | |
void | clearDelayCount () |
reset delay count to 0 | |
Private Attributes | |
Matrix< T, CUDAHostAllocator< T > > | U |
Matrix< T, CUDAHostAllocator< T > > | Binv |
Matrix< T > | V |
Matrix< T, CUDAAllocator< T > > | temp_gpu |
Matrix< T, CUDAAllocator< T > > | U_gpu |
GPU copy of U, V, Binv, Ainv. | |
Matrix< T, CUDAAllocator< T > > | V_gpu |
Matrix< T, CUDAAllocator< T > > | Binv_gpu |
Matrix< T, CUDAAllocator< T > > | Ainv_gpu |
Vector< T > | p |
Vector< int, CUDAHostAllocator< int > > | delay_list |
Vector< int, CUDAAllocator< int > > | delay_list_gpu |
int | delay_count |
current number of delays; increases by one for each acceptance and is reset to 0 after updating Ainv | |
cuSolverInverter< T_FP > | cusolver_inverter |
PrefetchedRange | prefetched_range |
Matrix< T, CUDAHostAllocator< T > > | Ainv_buffer |
compute::Queue< PlatformKind::CUDA > | queue_ |
compute::BLASHandle< PlatformKind::CUDA > | blas_handle_ |
Implements delayed update on NVIDIA GPUs using cuBLAS and cuSOLVER (cusolverDn).
Template Parameters
T | base precision for most computations |
T_FP | higher precision for matrix inversion, T_FP >= T |
Definition at line 35 of file DelayedUpdateCUDA.h.
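For context, delayed update is usually described as batching the rank-1 inverse updates that a row replacement would otherwise trigger on every accepted move. The sketch below shows that non-delayed baseline, a Sherman-Morrison row replacement on the CPU. It is an illustration only, written for the plain inverse rather than the inverse of the transpose; `Mat` and `shermanMorrisonRowUpdate` are hypothetical names, not part of QMCPACK.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major square matrix; a hypothetical stand-in for QMCPACK's Matrix<T>.
using Mat = std::vector<std::vector<double>>;

// Replace row k of A by v and update Ainv = A^{-1} in place (Sherman-Morrison).
// Returns the determinant ratio det(A_new) / det(A_old).
double shermanMorrisonRowUpdate(Mat& Ainv, std::size_t k, const std::vector<double>& v)
{
  const std::size_t n = Ainv.size();
  assert(k < n && v.size() == n);

  // w = v^T * Ainv, a row vector of length n
  std::vector<double> w(n, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      w[j] += v[i] * Ainv[i][j];

  const double ratio = w[k]; // v^T * Ainv * e_k
  assert(ratio != 0.0);      // a zero ratio means the new matrix is singular

  // Column k of the old inverse, saved before Ainv is overwritten
  std::vector<double> colk(n);
  for (std::size_t i = 0; i < n; ++i)
    colk[i] = Ainv[i][k];

  w[k] -= 1.0; // w now holds v^T * Ainv - e_k^T
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      Ainv[i][j] -= colk[i] * w[j] / ratio;
  return ratio;
}
```

Each call costs O(n^2) and is dominated by memory traffic, which is the motivation for collecting several accepted moves and applying them together with matrix-matrix products.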
DelayedUpdateCUDA () | inline |
default constructor
Definition at line 79 of file DelayedUpdateCUDA.h.
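template<typename VVT , typename RATIOT >
void acceptRow (Matrix< T > &Ainv, int rowchanged, const VVT &psiV, const RATIOT ratio_new) | inline |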
accept a move with the update delayed
Ainv | inverse matrix |
rowchanged | the row id corresponding to the proposed electron |
psiV | new orbital values |
Until delay_count reaches the maximum delay, only Binv is updated, using a recursive algorithm.
Definition at line 173 of file DelayedUpdateCUDA.h.
References DelayedUpdateCUDA< T, T_FP >::Ainv_buffer, DelayedUpdateCUDA< T, T_FP >::Binv, BLAS::cone, qmcplusplus::syclBLAS::copy_n(), BLAS::czero, Matrix< T, Alloc >::data(), Vector< T, Alloc >::data(), DelayedUpdateCUDA< T, T_FP >::delay_count, DelayedUpdateCUDA< T, T_FP >::delay_list, BLAS::gemv(), BLAS::ger(), PrefetchedRange::getOffset(), DelayedUpdateCUDA< T, T_FP >::p, DelayedUpdateCUDA< T, T_FP >::prefetched_range, Matrix< T, Alloc >::rows(), DelayedUpdateCUDA< T, T_FP >::U, DelayedUpdateCUDA< T, T_FP >::updateInvMat(), and DelayedUpdateCUDA< T, T_FP >::V.
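The bookkeeping described above (count each accepted move, then apply the full update once the maximum delay is reached) can be sketched in isolation. The class below is a hypothetical stand-in that tracks only the counter and the list of changed rows; it performs no linear algebra, and `DelayBuffer`, `accept`, and `flush` are names invented for this illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical bookkeeping class; not part of QMCPACK.
class DelayBuffer
{
public:
  explicit DelayBuffer(int max_delay) : delay_list_(max_delay), delay_count_(0), flushes_(0)
  {
    assert(max_delay > 0);
  }

  // Record an accepted move for row `rowchanged`; flush once the buffer is full.
  void accept(int rowchanged)
  {
    delay_list_[delay_count_] = rowchanged;
    ++delay_count_;
    if (delay_count_ == static_cast<int>(delay_list_.size()))
      flush();
  }

  // Stand-in for the full-matrix update: consume all pending moves, then reset the counter.
  void flush()
  {
    if (delay_count_ == 0)
      return; // nothing pending
    ++flushes_;
    delay_count_ = 0;
  }

  int delayCount() const { return delay_count_; }
  int flushes() const { return flushes_; }

private:
  std::vector<int> delay_list_; // rows changed since the last flush
  int delay_count_;             // number of pending accepted moves
  int flushes_;                 // how many full updates have been triggered
};
```

With a maximum delay of 3, the first two calls to `accept` only grow the buffer; the third triggers a flush and returns the counter to 0.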
void clearDelayCount () | inline private |
reset delay count to 0
Definition at line 71 of file DelayedUpdateCUDA.h.
References PrefetchedRange::clear(), DelayedUpdateCUDA< T, T_FP >::delay_count, and DelayedUpdateCUDA< T, T_FP >::prefetched_range.
Referenced by DelayedUpdateCUDA< T, T_FP >::initializeInv(), DelayedUpdateCUDA< T, T_FP >::invert_transpose(), and DelayedUpdateCUDA< T, T_FP >::updateInvMat().
int getDelayCount () const | inline |
Definition at line 131 of file DelayedUpdateCUDA.h.
References DelayedUpdateCUDA< T, T_FP >::delay_count.
template<typename VVT >
void getInvRow (const Matrix< T > &Ainv, int rowchanged, VVT &invRow) | inline |
compute the up-to-date row of Ainv
Ainv | inverse matrix |
rowchanged | the row id corresponding to the proposed electron |
Definition at line 138 of file DelayedUpdateCUDA.h.
References DelayedUpdateCUDA< T, T_FP >::Ainv_buffer, DelayedUpdateCUDA< T, T_FP >::Ainv_gpu, DelayedUpdateCUDA< T, T_FP >::Binv, PrefetchedRange::checkRange(), BLAS::cone, qmcplusplus::syclBLAS::copy_n(), qmcplusplus::cudaErrorCheck(), cudaMemcpyAsync, cudaMemcpyDeviceToHost, BLAS::czero, Matrix< T, Alloc >::data(), Vector< T, Alloc >::data(), DelayedUpdateCUDA< T, T_FP >::delay_count, BLAS::gemv(), Queue< PlatformKind::CUDA >::getNative(), PrefetchedRange::getOffset(), omptarget::min(), DelayedUpdateCUDA< T, T_FP >::p, DelayedUpdateCUDA< T, T_FP >::prefetched_range, DelayedUpdateCUDA< T, T_FP >::queue_, Matrix< T, Alloc >::rows(), PrefetchedRange::setRange(), Queue< PlatformKind::CUDA >::sync(), DelayedUpdateCUDA< T, T_FP >::U, and DelayedUpdateCUDA< T, T_FP >::V.
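A typical use of the returned row, assumed here rather than stated in this page, is to form the determinant ratio for a proposed move as a dot product with the new orbital values. The helper below is a hypothetical sketch using `std::vector<double>` in place of the `VVT` vector type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of the up-to-date inverse row with the new orbital values.
// Under the assumption that Ainv stores the inverse of the transpose, this is
// det(A_new) / det(A_old) for replacing the corresponding row of A by psiV.
double detRatio(const std::vector<double>& invRow, const std::vector<double>& psiV)
{
  assert(invRow.size() == psiV.size());
  double ratio = 0.0;
  for (std::size_t i = 0; i < invRow.size(); ++i)
    ratio += invRow[i] * psiV[i];
  return ratio;
}
```

For the symmetric matrix A = [[2, 1], [1, 3]], row 0 of the inverse is (0.6, -0.2); replacing row 0 of A by (1, 1) gives a ratio of 0.4, which matches det([[1, 1], [1, 3]]) / det(A) = 2 / 5.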
void initializeInv (const Matrix< T > &Ainv) | inline |
initialize internal objects when Ainv is refreshed
Ainv | inverse matrix |
Definition at line 121 of file DelayedUpdateCUDA.h.
References DelayedUpdateCUDA< T, T_FP >::Ainv_gpu, DelayedUpdateCUDA< T, T_FP >::clearDelayCount(), qmcplusplus::cudaErrorCheck(), cudaMemcpyAsync, cudaMemcpyHostToDevice, Matrix< T, Alloc >::data(), Queue< PlatformKind::CUDA >::getNative(), DelayedUpdateCUDA< T, T_FP >::queue_, Matrix< T, Alloc >::size(), and Queue< PlatformKind::CUDA >::sync().
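template<typename TREAL >
void invert_transpose (const Matrix< T > &logdetT, Matrix< T > &Ainv, std::complex< TREAL > &log_value) | inline |
compute the inverse of the transpose of matrix A and the logarithm of its determinant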
TREAL | real type |
Definition at line 108 of file DelayedUpdateCUDA.h.
References DelayedUpdateCUDA< T, T_FP >::Ainv_gpu, DelayedUpdateCUDA< T, T_FP >::clearDelayCount(), DelayedUpdateCUDA< T, T_FP >::cusolver_inverter, and cuSolverInverter< T_FP >::invert_transpose().
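The `log_value` output is complex because a determinant can be negative even for a real matrix, so its logarithm carries a phase. The sketch below shows one common way to accumulate such a value from the diagonal of an LU factorization; it is an assumption about the general technique, not a transcription of what the cuSOLVER-based inverter does, and `logDetFromLUDiagonal` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Accumulate log(det(A)) from the diagonal of U in a pivoted LU factorization.
// Each negative diagonal entry and each row swap contributes a phase of pi.
std::complex<double> logDetFromLUDiagonal(const std::vector<double>& lu_diag, int row_swaps)
{
  const double pi = std::acos(-1.0);
  std::complex<double> log_value(0.0, 0.0);
  for (double d : lu_diag)
  {
    assert(d != 0.0); // a zero pivot means the matrix is singular
    log_value += std::log(std::complex<double>(d, 0.0));
  }
  if (row_swaps % 2 != 0)
    log_value += std::complex<double>(0.0, pi); // odd permutation flips the sign
  return log_value;
}
```

Working in log space keeps the value representable for large matrices, where the determinant itself would overflow or underflow.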
void resize (int norb, int delay) | inline |
resize the internal storage
norb | number of electrons/orbitals |
delay | maximum delay, 0 < delay <= norb |
Definition at line 85 of file DelayedUpdateCUDA.h.
References DelayedUpdateCUDA< T, T_FP >::Ainv_buffer, DelayedUpdateCUDA< T, T_FP >::Ainv_gpu, DelayedUpdateCUDA< T, T_FP >::Binv, DelayedUpdateCUDA< T, T_FP >::Binv_gpu, DelayedUpdateCUDA< T, T_FP >::delay_list, DelayedUpdateCUDA< T, T_FP >::delay_list_gpu, omptarget::min(), DelayedUpdateCUDA< T, T_FP >::p, Matrix< T, Alloc >::resize(), Vector< T, Alloc >::resize(), DelayedUpdateCUDA< T, T_FP >::temp_gpu, DelayedUpdateCUDA< T, T_FP >::U, DelayedUpdateCUDA< T, T_FP >::U_gpu, DelayedUpdateCUDA< T, T_FP >::V, and DelayedUpdateCUDA< T, T_FP >::V_gpu.
void updateInvMat (Matrix< T > &Ainv, bool transfer_to_host=true) | inline |
update the full Ainv and reset delay_count
Ainv | inverse matrix |
Definition at line 206 of file DelayedUpdateCUDA.h.
References DelayedUpdateCUDA< T, T_FP >::Ainv_gpu, applyW_stageV_cuda(), DelayedUpdateCUDA< T, T_FP >::Binv, DelayedUpdateCUDA< T, T_FP >::Binv_gpu, DelayedUpdateCUDA< T, T_FP >::blas_handle_, DelayedUpdateCUDA< T, T_FP >::clearDelayCount(), qmcplusplus::cudaErrorCheck(), cudaMemcpyAsync, cudaMemcpyDeviceToHost, cudaMemcpyHostToDevice, Matrix< T, Alloc >::data(), DelayedUpdateCUDA< T, T_FP >::delay_count, DelayedUpdateCUDA< T, T_FP >::delay_list, DelayedUpdateCUDA< T, T_FP >::delay_list_gpu, qmcplusplus::compute::BLAS::gemm(), Queue< PlatformKind::CUDA >::getNative(), DelayedUpdateCUDA< T, T_FP >::queue_, Matrix< T, Alloc >::rows(), Matrix< T, Alloc >::size(), Queue< PlatformKind::CUDA >::sync(), DelayedUpdateCUDA< T, T_FP >::temp_gpu, DelayedUpdateCUDA< T, T_FP >::U, DelayedUpdateCUDA< T, T_FP >::U_gpu, and DelayedUpdateCUDA< T, T_FP >::V_gpu.
Referenced by DelayedUpdateCUDA< T, T_FP >::acceptRow().
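The full update applied when the delayed moves are flushed can be understood through the Woodbury identity. The derivation below is a sketch written for the plain inverse; the symbols E, W, and k are notation introduced here rather than names from the source, and because the class works with the inverse of the transpose, rows and columns swap roles in the actual storage.

```latex
% k accepted moves replace rows r_1, ..., r_k of the n x n matrix A
% by the rows of W (k x n). E = [e_{r_1} ... e_{r_k}] is the n x k selection matrix.
A' = A + E\,\bigl(W - E^{T}A\bigr)
\quad\Longrightarrow\quad
A'^{-1} = A^{-1} - A^{-1}E\,\bigl(W A^{-1} E\bigr)^{-1}\bigl(W A^{-1} - E^{T}\bigr)
```

For k = 1 this reduces to the Sherman-Morrison rank-1 update. The small k x k matrix W A^{-1} E is the only one that has to be inverted, which appears to be the role of Binv, and the remaining products are matrix-matrix multiplications, consistent with the gemm() calls referenced above.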
Matrix< T, CUDAHostAllocator< T > > Ainv_buffer | private |
Definition at line 64 of file DelayedUpdateCUDA.h.
Referenced by DelayedUpdateCUDA< T, T_FP >::acceptRow(), DelayedUpdateCUDA< T, T_FP >::getInvRow(), and DelayedUpdateCUDA< T, T_FP >::resize().
Matrix< T, CUDAAllocator< T > > Ainv_gpu | private |
Matrix< T, CUDAHostAllocator< T > > Binv | private |
Definition at line 39 of file DelayedUpdateCUDA.h.
Referenced by DelayedUpdateCUDA< T, T_FP >::acceptRow(), DelayedUpdateCUDA< T, T_FP >::getInvRow(), DelayedUpdateCUDA< T, T_FP >::resize(), and DelayedUpdateCUDA< T, T_FP >::updateInvMat().
Matrix< T, CUDAAllocator< T > > Binv_gpu | private |
Definition at line 46 of file DelayedUpdateCUDA.h.
Referenced by DelayedUpdateCUDA< T, T_FP >::resize(), and DelayedUpdateCUDA< T, T_FP >::updateInvMat().
compute::BLASHandle< PlatformKind::CUDA > blas_handle_ | private |
Definition at line 68 of file DelayedUpdateCUDA.h.
Referenced by DelayedUpdateCUDA< T, T_FP >::updateInvMat().
cuSolverInverter< T_FP > cusolver_inverter | private |
Definition at line 58 of file DelayedUpdateCUDA.h.
Referenced by DelayedUpdateCUDA< T, T_FP >::invert_transpose().
int delay_count | private |
current number of delays; increases by one for each acceptance and is reset to 0 after updating Ainv
Definition at line 53 of file DelayedUpdateCUDA.h.
Referenced by DelayedUpdateCUDA< T, T_FP >::acceptRow(), DelayedUpdateCUDA< T, T_FP >::clearDelayCount(), DelayedUpdateCUDA< T, T_FP >::getDelayCount(), DelayedUpdateCUDA< T, T_FP >::getInvRow(), and DelayedUpdateCUDA< T, T_FP >::updateInvMat().
Vector< int, CUDAHostAllocator< int > > delay_list | private |
Definition at line 50 of file DelayedUpdateCUDA.h.
Referenced by DelayedUpdateCUDA< T, T_FP >::acceptRow(), DelayedUpdateCUDA< T, T_FP >::resize(), and DelayedUpdateCUDA< T, T_FP >::updateInvMat().
Vector< int, CUDAAllocator< int > > delay_list_gpu | private |
Definition at line 51 of file DelayedUpdateCUDA.h.
Referenced by DelayedUpdateCUDA< T, T_FP >::resize(), and DelayedUpdateCUDA< T, T_FP >::updateInvMat().
Vector< T > p | private |
Definition at line 49 of file DelayedUpdateCUDA.h.
Referenced by DelayedUpdateCUDA< T, T_FP >::acceptRow(), DelayedUpdateCUDA< T, T_FP >::getInvRow(), and DelayedUpdateCUDA< T, T_FP >::resize().
PrefetchedRange prefetched_range | private |
Definition at line 62 of file DelayedUpdateCUDA.h.
Referenced by DelayedUpdateCUDA< T, T_FP >::acceptRow(), DelayedUpdateCUDA< T, T_FP >::clearDelayCount(), and DelayedUpdateCUDA< T, T_FP >::getInvRow().
compute::Queue< PlatformKind::CUDA > queue_ | private |
Definition at line 67 of file DelayedUpdateCUDA.h.
Referenced by DelayedUpdateCUDA< T, T_FP >::getInvRow(), DelayedUpdateCUDA< T, T_FP >::initializeInv(), and DelayedUpdateCUDA< T, T_FP >::updateInvMat().
Matrix< T, CUDAAllocator< T > > temp_gpu | private |
Definition at line 42 of file DelayedUpdateCUDA.h.
Referenced by DelayedUpdateCUDA< T, T_FP >::resize(), and DelayedUpdateCUDA< T, T_FP >::updateInvMat().
Matrix< T, CUDAHostAllocator< T > > U | private |
Definition at line 38 of file DelayedUpdateCUDA.h.
Referenced by DelayedUpdateCUDA< T, T_FP >::acceptRow(), DelayedUpdateCUDA< T, T_FP >::getInvRow(), DelayedUpdateCUDA< T, T_FP >::resize(), and DelayedUpdateCUDA< T, T_FP >::updateInvMat().
Matrix< T, CUDAAllocator< T > > U_gpu | private |
GPU copy of U, V, Binv, Ainv.
Definition at line 44 of file DelayedUpdateCUDA.h.
Referenced by DelayedUpdateCUDA< T, T_FP >::resize(), and DelayedUpdateCUDA< T, T_FP >::updateInvMat().
Matrix< T > V | private |
Definition at line 40 of file DelayedUpdateCUDA.h.
Referenced by DelayedUpdateCUDA< T, T_FP >::acceptRow(), DelayedUpdateCUDA< T, T_FP >::getInvRow(), and DelayedUpdateCUDA< T, T_FP >::resize().
Matrix< T, CUDAAllocator< T > > V_gpu | private |
Definition at line 45 of file DelayedUpdateCUDA.h.
Referenced by DelayedUpdateCUDA< T, T_FP >::resize(), and DelayedUpdateCUDA< T, T_FP >::updateInvMat().