QMCPACK
DelayedUpdate< T, T_FP > Class Template Reference

implements delayed update on CPU using BLAS More...

+ Collaboration diagram for DelayedUpdate< T, T_FP >:

Public Member Functions

 DelayedUpdate ()
 default constructor More...
 
void resize (int norb, int delay)
 resize the internal storage More...
 
template<typename TREAL >
void invert_transpose (const Matrix< T > &logdetT, Matrix< T > &Ainv, std::complex< TREAL > &log_value)
 compute the inverse of the transpose of matrix A More...
 
void initializeInv (const Matrix< T > &Ainv)
 initialize internal objects when Ainv is refreshed More...
 
int getDelayCount () const
 
template<typename VVT >
void getInvRow (const Matrix< T > &Ainv, int rowchanged, VVT &invRow)
 compute the row of up-to-date Ainv More...
 
template<typename VVT , typename RATIOT >
void acceptRow (Matrix< T > &Ainv, int rowchanged, const VVT &psiV, const RATIOT ratio_new)
 accept a move with the update delayed More...
 
void updateInvMat (Matrix< T > &Ainv)
 update the full Ainv and reset delay_count More...
 

Private Attributes

Matrix< T > U
 orbital values of delayed electrons More...
 
Matrix< T > V
 rows of Ainv corresponding to delayed electrons More...
 
Matrix< T > Binv
 Matrix inverse of B, at maximum KxK. More...
 
Matrix< T > tempMat
 scratch space, used during inverse update More...
 
Vector< T > temp
 temporal scratch space used by SM-1 More...
 
Vector< T > p
 new column of B More...
 
std::vector< int > delay_list
 list of delayed electrons More...
 
int delay_count
 current number of delays, increase one for each acceptance, reset to 0 after updating Ainv More...
 
DiracMatrix< T_FP > detEng
 matrix inversion engine More...
 

Detailed Description

template<typename T, typename T_FP>
class qmcplusplus::DelayedUpdate< T, T_FP >

implements delayed update on CPU using BLAS

Template Parameters
Tbase precision for most computation
T_FPhigh precision for matrix inversion, T_FP >= T

Definition at line 29 of file DelayedUpdate.h.

Constructor & Destructor Documentation

◆ DelayedUpdate()

DelayedUpdate ( )
inline

default constructor

Definition at line 52 of file DelayedUpdate.h.

52 : delay_count(0) {}
int delay_count
current number of delays, increase one for each acceptance, reset to 0 after updating Ainv ...
Definition: DelayedUpdate.h:46

Member Function Documentation

◆ acceptRow()

void acceptRow ( Matrix< T > &  Ainv,
int  rowchanged,
const VVT &  psiV,
const RATIOT  ratio_new 
)
inline

accept a move with the update delayed

Parameters
Ainvinverse matrix
rowchangedthe row id corresponding to the proposed electron
psiVnew orbital values

Before delay_count reaches the maximum delay, only Binv is updated with a recursive algorithm

Definition at line 125 of file DelayedUpdate.h.

References DelayedUpdate< T, T_FP >::Binv, Matrix< T, Alloc >::cols(), BLAS::cone, qmcplusplus::syclBLAS::copy_n(), BLAS::czero, Matrix< T, Alloc >::data(), Vector< T, Alloc >::data(), DelayedUpdate< T, T_FP >::delay_count, DelayedUpdate< T, T_FP >::delay_list, BLAS::gemv(), BLAS::ger(), DelayedUpdate< T, T_FP >::p, Matrix< T, Alloc >::rows(), DelayedUpdate< T, T_FP >::U, DelayedUpdate< T, T_FP >::updateInvMat(), and DelayedUpdate< T, T_FP >::V.

Referenced by qmcplusplus::TEST_CASE().

126  {
127  constexpr T cone(1);
128  constexpr T czero(0);
129  const int norb = Ainv.rows();
130  const int lda_Binv = Binv.cols();
131  std::copy_n(Ainv[rowchanged], norb, V[delay_count]);
132  std::copy_n(psiV.data(), norb, U[delay_count]);
133  delay_list[delay_count] = rowchanged;
134  // the new Binv is [[X Y] [Z sigma]]
135  BLAS::gemv('T', norb, delay_count + 1, -cone, V.data(), norb, psiV.data(), 1, czero, p.data(), 1);
136  // sigma
137  const T sigma = static_cast<T>(RATIOT(1) / ratio_new);
138  Binv[delay_count][delay_count] = sigma;
139  // Y
140  BLAS::gemv('T', delay_count, delay_count, sigma, Binv.data(), lda_Binv, p.data(), 1, czero,
141  Binv.data() + delay_count, lda_Binv);
142  // X
144  lda_Binv);
145  // Z
146  for (int i = 0; i < delay_count; i++)
147  Binv[delay_count][i] *= sigma;
148  delay_count++;
149  // update Ainv when maximal delay is reached
150  if (delay_count == lda_Binv)
151  updateInvMat(Ainv);
152  }
constexpr std::complex< float > czero
Definition: BLAS.hpp:51
constexpr std::complex< float > cone
Definition: BLAS.hpp:50
size_type cols() const
Definition: OhmmsMatrix.h:78
static void gemv(int n, int m, const double *restrict amat, const double *restrict x, double *restrict y)
Definition: BLAS.hpp:118
Matrix< T > Binv
Matrix inverse of B, at maximum KxK.
Definition: DelayedUpdate.h:36
std::vector< int > delay_list
list of delayed electrons
Definition: DelayedUpdate.h:44
static void ger(int m, int n, double alpha, const double *x, int incx, const double *y, int incy, double *a, int lda)
Definition: BLAS.hpp:437
void updateInvMat(Matrix< T > &Ainv)
update the full Ainv and reset delay_count
size_type rows() const
Definition: OhmmsMatrix.h:77
int delay_count
current number of delays, increase one for each acceptance, reset to 0 after updating Ainv ...
Definition: DelayedUpdate.h:46
sycl::event copy_n(sycl::queue &aq, const T1 *restrict VA, size_t array_size, T2 *restrict VC, const std::vector< sycl::event > &events)
Definition: syclBLAS.cpp:548
Matrix< T > U
orbital values of delayed electrons
Definition: DelayedUpdate.h:32
Matrix< T > V
rows of Ainv corresponding to delayed electrons
Definition: DelayedUpdate.h:34
Vector< T > p
new column of B
Definition: DelayedUpdate.h:42

◆ getDelayCount()

int getDelayCount ( ) const
inline

Definition at line 90 of file DelayedUpdate.h.

References DelayedUpdate< T, T_FP >::delay_count.

90 { return delay_count; }
int delay_count
current number of delays, increase one for each acceptance, reset to 0 after updating Ainv ...
Definition: DelayedUpdate.h:46

◆ getInvRow()

void getInvRow ( const Matrix< T > &  Ainv,
int  rowchanged,
VVT &  invRow 
)
inline

compute the row of up-to-date Ainv

Parameters
Ainvinverse matrix
rowchangedthe row id corresponding to the proposed electron

Definition at line 97 of file DelayedUpdate.h.

References DelayedUpdate< T, T_FP >::Binv, Matrix< T, Alloc >::cols(), BLAS::cone, qmcplusplus::syclBLAS::copy_n(), BLAS::czero, Matrix< T, Alloc >::data(), Vector< T, Alloc >::data(), DelayedUpdate< T, T_FP >::delay_count, BLAS::gemv(), DelayedUpdate< T, T_FP >::p, Matrix< T, Alloc >::rows(), Matrix< T, Alloc >::size(), DelayedUpdate< T, T_FP >::U, and DelayedUpdate< T, T_FP >::V.

Referenced by qmcplusplus::TEST_CASE().

98  {
99  if (delay_count == 0)
100  {
101  // Ainv is fresh, directly access Ainv
102  std::copy_n(Ainv[rowchanged], invRow.size(), invRow.data());
103  return;
104  }
105  constexpr T cone(1);
106  constexpr T czero(0);
107  const int norb = Ainv.rows();
108  const int lda_Binv = Binv.cols();
109  // save Ainv[rowchanged] to invRow
110  std::copy_n(Ainv[rowchanged], norb, invRow.data());
111  // multiply V (NxK) Binv(KxK) U(KxN) invRow right to the left
112  BLAS::gemv('T', norb, delay_count, cone, U.data(), norb, invRow.data(), 1, czero, p.data(), 1);
113  BLAS::gemv('N', delay_count, delay_count, -cone, Binv.data(), lda_Binv, p.data(), 1, czero, Binv[delay_count], 1);
114  BLAS::gemv('N', norb, delay_count, cone, V.data(), norb, Binv[delay_count], 1, cone, invRow.data(), 1);
115  }
constexpr std::complex< float > czero
Definition: BLAS.hpp:51
constexpr std::complex< float > cone
Definition: BLAS.hpp:50
size_type cols() const
Definition: OhmmsMatrix.h:78
static void gemv(int n, int m, const double *restrict amat, const double *restrict x, double *restrict y)
Definition: BLAS.hpp:118
Matrix< T > Binv
Matrix inverse of B, at maximum KxK.
Definition: DelayedUpdate.h:36
size_type rows() const
Definition: OhmmsMatrix.h:77
int delay_count
current number of delays, increase one for each acceptance, reset to 0 after updating Ainv ...
Definition: DelayedUpdate.h:46
sycl::event copy_n(sycl::queue &aq, const T1 *restrict VA, size_t array_size, T2 *restrict VC, const std::vector< sycl::event > &events)
Definition: syclBLAS.cpp:548
Matrix< T > U
orbital values of delayed electrons
Definition: DelayedUpdate.h:32
Matrix< T > V
rows of Ainv corresponding to delayed electrons
Definition: DelayedUpdate.h:34
Vector< T > p
new column of B
Definition: DelayedUpdate.h:42

◆ initializeInv()

void initializeInv ( const Matrix< T > &  Ainv)
inline

initialize internal objects when Ainv is refreshed

Parameters
Ainvinverse matrix

Definition at line 84 of file DelayedUpdate.h.

References DelayedUpdate< T, T_FP >::delay_count.

85  {
86  // safe mechanism
87  delay_count = 0;
88  }
int delay_count
current number of delays, increase one for each acceptance, reset to 0 after updating Ainv ...
Definition: DelayedUpdate.h:46

◆ invert_transpose()

void invert_transpose ( const Matrix< T > &  logdetT,
Matrix< T > &  Ainv,
std::complex< TREAL > &  log_value 
)
inline

compute the inverse of the transpose of matrix A

Parameters
logdetTorbital value matrix
Ainvinverse matrix

Definition at line 74 of file DelayedUpdate.h.

References DelayedUpdate< T, T_FP >::delay_count, and DelayedUpdate< T, T_FP >::detEng.

75  {
76  detEng.invert_transpose(logdetT, Ainv, log_value);
77  // safe mechanism
78  delay_count = 0;
79  }
DiracMatrix< T_FP > detEng
matrix inversion engine
Definition: DelayedUpdate.h:48
int delay_count
current number of delays, increase one for each acceptance, reset to 0 after updating Ainv ...
Definition: DelayedUpdate.h:46

◆ resize()

void resize ( int  norb,
int  delay 
)
inline

resize the internal storage

Parameters
norbnumber of electrons/orbitals
delay,maximumdelay 0<delay<=norb

Definition at line 58 of file DelayedUpdate.h.

References DelayedUpdate< T, T_FP >::Binv, DelayedUpdate< T, T_FP >::delay_list, DelayedUpdate< T, T_FP >::p, Matrix< T, Alloc >::resize(), Vector< T, Alloc >::resize(), DelayedUpdate< T, T_FP >::temp, DelayedUpdate< T, T_FP >::tempMat, DelayedUpdate< T, T_FP >::U, and DelayedUpdate< T, T_FP >::V.

Referenced by qmcplusplus::TEST_CASE().

59  {
60  V.resize(delay, norb);
61  U.resize(delay, norb);
62  p.resize(delay);
63  temp.resize(norb);
64  tempMat.resize(norb, delay);
65  Binv.resize(delay, delay);
66  delay_list.resize(delay);
67  }
void resize(size_type n, Type_t val=Type_t())
Resize the container.
Definition: OhmmsVector.h:166
void resize(size_type n, size_type m)
Resize the container.
Definition: OhmmsMatrix.h:99
Matrix< T > Binv
Matrix inverse of B, at maximum KxK.
Definition: DelayedUpdate.h:36
std::vector< int > delay_list
list of delayed electrons
Definition: DelayedUpdate.h:44
Vector< T > temp
temporal scratch space used by SM-1
Definition: DelayedUpdate.h:40
Matrix< T > tempMat
scratch space, used during inverse update
Definition: DelayedUpdate.h:38
Matrix< T > U
orbital values of delayed electrons
Definition: DelayedUpdate.h:32
Matrix< T > V
rows of Ainv corresponding to delayed electrons
Definition: DelayedUpdate.h:34
Vector< T > p
new column of B
Definition: DelayedUpdate.h:42

◆ updateInvMat()

void updateInvMat ( Matrix< T > &  Ainv)
inline

update the full Ainv and reset delay_count

Parameters
Ainvinverse matrix

Definition at line 157 of file DelayedUpdate.h.

References DelayedUpdate< T, T_FP >::Binv, Matrix< T, Alloc >::cols(), BLAS::cone, BLAS::czero, Matrix< T, Alloc >::data(), Vector< T, Alloc >::data(), DelayedUpdate< T, T_FP >::delay_count, DelayedUpdate< T, T_FP >::delay_list, BLAS::gemm(), BLAS::gemv(), BLAS::ger(), getNextLevelNumThreads(), omptarget::min(), BlasThreadingEnv::NestedThreadingSupported(), Matrix< T, Alloc >::rows(), DelayedUpdate< T, T_FP >::temp, DelayedUpdate< T, T_FP >::tempMat, DelayedUpdate< T, T_FP >::U, and DelayedUpdate< T, T_FP >::V.

Referenced by DelayedUpdate< T, T_FP >::acceptRow().

158  {
159  if (delay_count == 0)
160  return;
161  // update the inverse matrix
162  constexpr T cone(1);
163  constexpr T czero(0);
164  const int norb = Ainv.rows();
165  if (delay_count == 1)
166  {
167  // this is a special case invoking the Fahy's variant of Sherman-Morrison update.
168  // Only use the first norb elements of tempMat as a temporal array
169  BLAS::gemv('T', norb, norb, cone, Ainv.data(), norb, U[0], 1, czero, temp.data(), 1);
170  temp[delay_list[0]] -= cone;
171  BLAS::ger(norb, norb, -Binv[0][0], V[0], 1, temp.data(), 1, Ainv.data(), norb);
172  }
173  else
174  {
175  const int lda_Binv = Binv.cols();
176  // number of threads at the next level, forced to 1 if the problem is small.
177  const int num_threads = (norb < 256 ? 1 : getNextLevelNumThreads());
178  if (num_threads == 1 || BlasThreadingEnv::NestedThreadingSupported())
179  {
180  // threading depends on BLAS
181  BlasThreadingEnv knob(num_threads);
182  BLAS::gemm('T', 'N', delay_count, norb, norb, cone, U.data(), norb, Ainv.data(), norb, czero, tempMat.data(),
183  lda_Binv);
184  for (int i = 0; i < delay_count; i++)
185  tempMat(delay_list[i], i) -= cone;
186  BLAS::gemm('N', 'N', norb, delay_count, delay_count, cone, V.data(), norb, Binv.data(), lda_Binv, czero,
187  U.data(), norb);
188  BLAS::gemm('N', 'N', norb, norb, delay_count, -cone, U.data(), norb, tempMat.data(), lda_Binv, cone,
189  Ainv.data(), norb);
190  }
191  else
192  {
193  // manually threaded version of the above GEMM calls
194 #pragma omp parallel
195  {
196  const int block_size = getAlignedSize<T>((norb + num_threads - 1) / num_threads);
197  int num_block = (norb + block_size - 1) / block_size;
198 #pragma omp for
199  for (int ix = 0; ix < num_block; ix++)
200  {
201  int x_offset = ix * block_size;
202  BLAS::gemm('T', 'N', delay_count, std::min(norb - x_offset, block_size), norb, cone, U.data(), norb,
203  Ainv[x_offset], norb, czero, tempMat[x_offset], lda_Binv);
204  }
205 #pragma omp master
206  for (int i = 0; i < delay_count; i++)
207  tempMat(delay_list[i], i) -= cone;
208 #pragma omp for
209  for (int iy = 0; iy < num_block; iy++)
210  {
211  int y_offset = iy * block_size;
212  BLAS::gemm('N', 'N', std::min(norb - y_offset, block_size), delay_count, delay_count, cone,
213  V.data() + y_offset, norb, Binv.data(), lda_Binv, czero, U.data() + y_offset, norb);
214  }
215 #pragma omp for collapse(2) nowait
216  for (int iy = 0; iy < num_block; iy++)
217  for (int ix = 0; ix < num_block; ix++)
218  {
219  int x_offset = ix * block_size;
220  int y_offset = iy * block_size;
221  BLAS::gemm('N', 'N', std::min(norb - y_offset, block_size), std::min(norb - x_offset, block_size),
222  delay_count, -cone, U.data() + y_offset, norb, tempMat[x_offset], lda_Binv, cone,
223  Ainv[x_offset] + y_offset, norb);
224  }
225  }
226  }
227  }
228  delay_count = 0;
229  }
constexpr std::complex< float > czero
Definition: BLAS.hpp:51
constexpr std::complex< float > cone
Definition: BLAS.hpp:50
T min(T a, T b)
size_type cols() const
Definition: OhmmsMatrix.h:78
static void gemv(int n, int m, const double *restrict amat, const double *restrict x, double *restrict y)
Definition: BLAS.hpp:118
Matrix< T > Binv
Matrix inverse of B, at maximum KxK.
Definition: DelayedUpdate.h:36
std::vector< int > delay_list
list of delayed electrons
Definition: DelayedUpdate.h:44
static void ger(int m, int n, double alpha, const double *x, int incx, const double *y, int incy, double *a, int lda)
Definition: BLAS.hpp:437
size_type rows() const
Definition: OhmmsMatrix.h:77
Vector< T > temp
temporal scratch space used by SM-1
Definition: DelayedUpdate.h:40
int delay_count
current number of delays, increase one for each acceptance, reset to 0 after updating Ainv ...
Definition: DelayedUpdate.h:46
Matrix< T > tempMat
scratch space, used during inverse update
Definition: DelayedUpdate.h:38
Matrix< T > U
orbital values of delayed electrons
Definition: DelayedUpdate.h:32
static void gemm(char Atrans, char Btrans, int M, int N, int K, double alpha, const double *A, int lda, const double *restrict B, int ldb, double beta, double *restrict C, int ldc)
Definition: BLAS.hpp:235
Matrix< T > V
rows of Ainv corresponding to delayed electrons
Definition: DelayedUpdate.h:34
int getNextLevelNumThreads()
get the number of threads at the next parallel level
Definition: OpenMP.h:35

Member Data Documentation

◆ Binv

◆ delay_count

int delay_count
private

◆ delay_list

std::vector<int> delay_list
private

◆ detEng

DiracMatrix<T_FP> detEng
private

matrix inversion engine

Definition at line 48 of file DelayedUpdate.h.

Referenced by DelayedUpdate< T, T_FP >::invert_transpose().

◆ p

◆ temp

Vector<T> temp
private

temporal scratch space used by SM-1

Definition at line 40 of file DelayedUpdate.h.

Referenced by DelayedUpdate< T, T_FP >::resize(), and DelayedUpdate< T, T_FP >::updateInvMat().

◆ tempMat

Matrix<T> tempMat
private

scratch space, used during inverse update

Definition at line 38 of file DelayedUpdate.h.

Referenced by DelayedUpdate< T, T_FP >::resize(), and DelayedUpdate< T, T_FP >::updateInvMat().

◆ U

◆ V

Matrix<T> V
private

The documentation for this class was generated from the following file: