QMCPACK
DiracMatrixComputeOMPTarget< VALUE_FP > Class Template Reference

class to compute matrix inversion and the log value of determinant of a batch of DiracMatrixes. More...

+ Inheritance diagram for DiracMatrixComputeOMPTarget< VALUE_FP >:
+ Collaboration diagram for DiracMatrixComputeOMPTarget< VALUE_FP >:

Public Types

using FullPrecReal = RealAlias< VALUE_FP >
 
using LogValue = std::complex< FullPrecReal >
 
template<typename T >
using OffloadPinnedAllocator = OMPallocator< T, PinnedAlignedAllocator< T > >
 
template<typename T >
using OffloadPinnedMatrix = Matrix< T, OffloadPinnedAllocator< T > >
 
template<typename T >
using OffloadPinnedVector = Vector< T, OffloadPinnedAllocator< T > >
 
using HandleResource = compute::Queue< PlatformKind::OMPTARGET >
 

Public Member Functions

 DiracMatrixComputeOMPTarget ()
 
std::unique_ptr< ResourcemakeClone () const override
 
template<typename TMAT >
std::enable_if_t< std::is_same< VALUE_FP, TMAT >::value > invert_transpose (HandleResource &resource, const OffloadPinnedMatrix< TMAT > &a_mat, OffloadPinnedMatrix< TMAT > &inv_a_mat, LogValue &log_value)
 compute the inverse of the transpose of matrix A and its determinant value in log when VALUE_FP and TMAT are the same More...
 
template<typename TMAT >
std::enable_if_t<!std::is_same< VALUE_FP, TMAT >::value > invert_transpose (HandleResource &resource, const OffloadPinnedMatrix< TMAT > &a_mat, OffloadPinnedMatrix< TMAT > &inv_a_mat, LogValue &log_value)
 compute the inverse of the transpose of matrix A and its determinant value in log when VALUE_FP and TMAT are the different More...
 
template<typename TMAT , PlatformKind PL>
void mw_invertTranspose (compute::Queue< PL > &resource_ignored, const RefVector< const OffloadPinnedMatrix< TMAT >> &a_mats, const RefVector< OffloadPinnedMatrix< TMAT >> &inv_a_mats, OffloadPinnedVector< LogValue > &log_values)
 This covers both mixed and Full precision case. More...
 
- Public Member Functions inherited from Resource
 Resource (const std::string &name)
 
virtual ~Resource ()=default
 
const std::string & getName () const
 

Private Member Functions

void reset (OffloadPinnedVector< VALUE_FP > &psi_Ms, const int n, const int lda, const int batch_size)
 reset internal work space. More...
 
void reset (OffloadPinnedMatrix< VALUE_FP > &psi_M, const int n, const int lda)
 reset internal work space for single walker case My understanding might be off. More...
 
template<typename TMAT >
void computeInvertAndLog (OffloadPinnedMatrix< TMAT > &a_mat, const int n, const int lda, LogValue &log_value)
 compute the inverse of invMat (in place) and the log value of determinant More...
 
template<typename TMAT >
void computeInvertAndLog (OffloadPinnedVector< TMAT > &psi_Ms, const int n, const int lda, OffloadPinnedVector< LogValue > &log_values)
 

Private Attributes

aligned_vector< VALUE_FP > m_work_
 
int lwork_
 
OffloadPinnedVector< VALUE_FP > psiM_fp_
 Matrices held in memory matrices n^2 * nw elements. More...
 
OffloadPinnedVector< VALUE_FP > LU_diags_fp_
 
OffloadPinnedVector< int > pivots_
 
OffloadPinnedVector< int > infos_
 
DiracMatrix< VALUE_FP > detEng_
 matrix inversion engine More...
 

Detailed Description

template<typename VALUE_FP>
class qmcplusplus::DiracMatrixComputeOMPTarget< VALUE_FP >

class to compute matrix inversion and the log value of determinant of a batch of DiracMatrixes.

Template Parameters
VALUE_FPthe datatype used in the actual computation of the matrix

There is one per crowd not one per MatrixUpdateEngine. this puts ownership of the scratch resources in a sensible place.

Currently this is CPU only but its external API is somewhat written to enforce the passing Dual data objects as arguments. Except for the single particle API log_value which is not Dual type but had better have an address in a OMPtarget mapped region if target is used with it. This makes this API incompatible to that used by MatrixDelayedUpdateCuda and DiracMatrixComputeCUDA.

Definition at line 45 of file DiracMatrixComputeOMPTarget.hpp.

Member Typedef Documentation

◆ FullPrecReal

using FullPrecReal = RealAlias<VALUE_FP>

Definition at line 48 of file DiracMatrixComputeOMPTarget.hpp.

◆ HandleResource

◆ LogValue

using LogValue = std::complex<FullPrecReal>

Definition at line 49 of file DiracMatrixComputeOMPTarget.hpp.

◆ OffloadPinnedAllocator

◆ OffloadPinnedMatrix

◆ OffloadPinnedVector

Constructor & Destructor Documentation

◆ DiracMatrixComputeOMPTarget()

Definition at line 163 of file DiracMatrixComputeOMPTarget.hpp.

163 : Resource("DiracMatrixComputeOMPTarget"), lwork_(0) {}
Resource(const std::string &name)
Definition: Resource.h:23

Member Function Documentation

◆ computeInvertAndLog() [1/2]

void computeInvertAndLog ( OffloadPinnedMatrix< TMAT > &  a_mat,
const int  n,
const int  lda,
LogValue log_value 
)
inlineprivate

compute the inverse of invMat (in place) and the log value of determinant

Template Parameters
TMATvalue type of matrix
Parameters
[in,out]a_matthe matrix
[in]nactual dimension of square matrix (no guarantee it really has full column rank)
[in]ldaleading dimension of Matrix container
[out]log_valuelog a_mat before inversion

Definition at line 121 of file DiracMatrixComputeOMPTarget.hpp.

References qmcplusplus::computeLogDet(), Matrix< T, Alloc >::data(), getNextLevelNumThreads(), qmcplusplus::lda, DiracMatrixComputeOMPTarget< VALUE_FP >::LU_diags_fp_, DiracMatrixComputeOMPTarget< VALUE_FP >::lwork_, DiracMatrixComputeOMPTarget< VALUE_FP >::m_work_, qmcplusplus::n, DiracMatrixComputeOMPTarget< VALUE_FP >::pivots_, DiracMatrixComputeOMPTarget< VALUE_FP >::reset(), qmcplusplus::Xgetrf(), and qmcplusplus::Xgetri().

Referenced by DiracMatrixComputeOMPTarget< VALUE_FP >::invert_transpose().

122  {
123  BlasThreadingEnv knob(getNextLevelNumThreads());
124  if (lwork_ < lda)
125  reset(a_mat, n, lda);
126  Xgetrf(n, n, a_mat.data(), lda, pivots_.data());
127  for (int i = 0; i < n; i++)
128  LU_diags_fp_[i] = a_mat.data()[i * lda + i];
129  log_value = {0.0, 0.0};
130  computeLogDet(LU_diags_fp_.data(), n, pivots_.data(), log_value);
131  Xgetri(n, a_mat.data(), lda, pivots_.data(), m_work_.data(), lwork_);
132  }
int Xgetrf(int n, int m, float *restrict a, int lda, int *restrict piv)
wrappers around xgetrf lapack routines
Definition: DiracMatrix.h:31
int Xgetri(int n, float *restrict a, int lda, int *restrict piv, float *restrict work, int &lwork)
inversion of a float matrix after lu factorization
Definition: DiracMatrix.h:60
void computeLogDet(const T *restrict diag, int n, const int *restrict pivot, std::complex< T_FP > &logdet)
Definition: DiracMatrix.h:100
void reset(OffloadPinnedVector< VALUE_FP > &psi_Ms, const int n, const int lda, const int batch_size)
reset internal work space.
int getNextLevelNumThreads()
get the number of threads at the next parallel level
Definition: OpenMP.h:35

◆ computeInvertAndLog() [2/2]

void computeInvertAndLog ( OffloadPinnedVector< TMAT > &  psi_Ms,
const int  n,
const int  lda,
OffloadPinnedVector< LogValue > &  log_values 
)
inlineprivate

Definition at line 135 of file DiracMatrixComputeOMPTarget.hpp.

References qmcplusplus::computeLogDet(), Vector< T, Alloc >::data(), getNextLevelNumThreads(), qmcplusplus::lda, qmcplusplus::log_values(), DiracMatrixComputeOMPTarget< VALUE_FP >::LU_diags_fp_, DiracMatrixComputeOMPTarget< VALUE_FP >::lwork_, DiracMatrixComputeOMPTarget< VALUE_FP >::m_work_, qmcplusplus::n, DiracMatrixComputeOMPTarget< VALUE_FP >::pivots_, DiracMatrixComputeOMPTarget< VALUE_FP >::reset(), qmcplusplus::Xgetrf(), and qmcplusplus::Xgetri().

139  {
140  const int nw = log_values.size();
141  BlasThreadingEnv knob(getNextLevelNumThreads());
142  if (lwork_ < lda)
143  reset(psi_Ms, n, lda, nw);
144  pivots_.resize(n * nw);
145  LU_diags_fp_.resize(n * nw);
146  for (int iw = 0; iw < nw; ++iw)
147  {
148  VALUE_FP* LU_M = psi_Ms.data() + iw * n * n;
149  Xgetrf(n, n, LU_M, lda, pivots_.data() + iw * n);
150  for (int i = 0; i < n; i++)
151  *(LU_diags_fp_.data() + iw * n + i) = LU_M[i * lda + i];
152  LogValue log_value{0.0, 0.0};
153  computeLogDet(LU_diags_fp_.data() + iw * n, n, pivots_.data() + iw * n, log_value);
154  log_values[iw] = log_value;
155  Xgetri(n, LU_M, lda, pivots_.data() + iw * n, m_work_.data(), lwork_);
156  }
157  }
std::vector< StdComp, CUDAHostAllocator< StdComp > > log_values(batch_size)
int Xgetrf(int n, int m, float *restrict a, int lda, int *restrict piv)
wrappers around xgetrf lapack routines
Definition: DiracMatrix.h:31
int Xgetri(int n, float *restrict a, int lda, int *restrict piv, float *restrict work, int &lwork)
inversion of a float matrix after lu factorization
Definition: DiracMatrix.h:60
void computeLogDet(const T *restrict diag, int n, const int *restrict pivot, std::complex< T_FP > &logdet)
Definition: DiracMatrix.h:100
void reset(OffloadPinnedVector< VALUE_FP > &psi_Ms, const int n, const int lda, const int batch_size)
reset internal work space.
int getNextLevelNumThreads()
get the number of threads at the next parallel level
Definition: OpenMP.h:35

◆ invert_transpose() [1/2]

std::enable_if_t<std::is_same<VALUE_FP, TMAT>::value> invert_transpose ( HandleResource resource,
const OffloadPinnedMatrix< TMAT > &  a_mat,
OffloadPinnedMatrix< TMAT > &  inv_a_mat,
LogValue log_value 
)
inline

compute the inverse of the transpose of matrix A and its determinant value in log when VALUE_FP and TMAT are the same

Template Parameters
TMATmatrix value type
TREALreal type
Parameters
[in]resourcecompute resource
[in]a_matmatrix to be inverted
[out]inv_a_matthe inverted matrix
[out]log_valuebreaks compatibility of MatrixUpdateOmpTarget with DiracMatrixComputeCUDA but is fine for OMPTarget

Definition at line 178 of file DiracMatrixComputeOMPTarget.hpp.

References Matrix< T, Alloc >::cols(), DiracMatrixComputeOMPTarget< VALUE_FP >::computeInvertAndLog(), Matrix< T, Alloc >::data(), qmcplusplus::lda, qmcplusplus::n, Matrix< T, Alloc >::rows(), and qmcplusplus::simd::transpose().

Referenced by qmcplusplus::TEST_CASE().

182  {
183  const int n = a_mat.rows();
184  const int lda = a_mat.cols();
185  const int ldb = inv_a_mat.cols();
186  simd::transpose(a_mat.data(), n, lda, inv_a_mat.data(), n, ldb);
187  // In this case we just pass the value since
188  // that makes sense for a single walker API
189  computeInvertAndLog(inv_a_mat, n, ldb, log_value);
190  }
void computeInvertAndLog(OffloadPinnedMatrix< TMAT > &a_mat, const int n, const int lda, LogValue &log_value)
compute the inverse of invMat (in place) and the log value of determinant
void transpose(const T *restrict A, size_t m, size_t lda, TO *restrict B, size_t n, size_t ldb)
transpose of A(m,n) to B(n,m)
Definition: algorithm.hpp:97

◆ invert_transpose() [2/2]

std::enable_if_t<!std::is_same<VALUE_FP, TMAT>::value> invert_transpose ( HandleResource resource,
const OffloadPinnedMatrix< TMAT > &  a_mat,
OffloadPinnedMatrix< TMAT > &  inv_a_mat,
LogValue log_value 
)
inline

compute the inverse of the transpose of matrix A and its determinant value in log when VALUE_FP and TMAT are the different

Template Parameters
TMATmatrix value type
TREALreal type

Definition at line 198 of file DiracMatrixComputeOMPTarget.hpp.

References Matrix< T, Alloc >::cols(), DiracMatrixComputeOMPTarget< VALUE_FP >::computeInvertAndLog(), Matrix< T, Alloc >::data(), qmcplusplus::lda, qmcplusplus::n, DiracMatrixComputeOMPTarget< VALUE_FP >::psiM_fp_, qmcplusplus::simd::remapCopy(), Matrix< T, Alloc >::rows(), and qmcplusplus::simd::transpose().

202  {
203  const int n = a_mat.rows();
204  const int lda = a_mat.cols();
205  const int ldb = inv_a_mat.cols();
206 
207  psiM_fp_.resize(n * lda);
208  simd::transpose(a_mat.data(), n, lda, psiM_fp_.data(), n, lda);
209  OffloadPinnedMatrix<VALUE_FP> psiM_fp_view(psiM_fp_, psiM_fp_.data(), n, lda);
210  computeInvertAndLog(psiM_fp_view, n, lda, log_value);
211 
212  //Matrix<TMAT> data_ref_matrix;
213  //maybe n, lda
214  //data_ref_matrix.attachReference(psiM_fp_.data(), n, n);
215  //Because inv_a_mat is "aligned" this is unlikely to work.
216  simd::remapCopy(n, n, psiM_fp_.data(), lda, inv_a_mat.data(), ldb);
217  }
void computeInvertAndLog(OffloadPinnedMatrix< TMAT > &a_mat, const int n, const int lda, LogValue &log_value)
compute the inverse of invMat (in place) and the log value of determinant
void transpose(const T *restrict A, size_t m, size_t lda, TO *restrict B, size_t n, size_t ldb)
transpose of A(m,n) to B(n,m)
Definition: algorithm.hpp:97
OffloadPinnedVector< VALUE_FP > psiM_fp_
Matrices held in memory matrices n^2 * nw elements.
void remapCopy(size_t m, size_t n, const T *restrict A, size_t lda, TO *restrict B, size_t ldb)
copy of A(m,n) to B(m,n)
Definition: algorithm.hpp:115

◆ makeClone()

std::unique_ptr<Resource> makeClone ( ) const
inlineoverridevirtual

Implements Resource.

Definition at line 165 of file DiracMatrixComputeOMPTarget.hpp.

165 { return std::make_unique<DiracMatrixComputeOMPTarget>(*this); }

◆ mw_invertTranspose()

void mw_invertTranspose ( compute::Queue< PL > &  resource_ignored,
const RefVector< const OffloadPinnedMatrix< TMAT >> &  a_mats,
const RefVector< OffloadPinnedMatrix< TMAT >> &  inv_a_mats,
OffloadPinnedVector< LogValue > &  log_values 
)
inline

This covers both mixed and Full precision case.

Todo:
measure if using the a_mats without a copy to contiguous vector is better.

Definition at line 224 of file DiracMatrixComputeOMPTarget.hpp.

References DiracMatrixComputeOMPTarget< VALUE_FP >::detEng_, DiracMatrix< T_FP >::invert_transpose(), and qmcplusplus::log_values().

Referenced by qmcplusplus::TEST_CASE().

228  {
229  for (int iw = 0; iw < a_mats.size(); iw++)
230  {
231  auto& Ainv = inv_a_mats[iw].get();
232  detEng_.invert_transpose(a_mats[iw].get(), Ainv, log_values[iw]);
233  Ainv.updateTo();
234  }
235 
236  /* FIXME
237  const int nw = a_mats.size();
238  const size_t n = a_mats[0].get().rows();
239  const size_t lda = a_mats[0].get().cols();
240  const size_t ldb = inv_a_mats[0].get().cols();
241 
242  size_t nsqr{n * n};
243  psiM_fp_.resize(n * lda * nw);
244  for (int iw = 0; iw < nw; ++iw)
245  simd::transpose(a_mats[iw].get().data(), n, lda, psiM_fp_.data() + nsqr * iw, n, lda);
246 
247  computeInvertAndLog(psiM_fp_, n, lda, log_values);
248  for (int iw = 0; iw < nw; ++iw)
249  {
250  simd::remapCopy(n, n, psiM_fp_.data() + nsqr * iw, lda, inv_a_mats[iw].get().data(), ldb);
251  }
252  */
253  }
std::vector< StdComp, CUDAHostAllocator< StdComp > > log_values(batch_size)
DiracMatrix< VALUE_FP > detEng_
matrix inversion engine
std::enable_if_t< std::is_same< T_FP, TMAT >::value > invert_transpose(const Matrix< TMAT, ALLOC1 > &amat, Matrix< TMAT, ALLOC2 > &invMat, std::complex< TREAL > &LogDet)
compute the inverse of the transpose of matrix A and its determinant value in log when T_FP and TMAT ...
Definition: DiracMatrix.h:188

◆ reset() [1/2]

void reset ( OffloadPinnedVector< VALUE_FP > &  psi_Ms,
const int  n,
const int  lda,
const int  batch_size 
)
inlineprivate

reset internal work space.

My understanding might be off.

it smells that this is so complex.

Definition at line 78 of file DiracMatrixComputeOMPTarget.hpp.

References qmcplusplus::batch_size, qmcplusplus::convert(), Vector< T, Alloc >::data(), qmcplusplus::lda, DiracMatrixComputeOMPTarget< VALUE_FP >::lwork_, DiracMatrixComputeOMPTarget< VALUE_FP >::m_work_, qmcplusplus::n, DiracMatrixComputeOMPTarget< VALUE_FP >::pivots_, and qmcplusplus::Xgetri().

Referenced by DiracMatrixComputeOMPTarget< VALUE_FP >::computeInvertAndLog().

79  {
80  const int nw = batch_size;
81  pivots_.resize(lda * nw);
82  for (int iw = 0; iw < nw; ++iw)
83  {
84  lwork_ = -1;
85  VALUE_FP tmp;
86  FullPrecReal lw;
87  auto psi_M_ptr = psi_Ms.data() + iw * n * n;
88  Xgetri(lda, psi_M_ptr, lda, pivots_.data() + iw * n, &tmp, lwork_);
89  convert(tmp, lw);
90  lwork_ = static_cast<int>(lw);
91  m_work_.resize(lwork_);
92  }
93  }
void convert(const PL &lat, const PV &pin, PV &pout)
int Xgetri(int n, float *restrict a, int lda, int *restrict piv, float *restrict work, int &lwork)
inversion of a float matrix after lu factorization
Definition: DiracMatrix.h:60

◆ reset() [2/2]

void reset ( OffloadPinnedMatrix< VALUE_FP > &  psi_M,
const int  n,
const int  lda 
)
inlineprivate

reset internal work space for single walker case My understanding might be off.

it smells that this is so complex.

Definition at line 100 of file DiracMatrixComputeOMPTarget.hpp.

References Matrix< T, Alloc >::data(), qmcplusplus::lda, DiracMatrixComputeOMPTarget< VALUE_FP >::LU_diags_fp_, DiracMatrixComputeOMPTarget< VALUE_FP >::lwork_, DiracMatrixComputeOMPTarget< VALUE_FP >::m_work_, DiracMatrixComputeOMPTarget< VALUE_FP >::pivots_, and qmcplusplus::Xgetri().

101  {
102  pivots_.resize(lda);
103  LU_diags_fp_.resize(lda);
104  lwork_ = -1;
105  VALUE_FP tmp;
106  FullPrecReal lw;
107  Xgetri(lda, psi_M.data(), lda, pivots_.data(), &tmp, lwork_);
108  lw = std::real(tmp);
109  lwork_ = static_cast<int>(lw);
110  m_work_.resize(lwork_);
111  }
QMCTraits::RealType real
int Xgetri(int n, float *restrict a, int lda, int *restrict piv, float *restrict work, int &lwork)
inversion of a float matrix after lu factorization
Definition: DiracMatrix.h:60

Member Data Documentation

◆ detEng_

DiracMatrix<VALUE_FP> detEng_
private

matrix inversion engine

Definition at line 160 of file DiracMatrixComputeOMPTarget.hpp.

Referenced by DiracMatrixComputeOMPTarget< VALUE_FP >::mw_invertTranspose().

◆ infos_

OffloadPinnedVector<int> infos_
private

Definition at line 71 of file DiracMatrixComputeOMPTarget.hpp.

◆ LU_diags_fp_

◆ lwork_

◆ m_work_

◆ pivots_

◆ psiM_fp_

OffloadPinnedVector<VALUE_FP> psiM_fp_
private

Matrices held in memory matrices n^2 * nw elements.

Definition at line 68 of file DiracMatrixComputeOMPTarget.hpp.

Referenced by DiracMatrixComputeOMPTarget< VALUE_FP >::invert_transpose().


The documentation for this class was generated from the following file: