Class to compute matrix inversion and the log value of the determinant for a batch of Dirac matrices.

Public Types
using	FullPrecReal = RealAlias< VALUE_FP >

using	LogValue = std::complex< FullPrecReal >

template<typename T >
using	OffloadPinnedAllocator = OMPallocator< T, PinnedAlignedAllocator< T > >

template<typename T >
using	OffloadPinnedMatrix = Matrix< T, OffloadPinnedAllocator< T > >

template<typename T >
using	OffloadPinnedVector = Vector< T, OffloadPinnedAllocator< T > >

using	HandleResource = compute::Queue< PlatformKind::OMPTARGET >

Public Member Functions
	DiracMatrixComputeOMPTarget ()

std::unique_ptr< Resource >	makeClone () const override

template<typename TMAT >
std::enable_if_t< std::is_same< VALUE_FP, TMAT >::value >	invert_transpose (HandleResource &resource, const OffloadPinnedMatrix< TMAT > &a_mat, OffloadPinnedMatrix< TMAT > &inv_a_mat, LogValue &log_value)
	Computes the inverse of the transpose of matrix A and the log of its determinant when VALUE_FP and TMAT are the same.

template<typename TMAT >
std::enable_if_t<!std::is_same< VALUE_FP, TMAT >::value >	invert_transpose (HandleResource &resource, const OffloadPinnedMatrix< TMAT > &a_mat, OffloadPinnedMatrix< TMAT > &inv_a_mat, LogValue &log_value)
	Computes the inverse of the transpose of matrix A and the log of its determinant when VALUE_FP and TMAT are different.

template<typename TMAT , PlatformKind PL>
void	mw_invertTranspose (compute::Queue< PL > &resource_ignored, const RefVector< const OffloadPinnedMatrix< TMAT >> &a_mats, const RefVector< OffloadPinnedMatrix< TMAT >> &inv_a_mats, OffloadPinnedVector< LogValue > &log_values)
	Covers both the mixed-precision and full-precision cases.

Public Member Functions inherited from Resource
	Resource (const std::string &name)

virtual	~Resource ()=default

const std::string &	getName () const

Private Member Functions
void	reset (OffloadPinnedVector< VALUE_FP > &psi_Ms, const int n, const int lda, const int batch_size)
	Resets the internal workspace.

void	reset (OffloadPinnedMatrix< VALUE_FP > &psi_M, const int n, const int lda)
	Resets the internal workspace for the single-walker case.

template<typename TMAT >
void	computeInvertAndLog (OffloadPinnedMatrix< TMAT > &a_mat, const int n, const int lda, LogValue &log_value)
	Computes the inverse of a_mat (in place) and the log value of its determinant.

template<typename TMAT >
void	computeInvertAndLog (OffloadPinnedVector< TMAT > &psi_Ms, const int n, const int lda, OffloadPinnedVector< LogValue > &log_values)

Private Attributes
aligned_vector< VALUE_FP >	m_work_

int	lwork_

OffloadPinnedVector< VALUE_FP >	psiM_fp_
	Matrices held in memory: n^2 * nw elements.

OffloadPinnedVector< VALUE_FP >	LU_diags_fp_

OffloadPinnedVector< int >	pivots_

OffloadPinnedVector< int >	infos_

DiracMatrix< VALUE_FP >	detEng_
	Matrix inversion engine.

Detailed Description

template<typename VALUE_FP>
class qmcplusplus::DiracMatrixComputeOMPTarget< VALUE_FP >

Class to compute matrix inversion and the log value of the determinant for a batch of Dirac matrices.

Template Parameters

VALUE_FP the datatype used in the actual computation of the matrix

There is one instance per crowd, not one per MatrixUpdateEngine. This puts ownership of the scratch resources in a sensible place.

Currently this is CPU only, but its external API is written to enforce passing Dual data objects as arguments. The exception is the log_value argument of the single-particle API, which is not a Dual type; if a target is used with it, log_value must have an address in an OMPtarget-mapped region. This makes the API incompatible with the one used by MatrixDelayedUpdateCuda and DiracMatrixComputeCUDA.

Definition at line 45 of file DiracMatrixComputeOMPTarget.hpp.

Member Typedef Documentation

◆ FullPrecReal

using FullPrecReal = RealAlias<VALUE_FP>

Definition at line 48 of file DiracMatrixComputeOMPTarget.hpp.

◆ HandleResource

using HandleResource = compute::Queue<PlatformKind::OMPTARGET>

Definition at line 61 of file DiracMatrixComputeOMPTarget.hpp.

◆ LogValue

using LogValue = std::complex<FullPrecReal>

Definition at line 49 of file DiracMatrixComputeOMPTarget.hpp.
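
The complex LogValue type lets the log of a determinant carry a sign or phase alongside its magnitude. The following is a minimal sketch, assuming the common convention that the real part holds log|det| and the imaginary part holds the phase (pi for a negative real determinant). The helpers logOfRealDet and detFromLog are hypothetical names introduced only for illustration and are not part of this class.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Mirrors std::complex<FullPrecReal> with FullPrecReal assumed to be double.
using LogValue = std::complex<double>;

// Hypothetical helper: log of a nonzero real determinant,
// storing a negative sign as a phase of pi in the imaginary part.
inline LogValue logOfRealDet(double det)
{
  assert(det != 0.0);
  const double pi = std::acos(-1.0);
  return {std::log(std::abs(det)), det < 0.0 ? pi : 0.0};
}

// Hypothetical helper: recover the determinant from its complex log.
inline std::complex<double> detFromLog(const LogValue& lv) { return std::exp(lv); }
```

Exponentiating the stored value recovers the determinant up to rounding, which is why the sign does not need a separate field.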

◆ OffloadPinnedAllocator

using OffloadPinnedAllocator = OMPallocator<T, PinnedAlignedAllocator<T> >

Definition at line 54 of file DiracMatrixComputeOMPTarget.hpp.

◆ OffloadPinnedMatrix

using OffloadPinnedMatrix = Matrix<T, OffloadPinnedAllocator<T> >

Definition at line 56 of file DiracMatrixComputeOMPTarget.hpp.

◆ OffloadPinnedVector

using OffloadPinnedVector = Vector<T, OffloadPinnedAllocator<T> >

Definition at line 58 of file DiracMatrixComputeOMPTarget.hpp.

Constructor & Destructor Documentation

◆ DiracMatrixComputeOMPTarget()

DiracMatrixComputeOMPTarget ( )

inline

Definition at line 163 of file DiracMatrixComputeOMPTarget.hpp.

   : Resource("DiracMatrixComputeOMPTarget"), lwork_(0) {}

Member Function Documentation

◆ computeInvertAndLog() [1/2]

void computeInvertAndLog	(	OffloadPinnedMatrix< TMAT > &	a_mat,
		const int	n,
		const int	lda,
		LogValue &	log_value
	)

inlineprivate

Computes the inverse of a_mat (in place) and the log value of its determinant.

Template Parameters

TMAT	value type of matrix

Parameters

[in,out]	a_mat	the matrix
[in]	n	actual dimension of square matrix (no guarantee it really has full column rank)
[in]	lda	leading dimension of Matrix container
[out]	log_value	log of the determinant of a_mat, computed before the inversion

Definition at line 121 of file DiracMatrixComputeOMPTarget.hpp.

References qmcplusplus::computeLogDet(), Matrix< T, Alloc >::data(), getNextLevelNumThreads(), qmcplusplus::lda, DiracMatrixComputeOMPTarget< VALUE_FP >::LU_diags_fp_, DiracMatrixComputeOMPTarget< VALUE_FP >::lwork_, DiracMatrixComputeOMPTarget< VALUE_FP >::m_work_, qmcplusplus::n, DiracMatrixComputeOMPTarget< VALUE_FP >::pivots_, DiracMatrixComputeOMPTarget< VALUE_FP >::reset(), qmcplusplus::Xgetrf(), and qmcplusplus::Xgetri().

Referenced by DiracMatrixComputeOMPTarget< VALUE_FP >::invert_transpose().

   {
     BlasThreadingEnv knob(getNextLevelNumThreads());
     if (lwork_ < lda)
       reset(a_mat, n, lda);
     Xgetrf(n, n, a_mat.data(), lda, pivots_.data());
     for (int i = 0; i < n; i++)
       LU_diags_fp_[i] = a_mat.data()[i * lda + i];
     log_value = {0.0, 0.0};
     computeLogDet(LU_diags_fp_.data(), n, pivots_.data(), log_value);
     Xgetri(n, a_mat.data(), lda, pivots_.data(), m_work_.data(), lwork_);
   }
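
The listing above factors the matrix, copies the LU diagonal, accumulates the log determinant from that diagonal and the pivots, and only then inverts. The accumulation step can be sketched as follows. This is an illustration under stated assumptions, not the QMCPACK implementation: luFactor is a hypothetical stand-in for Xgetrf (column-major, 1-based pivots as in LAPACK), logDetFromLU is a hypothetical stand-in for computeLogDet, and the real routines may differ in detail.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <utility>
#include <vector>

using LogValue = std::complex<double>;

// Hypothetical stand-in for Xgetrf: in-place LU with partial pivoting on a
// column-major n x n matrix with leading dimension lda; pivots are 1-based.
inline void luFactor(std::vector<double>& a, int n, int lda, std::vector<int>& pivots)
{
  assert(lda >= n);
  for (int k = 0; k < n; ++k)
  {
    int p = k; // row holding the largest entry in column k
    for (int i = k + 1; i < n; ++i)
      if (std::abs(a[k * lda + i]) > std::abs(a[k * lda + p]))
        p = i;
    pivots[k] = p + 1;
    if (p != k)
      for (int j = 0; j < n; ++j)
        std::swap(a[j * lda + k], a[j * lda + p]);
    for (int i = k + 1; i < n; ++i)
    {
      a[k * lda + i] /= a[k * lda + k];
      for (int j = k + 1; j < n; ++j)
        a[j * lda + i] -= a[k * lda + i] * a[j * lda + k];
    }
  }
}

// Hypothetical stand-in for computeLogDet: sum log|u_ii| and add a phase of pi
// for every sign flip, whether from a negative diagonal entry or a row swap.
inline LogValue logDetFromLU(const std::vector<double>& a, int n, int lda, const std::vector<int>& pivots)
{
  const double pi = std::acos(-1.0);
  LogValue log_value{0.0, 0.0};
  for (int i = 0; i < n; ++i)
  {
    const double d      = a[i * lda + i];
    const bool swapped  = (pivots[i] != i + 1);
    const bool negative = (d < 0.0) != swapped;
    log_value += LogValue{std::log(std::abs(d)), negative ? pi : 0.0};
  }
  return log_value;
}
```

Working in log space avoids overflow or underflow when the product of many diagonal entries would leave the floating-point range.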

◆ computeInvertAndLog() [2/2]

void computeInvertAndLog	(	OffloadPinnedVector< TMAT > &	psi_Ms,
		const int	n,
		const int	lda,
		OffloadPinnedVector< LogValue > &	log_values
	)

inlineprivate

Definition at line 135 of file DiracMatrixComputeOMPTarget.hpp.

References qmcplusplus::computeLogDet(), Vector< T, Alloc >::data(), getNextLevelNumThreads(), qmcplusplus::lda, qmcplusplus::log_values(), DiracMatrixComputeOMPTarget< VALUE_FP >::LU_diags_fp_, DiracMatrixComputeOMPTarget< VALUE_FP >::lwork_, DiracMatrixComputeOMPTarget< VALUE_FP >::m_work_, qmcplusplus::n, DiracMatrixComputeOMPTarget< VALUE_FP >::pivots_, DiracMatrixComputeOMPTarget< VALUE_FP >::reset(), qmcplusplus::Xgetrf(), and qmcplusplus::Xgetri().

   {
     const int nw = log_values.size();
     BlasThreadingEnv knob(getNextLevelNumThreads());
     if (lwork_ < lda)
       reset(psi_Ms, n, lda, nw);
     pivots_.resize(n * nw);
     LU_diags_fp_.resize(n * nw);
     for (int iw = 0; iw < nw; ++iw)
     {
       VALUE_FP* LU_M = psi_Ms.data() + iw * n * n;
       Xgetrf(n, n, LU_M, lda, pivots_.data() + iw * n);
       for (int i = 0; i < n; i++)
         *(LU_diags_fp_.data() + iw * n + i) = LU_M[i * lda + i];
       LogValue log_value{0.0, 0.0};
       computeLogDet(LU_diags_fp_.data() + iw * n, n, pivots_.data() + iw * n, log_value);
       log_values[iw] = log_value;
       Xgetri(n, LU_M, lda, pivots_.data() + iw * n, m_work_.data(), lwork_);
     }
   }
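
The batched listing above addresses walker iw at an offset of iw * n * n while passing lda as the leading dimension, so the two agree when lda equals n. As a hedged sketch of the general case, the hypothetical helper below computes a per-walker pointer under the assumption that each walker owns n * lda contiguous elements. The name walkerBlock and the use of std::vector are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: start of walker iw's matrix in contiguous storage,
// assuming every walker owns n rows of lda elements each.
template<typename T>
inline T* walkerBlock(std::vector<T>& storage, int iw, int n, int lda)
{
  assert(iw >= 0 && n > 0 && lda >= n);
  const std::size_t stride = static_cast<std::size_t>(n) * static_cast<std::size_t>(lda);
  assert((static_cast<std::size_t>(iw) + 1) * stride <= storage.size());
  return storage.data() + static_cast<std::size_t>(iw) * stride;
}
```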

◆ invert_transpose() [1/2]

std::enable_if_t<std::is_same<VALUE_FP, TMAT>::value> invert_transpose	(	HandleResource &	resource,
		const OffloadPinnedMatrix< TMAT > &	a_mat,
		OffloadPinnedMatrix< TMAT > &	inv_a_mat,
		LogValue &	log_value
	)

inline

Computes the inverse of the transpose of matrix A and the log of its determinant when VALUE_FP and TMAT are the same.

Template Parameters

TMAT	matrix value type
TREAL	real type

Parameters

[in]	resource	compute resource
[in]	a_mat	matrix to be inverted
[out]	inv_a_mat	the inverted matrix
[out]	log_value	log of the determinant; passing it as a plain LogValue reference breaks compatibility of MatrixUpdateOmpTarget with DiracMatrixComputeCUDA but is fine for OMPTarget

Definition at line 178 of file DiracMatrixComputeOMPTarget.hpp.

References Matrix< T, Alloc >::cols(), DiracMatrixComputeOMPTarget< VALUE_FP >::computeInvertAndLog(), Matrix< T, Alloc >::data(), qmcplusplus::lda, qmcplusplus::n, Matrix< T, Alloc >::rows(), and qmcplusplus::simd::transpose().

Referenced by qmcplusplus::TEST_CASE().

   {
     const int n   = a_mat.rows();
     const int lda = a_mat.cols();
     const int ldb = inv_a_mat.cols();
     simd::transpose(a_mat.data(), n, lda, inv_a_mat.data(), n, ldb);
     // In this case we just pass the value since
     // that makes sense for a single walker API
     computeInvertAndLog(inv_a_mat, n, ldb, log_value);
   }
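
The transpose step above copies between two buffers that may have different leading dimensions (lda for the source, ldb for the destination). A minimal sketch of such a copy follows. The function transposeWithLd is a hypothetical stand-in whose argument order imitates the simd::transpose call in the listing; the actual simd::transpose may be implemented differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for simd::transpose: a has m rows of lda elements,
// b receives n rows of ldb elements, with b(i, j) = a(j, i).
// Padding beyond the logical n or m columns is neither read nor written.
template<typename T, typename TO>
inline void transposeWithLd(const T* a, std::size_t m, std::size_t lda, TO* b, std::size_t n, std::size_t ldb)
{
  assert(lda >= n && ldb >= m);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < m; ++j)
      b[i * ldb + j] = static_cast<TO>(a[j * lda + i]);
}
```

Because the element types of source and destination are separate template parameters, the same loop can also change precision while transposing.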

◆ invert_transpose() [2/2]

std::enable_if_t<!std::is_same<VALUE_FP, TMAT>::value> invert_transpose	(	HandleResource &	resource,
		const OffloadPinnedMatrix< TMAT > &	a_mat,
		OffloadPinnedMatrix< TMAT > &	inv_a_mat,
		LogValue &	log_value
	)

inline

Computes the inverse of the transpose of matrix A and the log of its determinant when VALUE_FP and TMAT are different.

Template Parameters

TMAT	matrix value type
TREAL	real type

Definition at line 198 of file DiracMatrixComputeOMPTarget.hpp.

References Matrix< T, Alloc >::cols(), DiracMatrixComputeOMPTarget< VALUE_FP >::computeInvertAndLog(), Matrix< T, Alloc >::data(), qmcplusplus::lda, qmcplusplus::n, DiracMatrixComputeOMPTarget< VALUE_FP >::psiM_fp_, qmcplusplus::simd::remapCopy(), Matrix< T, Alloc >::rows(), and qmcplusplus::simd::transpose().

   {
     const int n   = a_mat.rows();
     const int lda = a_mat.cols();
     const int ldb = inv_a_mat.cols();
 
     psiM_fp_.resize(n * lda);
     simd::transpose(a_mat.data(), n, lda, psiM_fp_.data(), n, lda);
     OffloadPinnedMatrix<VALUE_FP> psiM_fp_view(psiM_fp_, psiM_fp_.data(), n, lda);
     computeInvertAndLog(psiM_fp_view, n, lda, log_value);
 
     //Matrix<TMAT> data_ref_matrix;
     //maybe n, lda
     //data_ref_matrix.attachReference(psiM_fp_.data(), n, n);
     //Because inv_a_mat is "aligned" this is unlikely to work.
     simd::remapCopy(n, n, psiM_fp_.data(), lda, inv_a_mat.data(), ldb);
   }
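
The two invert_transpose overloads are selected at compile time by std::enable_if_t: one when the caller's element type matches VALUE_FP, the other when it does not and a full-precision scratch buffer is needed. The toy class below sketches that dispatch pattern only. PrecisionDispatch, process, and the reciprocal "computation" are hypothetical placeholders and do not reproduce the matrix code.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical toy class illustrating the same-type / different-type overload pair.
template<typename VALUE_FP>
class PrecisionDispatch
{
public:
  // Chosen when TMAT is already the compute type: work directly on the data.
  template<typename TMAT>
  std::enable_if_t<std::is_same<VALUE_FP, TMAT>::value, bool> process(std::vector<TMAT>& data)
  {
    for (auto& x : data)
      x = compute(x);
    return false; // no scratch buffer used
  }

  // Chosen when TMAT differs: promote into a scratch buffer, compute, copy back.
  template<typename TMAT>
  std::enable_if_t<!std::is_same<VALUE_FP, TMAT>::value, bool> process(std::vector<TMAT>& data)
  {
    buffer_.assign(data.begin(), data.end());
    assert(buffer_.size() == data.size());
    for (auto& x : buffer_)
      x = compute(x);
    for (std::size_t i = 0; i < data.size(); ++i)
      data[i] = static_cast<TMAT>(buffer_[i]);
    return true; // scratch buffer used
  }

private:
  static VALUE_FP compute(VALUE_FP x) { return VALUE_FP(1) / x; } // placeholder operation
  std::vector<VALUE_FP> buffer_;
};
```

Keeping the scratch buffer as a member, as psiM_fp_ is here, avoids reallocating it on every call.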

◆ makeClone()

std::unique_ptr<Resource> makeClone ( ) const

inlineoverridevirtual

Implements Resource.

Definition at line 165 of file DiracMatrixComputeOMPTarget.hpp.

   { return std::make_unique<DiracMatrixComputeOMPTarget>(*this); }
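
makeClone copy-constructs the object and returns it through a pointer to the base class. A minimal sketch of that pattern follows, with ResourceSketch and ScratchSketch as hypothetical stand-ins for Resource and this class. Only the members needed to show the idea are included.

```cpp
#include <memory>
#include <string>

// Hypothetical minimal stand-in for the Resource base class.
class ResourceSketch
{
public:
  explicit ResourceSketch(const std::string& name) : name_(name) {}
  virtual ~ResourceSketch() = default;
  virtual std::unique_ptr<ResourceSketch> makeClone() const = 0;
  const std::string& getName() const { return name_; }

private:
  std::string name_;
};

// Hypothetical derived resource holding some scratch state.
class ScratchSketch : public ResourceSketch
{
public:
  ScratchSketch() : ResourceSketch("ScratchSketch"), lwork_(0) {}
  // Copy-construct a new object; the unique_ptr converts to the base type.
  std::unique_ptr<ResourceSketch> makeClone() const override { return std::make_unique<ScratchSketch>(*this); }
  int lwork_;
};
```

The clone is an independent copy, so each owner gets its own scratch state.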

◆ mw_invertTranspose()

void mw_invertTranspose	(	compute::Queue< PL > &	resource_ignored,
		const RefVector< const OffloadPinnedMatrix< TMAT >> &	a_mats,
		const RefVector< OffloadPinnedMatrix< TMAT >> &	inv_a_mats,
		OffloadPinnedVector< LogValue > &	log_values
	)

inline

Covers both the mixed-precision and full-precision cases.

Todo: measure whether using a_mats without a copy to a contiguous vector is better.

Definition at line 224 of file DiracMatrixComputeOMPTarget.hpp.

References DiracMatrixComputeOMPTarget< VALUE_FP >::detEng_, DiracMatrix< T_FP >::invert_transpose(), and qmcplusplus::log_values().

Referenced by qmcplusplus::TEST_CASE().

   {
     for (int iw = 0; iw < a_mats.size(); iw++)
     {
       auto& Ainv = inv_a_mats[iw].get();
       detEng_.invert_transpose(a_mats[iw].get(), Ainv, log_values[iw]);
       Ainv.updateTo();
     }
 
     /* FIXME
     const int nw     = a_mats.size();
     const size_t n   = a_mats[0].get().rows();
     const size_t lda = a_mats[0].get().cols();
     const size_t ldb = inv_a_mats[0].get().cols();
 
     size_t nsqr{n * n};
     psiM_fp_.resize(n * lda * nw);
     for (int iw = 0; iw < nw; ++iw)
       simd::transpose(a_mats[iw].get().data(), n, lda, psiM_fp_.data() + nsqr * iw, n, lda);
 
     computeInvertAndLog(psiM_fp_, n, lda, log_values);
     for (int iw = 0; iw < nw; ++iw)
     {
       simd::remapCopy(n, n, psiM_fp_.data() + nsqr * iw, lda, inv_a_mats[iw].get().data(), ldb);
     }
     */
   }
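
The active loop above walks two reference vectors in lockstep and calls a single-matrix routine per walker. The sketch below shows that shape, assuming RefVector behaves like a std::vector of std::reference_wrapper. RefVectorSketch, negateInto, and mwNegate are hypothetical names, and the per-walker operation is a trivial placeholder rather than an inversion.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Assumed shape of RefVector: a vector of reference wrappers.
template<typename T>
using RefVectorSketch = std::vector<std::reference_wrapper<T>>;

// Hypothetical single-walker operation: write the negated input and return its sum.
inline double negateInto(const std::vector<double>& in, std::vector<double>& out)
{
  double sum = 0.0;
  out.resize(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
  {
    out[i] = -in[i];
    sum += in[i];
  }
  return sum;
}

// Hypothetical multi-walker wrapper: one call of the single-walker routine per entry.
inline void mwNegate(const RefVectorSketch<const std::vector<double>>& ins,
                     const RefVectorSketch<std::vector<double>>& outs,
                     std::vector<double>& sums)
{
  assert(ins.size() == outs.size() && ins.size() == sums.size());
  for (std::size_t iw = 0; iw < ins.size(); ++iw)
    sums[iw] = negateInto(ins[iw].get(), outs[iw].get());
}
```

The vector of outputs can itself be const because constness applies to the wrappers, not to the objects they refer to.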

◆ reset() [1/2]

void reset	(	OffloadPinnedVector< VALUE_FP > &	psi_Ms,
		const int	n,
		const int	lda,
		const int	batch_size
	)

inlineprivate

Resets the internal workspace.

My understanding might be off.

It smells that this is so complex.

Definition at line 78 of file DiracMatrixComputeOMPTarget.hpp.

References qmcplusplus::batch_size, qmcplusplus::convert(), Vector< T, Alloc >::data(), qmcplusplus::lda, DiracMatrixComputeOMPTarget< VALUE_FP >::lwork_, DiracMatrixComputeOMPTarget< VALUE_FP >::m_work_, qmcplusplus::n, DiracMatrixComputeOMPTarget< VALUE_FP >::pivots_, and qmcplusplus::Xgetri().

Referenced by DiracMatrixComputeOMPTarget< VALUE_FP >::computeInvertAndLog().

   {
     const int nw = batch_size;
     pivots_.resize(lda * nw);
     for (int iw = 0; iw < nw; ++iw)
     {
       lwork_ = -1;
       VALUE_FP tmp;
       FullPrecReal lw;
       auto psi_M_ptr = psi_Ms.data() + iw * n * n;
       Xgetri(lda, psi_M_ptr, lda, pivots_.data() + iw * n, &tmp, lwork_);
       convert(tmp, lw);
       lwork_ = static_cast<int>(lw);
       m_work_.resize(lwork_);
     }
   }

◆ reset() [2/2]

void reset	(	OffloadPinnedMatrix< VALUE_FP > &	psi_M,
		const int	n,
		const int	lda
	)

inlineprivate

Resets the internal workspace for the single-walker case.

My understanding might be off.

It smells that this is so complex.

Definition at line 100 of file DiracMatrixComputeOMPTarget.hpp.

References Matrix< T, Alloc >::data(), qmcplusplus::lda, DiracMatrixComputeOMPTarget< VALUE_FP >::LU_diags_fp_, DiracMatrixComputeOMPTarget< VALUE_FP >::lwork_, DiracMatrixComputeOMPTarget< VALUE_FP >::m_work_, DiracMatrixComputeOMPTarget< VALUE_FP >::pivots_, and qmcplusplus::Xgetri().

   {
     pivots_.resize(lda);
     LU_diags_fp_.resize(lda);
     lwork_ = -1;
     VALUE_FP tmp;
     FullPrecReal lw;
     Xgetri(lda, psi_M.data(), lda, pivots_.data(), &tmp, lwork_);
     lw     = std::real(tmp);
     lwork_ = static_cast<int>(lw);
     m_work_.resize(lwork_);
   }
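
Both reset overloads rely on the LAPACK workspace-query convention: calling the inversion routine with lwork set to -1 performs no work and instead reports the preferred workspace size in the first element of the work array. The sketch below imitates that convention without linking LAPACK. getriLike is a hypothetical stand-in for Xgetri, and the size it reports (n * 64) is an arbitrary illustrative value, not what a real library returns.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in following the xGETRI convention:
// with lwork == -1 it only writes the preferred workspace size to work[0].
inline void getriLike(int n, double* /*a*/, int /*lda*/, const int* /*ipiv*/, double* work, int lwork)
{
  if (lwork == -1)
  {
    work[0] = static_cast<double>(n * 64); // illustrative size only
    return;
  }
  assert(lwork >= n); // a real routine would perform the inversion here
}

// Hypothetical holder of scratch buffers, sized by a workspace query.
struct WorkspaceSketch
{
  std::vector<double> work;
  std::vector<int> pivots;
  int lwork = 0;

  void reset(double* a, int n, int lda)
  {
    pivots.resize(n);
    double query = 0.0;                              // receives the reported size
    getriLike(n, a, lda, pivots.data(), &query, -1); // query call, no computation
    lwork = static_cast<int>(query);
    work.resize(lwork);
  }
};
```

After the query, later calls can pass work.data() and lwork to the real routine without allocating again, which is what the lwork_ check in computeInvertAndLog guards.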