BLAS

The HighFM library was designed around Expression Templates [1] [2] in order to allow users to write mathematical expressions as close to the textbooks as possible. These expressions are then lazily evaluated to achieve high performance. In other words, some operators (e.g., +) and methods (e.g., mat.T()) return an abstract object representing the operation without performing any actual computation. The computation only happens later, when the entire expression is evaluated, usually in the assignment operator =. Although this sounds inefficient, modern compilers can optimize all the abstraction away and generate a perfectly optimized code. For example, the code

Vector<float> v1(n);
Vector<float> v2(n);
Vector<float> v3(n);
// ...
v3 = 5 * v1 + v2;

will be transformed to a single for-loop, so the vectors are traversed only once and no temporary is created. Without any further optimizations (e.g., SIMD), the loop looks like this:

for (int i = 0; i < n; ++i) {
   v3[i] = 5 * v1[i] + v2[i];
}

Whenever possible, HighFM will transform the expression into a call to the corresponding BLAS routine. For instance,

Matrix<float> A(n);
Vector<float> v1(n);
Vector<float> v2(n);
// ...
v2 = A * v1 + 0.5 * v2;

will be translated to a call the sgemv routine. In this case, the HighFM library acts as the translation layer between the C++ objects and the backend library.

However, there are some constraints on how the expression can be written. This is a consequence of the strong connection with BLAS/LAPACK as well as to strike a balance between compilation times and coding flexibility. The restrictions include

  • All matrices and vectors must be defined in terms of single or double-precision floating-point numbers. They can be either real (float or double) or complex (std::complex<float> or std::complex<double>).

  • Within an expression, all operands must share the same data type and reside in the same memory space.

  • A Vector sum can only contain a maximum of 8 operands. Likewise, the maximum number of general, dense Matrix objects that can be added or transposed together is 4 matrices. For all the other objects (e.g., sparse, diagonal and triangular matrices), the maximum is 2.

  • HighFM follows the rules of Linear Algebra. For instance, Mat * vec means a matrix-vector multiplication, while Mat + vec is an invalid operation.

Scaling

Matrices and vectors can be scaled as

Matrix<float> A(m, n);
Matrix<float> B(m, n);
float alpha = 1.5;
float beta = 0.5;

// ... //

A = A * alpha;
B = beta * A;
B = B / (alpha * beta);

Addition and Subtraction

Use the + and - operators to add and subtract matrices and vectors:

Matrix<float> A1(m, n);
Matrix<float> A2(m, n);
Matrix<float> A3(m, n);
Matrix<float> B(m, n);

// ... //

B = A1 + 0.3 * A2 - A3;

Warning

The number of rows and columns of the left-hand and right-hand sides must match. Likewise, all matrices and vectors in the expression must have the same dimensions.

Transpose and Conjugate

The transpose AT, the adjoint (or conjugate transpose) AH and complex conjugate A of a matrix A can be calculated by calling the methods A.T(), A.H() and A.conj(), respectively. For real matrices, A.H() == A.T() and A.conj() is a no-op. You can also combine the transpose/adjoint/conjugate with the addition:

Matrix<std::complex<float>> A(10, 5);
Matrix<std::complex<float>> B(5, 10);
Matrix<std::complex<float>> C(5, 10);
B = A.T() + C;
B = A.H() + C.conj();

Similar to the other operations, the T() will create an abstract object without doing the actual transposition. For example, in the B = A.T() expression, the transpose only will be evaluated when assigning the results to B. This creates a problem when trying to assign the transpose to itself: A = A.T(). HighFM will start to write the results to A before transposing the matrix completely. Therefore, the A = A.T() does not replace the A with its transpose A.T as expected. We denote this problem as Aliasing.

For in-place transposition (i.e., A = A.T()) use the transpose_inplace() method instead. Or equivalently, adjoint_inplace().

Entry-wise Operations

You can also apply an arbitrary function to each entry of the matrix or vector using the for_each routine:

Vector<float> x(m);
Vector<float> y(m);
// ... //

// Calculates the logarithm entrywise
y = for_each(x, [](auto el){ return std::log(el); });

Note

This is only valid for matrices and vectors located in the host.

Multiplication

The product between two matrices or between a matrix and a vector can be expressed using the operator *. Be careful with Aliasing: A=AB is not safe!

Matrix<double> dA(n, n), dB(n, n), dC(n, n), dR(n, n);
Vector<double> dv1(n), dv2(n), dv3(n);
CSRMatrix<double> sA(n, n);
double alpha, beta;

// ... //

// Dense matrix x dense matrix
dR = alpha * dA * dB + beta * dC;
dR = alpha * dA.H() * dB + beta * dC;
dR = alpha *  dA * dB.H() + beta * dC;
dR = alpha *  dA.H() * dB.H() + beta * dC;

// Dense matrix x dense vector
dv3 = alpha *  dA * v1 + beta * v2;
dv3 = alpha *  dA.H() * v1 + beta * v2;

Note

You can omit the matrix dC or the vector v2 if beta == 0. Moreover, these expressions can be evaluated in place by setting dC and dv2 instead of dR and dv3. Both the transpose T() and the adjoint H() are supported, independent of the data type.

Note

The matrices and vectors must follow the matrix dimensions rules as defined in Linear Algebra.

Warning

Chaining products (dA * dB * dC), having more than 1 product in the same expression (dA * dB + dA * dC), using expressions as arguments ((dA + dB) * dC) or extending the expression to 4 or more operands (dA + dB + dA * dC) are not allowed. If any of these operations are required, please separate the expression into multiple operations.

Norm

HighFM can compute the following vector norms:

  • norm1(x) = x1=|x1|+|x2|++|xn| (1 norm)

  • norm2(x) = x2=|x1|2+|x2|2++|xn|2 (2 norm)

  • norm_inf(x) = x=x=max(|x1|,|x2|,,|xn|) ( norm)

  • amax(x) = x ( norm)

In the case of the matrices, they have a different meaning:

  • norm1(A) = A1=maxji|aij| (1 norm, or maximum column sum)

  • norm2(A) = AF=ij|aij|2 (Frobenius norm)

  • norm_inf(A) = A=maxij|aij| ( norm, or maximum row sum)

  • amax(A) = maxi,j|aij| (maximum absolute value)

Note

Even if the matrix or vector is complex, the norm always returns a real number.

Dot

The inner product of two vectors can be calculated with the dot() method:

Vector<double> v1, v2;
double result = dot(v1, v2);

Diagonal and Identity Matrix

Identity matrices can be created using the Eye<T> object, where T is the desired data type (e.g., float or double). Diagonal matrices, on the other hand, are stored as a Vector and require the use of diag() to be interpreted correctly. Both can be assigned to dense Matrix or be directly integrated into the matrix sum:

Matrix<float> A(n, n);
Matrix<float> R(n, n);
Vector<float> D(n);         // The diagonal matrix is stored as a vector.

R = Eye<float>();
R = diag(D);

R = A + 0.5 * Eye<float>();
R = A + 0.5 * diag(D);

You can also multiply regular and diagonal matrices:

R = 0.5 * A * diag(D);
R = 0.5 * diag(D) * A;

Note

The expression can only contain a single diagonal or identity matrix.

Triangular Matrix

The Triangular Map represents the upper or lower triangular part of a dense Matrix, i.e., a map to only the elements at or above/below the main diagonal. This is referred to in Linear Algebra as a Triangular Matrix (if the matrix is square) or a Trapezoidal Matrix (if the matrix is non-square). Generally, they are returned as the result of some matrix factorization, such QR, LQ or LU, but they can also be created explicitly by calling A.upper_triangular() or A.lower_triangular(), where A is a dense, regular Matrix. The other part will not be referenced.

Matrix<float> A(n, m);
Matrix<float> B(m, m);
Matrix<float> C(n, m);
Matrix<float> R(n, m);

// Note that only the upper triangular of A will be copied to R,
// while all the elements below the diagonal will be left untouched.
R = A.upper_triangular();

R = A.lower_triangular() + C;
R = A.upper_triangular() * B + C;

Note

Transpose (A.upper_triangular().T()), adjoint (A.upper_triangular().H()) and complex conjugate (A.upper_triangular().conj()) are also supported. However, in the matrix product, the regular matrix B must not contain any of these operations.

Note

The expression can only contain a single triangular matrix.

Slicing

Any BLAS/LAPACK operation can be called with vector and matrix slices instead of the full vector/matrix:

Matrix<float> A(n, n);
Matrix<float> B(n, n);
Vector<float> v1(n);

v1 = A * B(kAll, 2) + B(kAll, 1);
float alpha = dot(B(kAll, 2), B(kAll, 1));

Warning

If the elements in the left-hand and right-hand side overlap (e.g., A({1, k + 1}, {1, k + 1}) = 5 * A({0, k}, {0, k}) ), the result will be incorrect due to Aliasing.

Sparse BLAS

Due to the special layout of the CSR Matrix, only the following operations are supported:

  • Scaling:

    csr1 = alpha * csr1;
    csr1 = alpha * csr2;
    
  • Addition:

    csr1 = alpha * csr2 + beta * csr3;
    
  • Matrix product:

    Matrix<double> dB(n, n), dC(n, n), dR(n, n);
    Vector<double> dv1(n), dv2(n), dv3(n);
    CSRMatrix<double> csr(n, n);
    double alpha, beta;
    
    // ... //
    
    // Sparse matrix x dense vector
    dv3 = alpha *  csr * v1 + beta * v2;
    dv3 = alpha *  csr.T() * v1 + beta * v2;
    dv3 = alpha *  csr.H() * v1 + beta * v2;
    
    // Sparse matrix x dense matrix
    dR = alpha *  csr * dB + beta * dC;
    dR = alpha *  csr.T() * dB + beta * dC;
    dR = alpha *  csr.H() * dB + beta * dC;
    
  • Norm:

    double amax_v = amax(csr);
    double norm_frob_v = norm2(csr);
    // double norm_l1_v = norm_l1(csr);         // The L1 norm is not supported for CSR matrices
    double norm_inf_v = norm_inf(csr);
    

Warning

There is currently no BLAS operation with Sparse Vector.

Aliasing

In HighFM, we say that an expression has aliasing when a matrix or vector appears on both sides of the assignment operation, for instance, mat1 = mat1 + mat2 or mat1 = mat1.T(). Although in the first case, the aliasing is harmless, the second one will produce incorrect results.

The Aliasing problem arises due to the lazy evaluation, i.e., the expression is only evaluated during the assignment. In this case, HighFM will transpose a block of A and then immediately write the results to A, so when it moves to the next block, it will use the new values of A instead of the old ones. For example, HighFM will first compute A(0, 1) = A(1, 0) and then A(1, 0) = A(0, 1), losing the original value of A(1, 0).

In general, the aliasing can only be detected during the runtime. In the default mode (“debug”), HighFM will analyze the expression during the runtime and determine if it has harmful aliasing or not. For instance, when trying to evaluate mat1 = mat1.T(), the library will throw the following exception:

terminate called after throwing an instance of 'HighFM::HarmfulAliasing'
  what():  void HighFM::internal::ConjugateTransposeOp<Expr, Flag>::eval_to(HighFM::MatrixBase<E>&) [with E = HighFM::Matrix<float>; Expr = HighFM::Matrix<float>; HighFM::internal::TransposeFlag Flag = HighFM::internal::kTranspose] at
/home/nicolas/Documents/IST/PhD/Code/HighFM/include/highfm/blas/transpose.hpp:120:25
 HarmfulAliasing: Aliasing detected! You cannot assign the matrix transpose to itself (A = A.T)!

Besides built-in methods, such as transpose_inplace(), the only way to solve the aliasing issue is to evaluate the expression to a temporary and then move the results from the temporary to the desired object.

Expression Validation

HighFM will check if the created expression is valid or not. When possible, it will check during the compile time, producing compilation errors if it encounters any anomaly. These errors can be ugly and long, but they help to detect issues with your code. For example, mixing different types in the expression

Matrix<float> A(3, 3, 1);
Matrix<double> B(3, 3, 2);
Matrix<double> C(3, 3);

C = A + B;

will result in the following compilation error:

/home/nicolas/Documents/IST/PhD/Code/HighFM/include/highfm/blas/addition.hpp:62:35: error: static assertion failed: /home/nicolas/Documents/IST/PhD/Code/HighFM/include/highfm/blas/addition.hpp:62:
TypeError: All objects in the expression must have the same data type!

   62 |                     HIGHFM_ASSERT(have_same_value_type, TypeError,
      |                                   ^~~~~~~~~~~~~~~~~~~~

However, not all validation checks can be done during compile time. For instance, checking if the dimensions of the operands in the expression are correct. In these cases, HighFM will validate the expression during the runtime and throw an exception if detects any illegal operation:

Matrix<float> A(4, 4, 1);
Matrix<double> B(3, 3, 2);
Matrix<double> C(4, 4);

C = A + B;

Output:

terminate called after throwing an instance of 'HighFM::LengthError'
  what():  HighFM::internal::AdditionOp<Operands>::AdditionOp(HighFM::internal::ScalingOp<Operands>...) [with Operands = {HighFM::Matrix<float, HighFM::Host, HighFM::Array>, HighFM::Matrix<float, HighFM::Host, HighFM::Array>}] at
/home/nicolas/Documents/IST/PhD/Code/HighFM/include/highfm/blas/addition.hpp:56:21
 LengthError: The dimensions of the operands do not match!

References