# BLAS¶

The HighFM library was designed around Expression Templates [1] [2] in order to allow users to write mathematical expressions as close to the textbooks as possible. These expressions are then lazily evaluated to achieve high performance. In other words, some operators (e.g., `+`

) and methods (e.g., `mat.T()`

) return an abstract object representing the operation without performing any actual computation. The computation only happens later, when the entire expression is evaluated, usually in the assignment operator `=`

. Although this sounds inefficient, modern compilers can optimize all the abstraction away and generate a perfectly optimized code. For example, the code

```
Vector<float> v1(n);
Vector<float> v2(n);
Vector<float> v3(n);
// ...
v3 = 5 * v1 + v2;
```

will be transformed to a single for-loop, so the vectors are traversed only once and no temporary is created. Without any further optimizations (e.g., SIMD), the loop looks like this:

```
for (int i = 0; i < n; ++i) {
v3[i] = 5 * v1[i] + v2[i];
}
```

Whenever possible, HighFM will transform the expression into a call to the corresponding BLAS routine. For instance,

```
Matrix<float> A(n);
Vector<float> v1(n);
Vector<float> v2(n);
// ...
v2 = A * v1 + 0.5 * v2;
```

will be translated to a call the `sgemv`

routine. In this case, the HighFM library acts as the translation layer between the C++ objects and the backend library.

However, there are some constraints on how the expression can be written. This is a consequence of the strong connection with BLAS/LAPACK as well as to strike a balance between compilation times and coding flexibility. The restrictions include

All matrices and vectors

**must**be defined in terms of single or double-precision floating-point numbers. They can be either real (`float`

or`double`

) or complex (`std::complex<float>`

or`std::complex<double>`

).Within an expression, all operands

**must**share the same data type and reside in the same memory space.A Vector sum can only contain a maximum of

`8`

operands. Likewise, the maximum number of general, dense Matrix objects that can be added or transposed together is`4`

matrices. For all the other objects (e.g., sparse, diagonal and triangular matrices), the maximum is`2`

.HighFM follows the rules of Linear Algebra. For instance,

`Mat * vec`

means a matrix-vector multiplication, while`Mat + vec`

is an invalid operation.

## Scaling¶

Matrices and vectors can be scaled as

```
Matrix<float> A(m, n);
Matrix<float> B(m, n);
float alpha = 1.5;
float beta = 0.5;
// ... //
A = A * alpha;
B = beta * A;
B = B / (alpha * beta);
```

## Addition and Subtraction¶

Use the + and - operators to add and subtract matrices and vectors:

```
Matrix<float> A1(m, n);
Matrix<float> A2(m, n);
Matrix<float> A3(m, n);
Matrix<float> B(m, n);
// ... //
B = A1 + 0.3 * A2 - A3;
```

Warning

The number of rows and columns of the left-hand and right-hand sides **must** match. Likewise, all matrices and vectors in the expression must have the same dimensions.

## Transpose and Conjugate¶

The transpose \(\mathbf{A}^T\), the adjoint (or conjugate transpose) \(\mathbf{A}^H\) and complex conjugate \(\overline{\mathbf{A}}\) of a matrix \(\mathbf{A}\) can be calculated by calling the methods `A.T()`

, `A.H()`

and `A.conj()`

, respectively. For real matrices, `A.H() == A.T()`

and `A.conj()`

is a no-op. You can also combine the transpose/adjoint/conjugate with the addition:

```
Matrix<std::complex<float>> A(10, 5);
Matrix<std::complex<float>> B(5, 10);
Matrix<std::complex<float>> C(5, 10);
B = A.T() + C;
B = A.H() + C.conj();
```

Similar to the other operations, the `T()`

will create an abstract object without doing the actual transposition. For example, in the `B = A.T()`

expression, the transpose only will be evaluated when assigning the results to `B`

. This creates a problem when trying to assign the transpose to itself: `A = A.T()`

. HighFM will start to write the results to `A`

before transposing the matrix completely. Therefore, the `A = A.T()`

does not replace the `A`

with its transpose `A.T`

as expected. We denote this problem as Aliasing.

For *in-place* transposition (i.e., `A = A.T()`

) use the `transpose_inplace()`

method instead. Or equivalently, `adjoint_inplace()`

.

## Entry-wise Operations¶

You can also apply an arbitrary function to each entry of the matrix or vector using the `for_each`

routine:

```
Vector<float> x(m);
Vector<float> y(m);
// ... //
// Calculates the logarithm entrywise
y = for_each(x, [](auto el){ return std::log(el); });
```

Note

This is only valid for matrices and vectors located in the host.

## Multiplication¶

The product between two matrices or between a matrix and a vector can be expressed using the operator `*`

. *Be careful* with Aliasing: \(\mathbf{A} = \mathbf{A} \mathbf{B}\) is not safe!

```
Matrix<double> dA(n, n), dB(n, n), dC(n, n), dR(n, n);
Vector<double> dv1(n), dv2(n), dv3(n);
CSRMatrix<double> sA(n, n);
double alpha, beta;
// ... //
// Dense matrix x dense matrix
dR = alpha * dA * dB + beta * dC;
dR = alpha * dA.H() * dB + beta * dC;
dR = alpha * dA * dB.H() + beta * dC;
dR = alpha * dA.H() * dB.H() + beta * dC;
// Dense matrix x dense vector
dv3 = alpha * dA * v1 + beta * v2;
dv3 = alpha * dA.H() * v1 + beta * v2;
```

Note

You can omit the matrix `dC`

or the vector `v2`

if `beta == 0`

. Moreover, these expressions can be evaluated in place by setting `dC`

and `dv2`

instead of `dR`

and `dv3`

. Both the transpose `T()`

and the adjoint `H()`

are supported, independent of the data type.

Note

The matrices and vectors must follow the matrix dimensions rules as defined in Linear Algebra.

Warning

Chaining products (`dA * dB * dC`

), having more than 1 product in the same expression (`dA * dB + dA * dC`

), using expressions as arguments (`(dA + dB) * dC`

) or extending the expression to 4 or more operands (`dA + dB + dA * dC`

) are **not** allowed. If any of these operations are required, please separate the expression into multiple operations.

## Norm¶

HighFM can compute the following vector norms:

`norm1(x)`

= \(\|\vec{x}\|_1 = |x_1| + |x_2| + \dots + |x_n|\) (\(\ell_1\) norm)`norm2(x)`

= \(\|\vec{x}\|_2 = \sqrt{|x_1|^2 + |x_2|^2 + \dots + |x_n|^2}\) (\(\ell_2\) norm)`norm_inf(x)`

= \(\|\vec{x}\|_\infty = \|\vec{x}\| = max(|x_1|, |x_2|, \dots, |x_n|)\) (\(\ell_\infty\) norm)`amax(x)`

= \(\|\vec{x}\|_\infty\) (\(\ell_\infty\) norm)

In the case of the matrices, they have a different meaning:

`norm1(A)`

= \(\|\mathbf{A}\|_1 = \max_j \; \sum_i{|a_{ij}|}\) (\(\ell_1\) norm, or maximum column sum)`norm2(A)`

= \(\|\mathbf{A}\|_F = \sum_i{\sum_j{ |a_{ij}|^2}}\) (*Frobenius*norm)`norm_inf(A)`

= \(\|\mathbf{A}\|_\infty = \max_i \; \sum_j{|a_{ij}|}\) (\(\ell_\infty\) norm, or maximum row sum)`amax(A)`

= \(\max_{i,j} {|a_{ij}|}\) (maximum absolute value)

Note

Even if the matrix or vector is complex, the norm always returns a real number.

## Dot¶

The inner product of two vectors can be calculated with the `dot()`

method:

```
Vector<double> v1, v2;
double result = dot(v1, v2);
```

## Diagonal and Identity Matrix¶

Identity matrices can be created using the `Eye<T>`

object, where `T`

is the desired data type (e.g., `float`

or `double`

). Diagonal matrices, on the other hand, are stored as a Vector and require the use of `diag()`

to be interpreted correctly. Both can be assigned to dense Matrix or be directly integrated into the matrix sum:

```
Matrix<float> A(n, n);
Matrix<float> R(n, n);
Vector<float> D(n); // The diagonal matrix is stored as a vector.
R = Eye<float>();
R = diag(D);
R = A + 0.5 * Eye<float>();
R = A + 0.5 * diag(D);
```

You can also multiply regular and diagonal matrices:

```
R = 0.5 * A * diag(D);
R = 0.5 * diag(D) * A;
```

Note

The expression can **only** contain a single diagonal or identity matrix.

## Triangular Matrix¶

The Triangular Map represents the upper or lower triangular part of a dense Matrix, i.e., a map to only the elements at or above/below the main diagonal. This is referred to in Linear Algebra as a Triangular Matrix (if the matrix is square) or a Trapezoidal Matrix (if the matrix is non-square). Generally, they are returned as the result of some matrix factorization, such `QR`

, `LQ`

or `LU`

, but they can also be created explicitly by calling `A.upper_triangular()`

or `A.lower_triangular()`

, where `A`

is a dense, regular Matrix. The other part will not be referenced.

```
Matrix<float> A(n, m);
Matrix<float> B(m, m);
Matrix<float> C(n, m);
Matrix<float> R(n, m);
// Note that only the upper triangular of A will be copied to R,
// while all the elements below the diagonal will be left untouched.
R = A.upper_triangular();
R = A.lower_triangular() + C;
R = A.upper_triangular() * B + C;
```

Note

Transpose (`A.upper_triangular().T()`

), adjoint (`A.upper_triangular().H()`

) and complex conjugate (`A.upper_triangular().conj()`

) are also supported. However, in the matrix product, the regular matrix `B`

**must not** contain any of these operations.

Note

The expression can **only** contain a single triangular matrix.

## Slicing¶

Any BLAS/LAPACK operation can be called with vector and matrix slices instead of the full vector/matrix:

```
Matrix<float> A(n, n);
Matrix<float> B(n, n);
Vector<float> v1(n);
v1 = A * B(kAll, 2) + B(kAll, 1);
float alpha = dot(B(kAll, 2), B(kAll, 1));
```

Warning

If the elements in the left-hand and right-hand side overlap (e.g., `A({1, k + 1}, {1, k + 1}) = 5 * A({0, k}, {0, k})`

), the result will be incorrect due to Aliasing.

## Sparse BLAS¶

Due to the special layout of the CSR Matrix, only the following operations are supported:

Scaling:

csr1 = alpha * csr1; csr1 = alpha * csr2;

Addition:

csr1 = alpha * csr2 + beta * csr3;

Matrix product:

Matrix<double> dB(n, n), dC(n, n), dR(n, n); Vector<double> dv1(n), dv2(n), dv3(n); CSRMatrix<double> csr(n, n); double alpha, beta; // ... // // Sparse matrix x dense vector dv3 = alpha * csr * v1 + beta * v2; dv3 = alpha * csr.T() * v1 + beta * v2; dv3 = alpha * csr.H() * v1 + beta * v2; // Sparse matrix x dense matrix dR = alpha * csr * dB + beta * dC; dR = alpha * csr.T() * dB + beta * dC; dR = alpha * csr.H() * dB + beta * dC;

Norm:

double amax_v = amax(csr); double norm_frob_v = norm2(csr); // double norm_l1_v = norm_l1(csr); // The L1 norm is not supported for CSR matrices double norm_inf_v = norm_inf(csr);

Warning

There is currently no BLAS operation with Sparse Vector.

## Aliasing¶

In HighFM, we say that an expression has *aliasing* when a matrix or vector appears on both sides of the assignment operation, for instance, `mat1 = mat1 + mat2`

or `mat1 = mat1.T()`

. Although in the first case, the aliasing is harmless, the second one will produce incorrect results.

The Aliasing problem arises due to the lazy evaluation, i.e., the expression is only evaluated during the assignment. In this case, HighFM will transpose a block of `A`

and then immediately write the results to `A`

, so when it moves to the next block, it will use the new values of `A`

instead of the old ones. For example, HighFM will first compute `A(0, 1) = A(1, 0)`

and then `A(1, 0) = A(0, 1)`

, losing the original value of `A(1, 0)`

.

In general, the aliasing can only be detected during the runtime. In the default mode (“debug”), HighFM will analyze the expression during the runtime and determine if it has harmful aliasing or not. For instance, when trying to evaluate `mat1 = mat1.T()`

, the library will throw the following exception:

```
terminate called after throwing an instance of 'HighFM::HarmfulAliasing'
what(): void HighFM::internal::ConjugateTransposeOp<Expr, Flag>::eval_to(HighFM::MatrixBase<E>&) [with E = HighFM::Matrix<float>; Expr = HighFM::Matrix<float>; HighFM::internal::TransposeFlag Flag = HighFM::internal::kTranspose] at
/home/nicolas/Documents/IST/PhD/Code/HighFM/include/highfm/blas/transpose.hpp:120:25
HarmfulAliasing: Aliasing detected! You cannot assign the matrix transpose to itself (A = A.T)!
```

Besides built-in methods, such as `transpose_inplace()`

, the only way to solve the aliasing issue is to evaluate the expression to a temporary and then move the results from the temporary to the desired object.

## Expression Validation¶

HighFM will check if the created expression is valid or not. When possible, it will check during the compile time, producing compilation errors if it encounters any anomaly. These errors can be ugly and long, but they help to detect issues with your code. For example, mixing different types in the expression

```
Matrix<float> A(3, 3, 1);
Matrix<double> B(3, 3, 2);
Matrix<double> C(3, 3);
C = A + B;
```

will result in the following compilation error:

```
/home/nicolas/Documents/IST/PhD/Code/HighFM/include/highfm/blas/addition.hpp:62:35: error: static assertion failed: /home/nicolas/Documents/IST/PhD/Code/HighFM/include/highfm/blas/addition.hpp:62:
TypeError: All objects in the expression must have the same data type!
62 | HIGHFM_ASSERT(have_same_value_type, TypeError,
| ^~~~~~~~~~~~~~~~~~~~
```

However, not all validation checks can be done during compile time. For instance, checking if the dimensions of the operands in the expression are correct. In these cases, HighFM will validate the expression during the runtime and throw an exception if detects any illegal operation:

```
Matrix<float> A(4, 4, 1);
Matrix<double> B(3, 3, 2);
Matrix<double> C(4, 4);
C = A + B;
```

Output:

```
terminate called after throwing an instance of 'HighFM::LengthError'
what(): HighFM::internal::AdditionOp<Operands>::AdditionOp(HighFM::internal::ScalingOp<Operands>...) [with Operands = {HighFM::Matrix<float, HighFM::Host, HighFM::Array>, HighFM::Matrix<float, HighFM::Host, HighFM::Array>}] at
/home/nicolas/Documents/IST/PhD/Code/HighFM/include/highfm/blas/addition.hpp:56:21
LengthError: The dimensions of the operands do not match!
```