This blog post is dedicated to one of my favorite optimization techniques ever: using matrix exponentiation to calculate linear recurrence relations in logarithmic time.

Basically, the technique boils down to constructing a special transformation matrix that, when exponentiated and applied to the initial values, produces the $$n$$th term of the linear recurrence relation.

I first learned about this technique from this CodeForces tutorial some years ago. My “problem” with it is that, while it gets the general point across, it doesn’t explain why it works, assumes a lot of prior knowledge, and is therefore inaccessible for beginners. This post’s goal is to provide a more detailed explanation through step-by-step examples so that you can build an intuition for it and understand how to apply and adapt this technique to various problems.

A linear recurrence relation is an equation that relates a term in a sequence to previous terms using recursion. The use of the word *linear* refers to the fact that previous terms are arranged as a first-degree polynomial in the recurrence relation.

A linear recurrence relation has the following form:

$$$ x_n = c_1 \times x_{n - 1} + c_2 \times x_{n - 2} + \cdots + c_{k} \times x_{n - k} $$$

$$x_n$$ refers to the $$n$$-th term in the sequence, and $$c_i$$ refers to the (constant) coefficient of the $$x_{n - i}$$ term in the linear recurrence relation.

All linear recurrence relations must have a “base case”, i.e. the first $$k$$ values of the sequence must be known.

A famous example of a sequence defined by a linear recurrence relation is the Fibonacci sequence, where we have the following:

$$$ x_0 = 0 \\ x_1 = 1 \\ x_n = x_{n - 1} + x_{n - 2} $$$

In the Fibonacci case, we have $$k = 2$$, $$c_1 = 1$$ and $$c_2 = 1$$.

Note that $$c_i$$ can be equal to $$0$$ as well, which would correspond to skipping a term, e.g.:

$$$ x_n = x_{n - 1} + 3 \times x_{n - 3} $$$

In this case, we have $$k = 3$$, $$c_1 = 1$$, $$c_2 = 0$$, and $$c_3 = 3$$.

A matrix is a rectangular array of numbers, arranged in rows and columns. In computer terms, it’s an array of arrays of numbers, e.g. `int[][]`.

For example,

$$$ \begin{bmatrix} 1 & 9 & -13\\ 20 & 5 & 6 \end{bmatrix} $$$

is a matrix with two rows and three columns, or a “two by three” matrix. Again, in computer terms, it’d be an `int[2][3]`:

```
int matrix[2][3] = {
    {1, 9, -13},
    {20, 5, 6}
};
```

Matrices are used in a wide range of mathematical areas, including, but not limited to, linear algebra and graph theory. If you’ve worked with graphs, you might’ve heard of adjacency matrices before.

There’s a lot of complicated math surrounding matrices, but all you need to know to understand this technique is what matrix multiplication is.

Matrix multiplication is an operation that takes two matrices, e.g. $$A$$ and $$B$$, one with dimensions $$M$$ and $$N$$, and the other with dimensions $$N$$ and $$P$$ (i.e. the number of columns in the first matrix must be the same as the number of rows in the second matrix), and produces a new matrix with dimensions $$M$$ and $$P$$. Each entry in the new matrix is obtained by multiplying term-by-term the entries of a row of $$A$$ with a column of $$B$$.

In other words, matrix multiplication is an operation that takes the $$i$$th row of $$A$$ and the $$j$$th column of B, multiplies their entries term-by-term, sums them up, and puts the number that comes out in the $$i$$th row and $$j$$th column of the new matrix.

It’s a mouthful, but it’s not as complicated as it sounds. Here’s a color-coded example so you know what’s what:

$$$ \begin{bmatrix} \color{#dc2626}{1} & \color{#dc2626}{2} \\ \color{#2563eb}{3} & \color{#2563eb}{4} \end{bmatrix} \times \begin{bmatrix} \color{#16a34a}{5} & \color{#ca8a04}{6} \\ \color{#16a34a}{7} & \color{#ca8a04}{8} \end{bmatrix} = \begin{bmatrix} \color{#dc2626}{1} \color{white}{\times} \color{#16a34a}{5} \color{white}{+} \color{#dc2626}{2} \color{white}{\times} \color{#16a34a}{7} & \color{#dc2626}{1} \color{white}{\times} \color{#ca8a04}{6} \color{white}{+} \color{#dc2626}{2} \color{white}{\times} \color{#ca8a04}{8} \\ \color{#2563eb}{3} \color{white}{\times} \color{#16a34a}{5} \color{white}{+} \color{#2563eb}{4} \color{white}{\times} \color{#16a34a}{7} & \color{#2563eb}{3} \color{white}{\times} \color{#ca8a04}{6} \color{white}{+} \color{#2563eb}{4} \color{white}{\times} \color{#ca8a04}{8} \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix} $$$

Got it? Good.

There are a few other things I need to mention that will become important later:

- Matrix multiplication **is not** commutative, i.e. $$A \times B \neq B \times A$$.
- Matrix multiplication **is** associative, i.e. $$(A \times B) \times C = A \times (B \times C)$$.
- The identity matrix is the matrix equivalent of $$1$$ in regular multiplication. It is a square matrix that has $$1$$ on the main diagonal and $$0$$ everywhere else, e.g.:

$$$ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} $$$

When you multiply any matrix by the identity matrix with the corresponding size, you get the same matrix back.

If you understood how matrix multiplication works, it should be clear why it works that way. If not, I encourage you to grab a pen and paper and multiply some matrices by hand. Before proceeding with the rest of this post, you must know how matrices are multiplied; otherwise, it won’t make much sense.

Here’s an implementation of matrix multiplication:

```
template <class T>
using vec = std::vector<T>;

template <class T>
using mat = vec<vec<T>>;

template <class T>
mat<T> operator*(mat<T> const& a, mat<T> const& b) {
    int m = a.size(), n1 = a[0].size(), n2 = b.size(), p = b[0].size();
    assert(n1 == n2);
    mat<T> res(m, vec<T>(p, T(0)));
    for (auto i = 0; i < m; ++i) {
        for (auto j = 0; j < p; ++j) {
            for (auto k = 0; k < n1; ++k) {
                res[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    return res;
}
```

It should be easy to see that its complexity is $$O(M \times N \times P)$$.

Binary exponentiation (also known as exponentiation by squaring) is a trick that allows us to calculate $$a^n$$ using $$O(\log_2{n})$$ multiplications instead of the $$O(n)$$ multiplications required by the naive approach.

It also has important applications in many tasks unrelated to arithmetic, since it can be used with any operations that have the **associativity** property.

To keep this post from becoming overly long and all over the place, here are some good resources on binary exponentiation:

- This cp-algorithms.com entry
- This video by Errichto

To build up to the main point of this post, let’s first look at some other ways one might implement a function that computes a linear recurrence relation.

For the sake of simplicity, let’s take the Fibonacci sequence example (defined earlier).

The most obvious way to compute the $$n$$-th Fibonacci number is to write a function like this one:

```
uint64_t fibonacci(uint n) {
    if (n <= 1) {
        return n;
    }
    return fibonacci(n - 1) + fibonacci(n - 2);
}
```

On my hardware, this function can compute Fibonacci numbers up to about $$n = 40$$ before becoming noticeably slow. But why’s that?

The answer is that we’re computing the same values multiple times, over and over again. For example, to compute `fibonacci(46)`, we’re calling `fibonacci(2)` $$1,134,903,170$$ times in the process! It is obvious that this approach cannot work for large values of $$n$$.

To avoid recomputing the same values over and over again, we can save them in some data structure like an array or a hash map, so that we can compute them once and simply retrieve them from said data structure the next time we need them. This is an idea you might recognize if you’re familiar with the concept of dynamic programming (hence why the array below is called `dp`).

For the sake of simplicity, we’ll define the upper bound on $$n$$ to be $$93$$ ($$93$$ being the largest $$n$$ for which the Fibonacci number still fits in a 64-bit unsigned integer data type) and use a fixed-size array:

```
#define MAX_N 93

uint64_t dp[MAX_N + 1] = {0};

uint64_t fibonacci(uint n) {
    if (n <= 1) {
        return n;
    }
    auto& result = dp[n];
    if (result == 0) {
        result = fibonacci(n - 1) + fibonacci(n - 2);
    }
    return result;
}
```

Since all Fibonacci numbers (except the $$0$$-th number) are greater than $$0$$, we can use $$0$$ to signify a missing value that needs to be computed.

On my hardware, this computes all values of Fibonacci up to $$93$$ inclusive pretty much instantly. So, problem solved?

Well, yes and no. While memoization solves the recomputation problem, it introduces a new problem: memory. In this case, it’s not that big of a deal, as we’re only going up to $$n \leq 93$$, but if we go further, e.g. if we want to compute Fibonacci numbers up to $$n \leq 10^9$$ modulo some number, we’d need to store $$10^9$$ 64-bit integers in memory, which would take up approximately 8 gigabytes.

We can come up with an iterative approach for computing Fibonacci numbers that requires a constant amount of memory if we notice the following: to compute the next Fibonacci number, we only need to know what the previous two Fibonacci numbers are. We don’t need to store all previous values in memory.

So, we can have three variables, out of which the first two would store the previous two Fibonacci numbers, and the third would store the new number, and each time we compute the next Fibonacci number, we “shift” their values to the left:

```
uint64_t fibonacci(uint n) {
    if (n <= 1) {
        return n;
    }
    uint64_t prev = 0, curr = 1, next;
    for (auto i = 2u; i <= n; ++i) {
        next = prev + curr;
        prev = curr;
        curr = next;
    }
    return curr;
}
```

On each iteration, `prev` contains the value of $$F_{i - 2}$$ and `curr` contains the value of $$F_{i - 1}$$; they’re added together and stored in `next`, which results in $$F_{i}$$. After that, `curr`’s value is copied into `prev`, and `next`’s value into `curr`, which shifts the values to the left, replacing $$F_{i-2}$$ and $$F_{i-1}$$ with $$F_{i-1}$$ and $$F_{i}$$, so that $$F_{i + 1}$$ can be computed on the next iteration.

Due to the order in which the operations are executed, at the end of the loop, both `curr` and `next` contain the value of $$F_{n}$$, so either one can be used as the final result.

This approach can be generalized to compute the $$n$$-th term of any linear recurrence relation in $$O(nk)$$ time and $$O(k)$$ space (where `k` is the number of initial values/coefficients and is a constant):

```
uint64_t compute(uint n,
                 std::vector<uint64_t> const& initial_values,
                 std::vector<uint64_t> const& coefficients) {
    assert(initial_values.size() == coefficients.size());
    auto k = initial_values.size();
    if (n < k) {
        return initial_values[n];
    }
    auto values = initial_values;
    values.push_back(0);
    for (auto i = k; i <= n; ++i) {
        values.back() = std::inner_product(values.begin(), values.end() - 1,
                                           coefficients.begin(), 0ULL);
        std::shift_left(values.begin(), values.end(), 1);
    }
    return values.back();
}
```

The `compute` function above does the same thing as our iterative `fibonacci` function, except it supports arbitrary linear recurrence relations, defined by their initial values and coefficients. We can make our new function compute Fibonacci numbers by passing `{0, 1}` as the initial values and `{1, 1}` for the coefficients.

For those of you who aren’t too familiar with the C++ STL, the call to `std::inner_product` multiplies each of the previous values (represented by the range from `values.begin()` to `values.end() - 1`) by the corresponding coefficient and sums them up.

Let’s think about the previous approach a bit more, in particular, the step that computes the next value. To compute the next value, you need to multiply the entries from `values` and `coefficients` term-by-term, then sum them up. Does that sound familiar?

If you recall, this is exactly how each element in the resulting matrix is computed when doing matrix multiplication. So, what if we could construct a matrix that, when multiplied by the initial values of the sequence, produces the next value and simultaneously shifts the previous values to the left?

Since matrix multiplication operates on the rows of the first matrix and the columns of the second matrix, we can go in one of two directions: we can either organize the initial values as a $$1 \times k$$ matrix and have the **columns** of our “transformation” matrix set up in a way that does what we want, in which case the initial values must be on the **left** side of the multiplication, or we can rotate both matrices by 90 degrees and have the initial values on the **right** side of the multiplication. I will go with the latter, as it’s the way I usually do it, but both options yield the same results.

I will refer to the so-called transformation matrix as $$T$$ and the initial values as $$F$$.

So, let’s think about the “shifting” part of things first. We want each row in the resulting matrix to be shifted by one. We can achieve this by taking the identity matrix and shifting it to the right, like this:

$$$ \begin{bmatrix} 0 & \color{#16a34a}{1} & 0 & 0 & \cdots & 0 \\ 0 & 0 & \color{#16a34a}{1} & 0 & \cdots & 0 \\ 0 & 0 & 0 & \color{#16a34a}{1} & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & \color{#16a34a}{1} \\ 0 & 0 & 0 & 0 & \cdots & 0 \end{bmatrix} $$$

I think the best way to see what this does is through an example, so let’s look at what would happen for $$k = 3$$:

$$$ \begin{bmatrix} 0 & \color{#16a34a}{1} & 0 \\ 0 & 0 & \color{#16a34a}{1} \\ 0 & 0 & 0 \end{bmatrix} \times \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 {\times} x_0 + \color{#16a34a}{1} \color{white}{\times} x_1 + 0 {\times} x_2 \\ 0 {\times} x_0 + 0 {\times} x_1 + \color{#16a34a}{1} \color{white}{\times} x_2 \\ 0 {\times} x_0 + 0 {\times} x_1 + 0 {\times} x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ 0 \end{bmatrix} $$$

Our strategically placed ones do exactly what we want! So, all we need to do is fill the last row, and if you’re still with me, it should be obvious how: we fill it with the coefficients. So, our final matrix looks like this:

$$$ \begin{bmatrix} 0 & \color{#16a34a}{1} & 0 & 0 & \cdots & 0 \\ 0 & 0 & \color{#16a34a}{1} & 0 & \cdots & 0 \\ 0 & 0 & 0 & \color{#16a34a}{1} & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & \color{#16a34a}{1} \\ \color{#16a34a}{c_k} & \color{#16a34a}{c_{k-1}} & \color{#16a34a}{c_{k-2}} & \color{#16a34a}{c_{k-3}} & \cdots & \color{#16a34a}{c_1} \end{bmatrix} $$$

Now we have a matrix that produces the next values in our sequence. So, to compute the $$n$$-th term in the sequence, we need to multiply $$T$$ by the values $$n$$ times, then take the first entry. And now, for the part that makes this fast: remember that matrix multiplication is associative? This means we can multiply $$T$$ by itself $$n$$ times (in other words, $$T^n$$) first, then multiply by the values. And how can we compute $$T^n$$ quickly? That’s right, binary exponentiation!

This approach has a total time complexity of $$O(k^3 \times \log_2{n})$$, $$k^3$$ coming from the matrix multiplication algorithm, and $$\log_2{n}$$ coming from the binary exponentiation algorithm. Since $$k$$ is a constant in the context of a specific linear recurrence relation problem, the execution time only grows logarithmically with $$n$$, which, as you know, is much better than linear.

Since the full implementation is long, I put it in a gist.

If this still feels like magic, I encourage you to sit down with a pen and paper. Come up with some linear recurrence relation, calculate some values by hand, try repeatedly substituting into the recurrence relation up to some fixed $$n$$, then exponentiate the transformation matrix by hand, observe the values in the intermediate results, and see if you can notice how everything ties together.

This technique seemed like voodoo to me at first as well, and going through things by hand was the thing that made it click in my head. I hope the way I built up to it makes it more obvious what the relationship between this approach and the easier-to-understand linear approach is.

I was prompted to write this after watching Computerphile’s recent video on the binary search algorithm. I generally enjoy Dr. Mike Pound’s videos a lot, so I thought I’d play this one in the background, even if the topic is painfully familiar to me. All in all, Dr. Pound does a fine job at explaining the general concept of the algorithm and why it’s so ingenious but omitted what I think is a small but *very* significant detail.

For those of you who are unfamiliar with the binary search algorithm, I’d suggest watching the video first. For those of you who want a TL;DW, binary search is a search algorithm for finding the position of a target value within a sorted array in an optimal way. The general idea is to split the search space in half every iteration, comparing the middle point and our target value. Since we’re searching in a sorted array, one of three things can happen:

- The middle point is **smaller** than our target value, in which case we know our target value cannot possibly be anywhere to the **left** of the middle point.
- The middle point is **bigger** than our target value, in which case the target cannot be to the **right** of the middle point.
- The middle point happens to be our target, in which case we’re done.

The most common definition for `mid` is as follows:

$$$ mid = \left\lfloor \frac{left + right}{2} \right\rfloor $$$

Binary search is **much** better than linear search for sorted data, as it uses the fact that all elements are ordered to halve the search space at every iteration, achieving a worst-case complexity of $$O(\log_2{n})$$, as opposed to linear search’s $$O(n)$$ complexity. It is a brilliant yet simple algorithm that makes searching through billions and billions of elements a trivial task.

To put in perspective how much better binary search is than linear search at scale, for an array with 4 billion elements, binary search has to do only 32 iterations, whereas linear search will have to look at all 4 billion elements in the worst-case scenario. If we double the search space, i.e. if we have 8 billion elements, binary search will need only one extra iteration, which will barely affect its runtime, while the runtime of linear search will double.

Of course, keeping the data sorted is an entirely different problem, which I won’t get into in this blog post.

Binary search is notoriously tricky to implement correctly, despite being so simple. In this blog post, we’ll look at one of the infamous pitfalls one can fall into while implementing it, in particular the one that the Computerphile video failed to mention.

Let’s look at a possible implementation of binary search. I’ve chosen C++ for this, but excluding the funky syntax for generics and references, the implementation should be easy to understand regardless of your experience with C++:

```
template <class T>
bool binary_search(std::vector<T> const& values, T target) {
    int left = 0, right = values.size() - 1;
    while (left <= right) {
        int mid = (left + right) / 2;
        if (values[mid] < target) {
            left = mid + 1;
        } else if (values[mid] > target) {
            right = mid - 1;
        } else {
            return true;
        }
    }
    return false;
}
```

At first glance, it might seem perfectly fine, and in practice, it’ll work fine for any small example you can come up with, but I promise, there’s something wrong with it. And no, it isn’t a logic error (although I have to admit, it took me an *embarrassing* amount of attempts to realize I had the cases backward at first), the algorithm itself is perfectly fine; the issue is more subtle than that.

The problem is in the way `mid` is calculated. It might not be obvious at first, but the expression `(left + right) / 2` can overflow. This can happen if `left` and `right` are large enough to overflow the `int` type. With a little bit of math, we can figure out that we can trigger an overflow if we manage to get `left` and `right` to both be equal to a number larger than or equal to $$2^{30}$$, as the maximum value an `int` can store is $$2^{31} - 1$$.

One might say that it’s unrealistic to be dealing with big enough arrays to trigger the issue, but I disagree, as the issue can be triggered in the implementation above with $$2^{31}/2 + 1 = 2^{30} + 1$$ 32-bit integers^{1}, which take up around $$4 \text{GiB}$$ of memory, which isn’t unreasonable, especially when we consider that the real strength of binary search is searching through large search spaces.

Let’s see if we can write some code to trigger it:

```
int main() {
    // Least verbose C++ STL code (random number generation)
    std::random_device rnd_device;
    std::mt19937 mersenne_engine{rnd_device()};
    std::uniform_int_distribution<int> dist{0,
                                            std::numeric_limits<int>::max() - 1};
    // Allocate the search array with the appropriate size
    auto n = (1 << 30) + 1; // 2^30 + 1
    std::vector<int> values(n);
    // Generate the search array
    std::cout << "Generating " << n << " values..." << std::endl;
    std::generate(values.begin(), values.end(),
                  [&]() { return dist(mersenne_engine); });
    // Sort the search array
    std::cout << "Sorting..." << std::endl;
    std::sort(values.begin(), values.end());
    // Set a target that's bigger than everything in our search array
    auto target = std::numeric_limits<int>::max();
    std::cout << "Searching..." << std::endl;
    std::cout << binary_search(values, target) << std::endl;
    return 0;
}
```

I’ll also add a log in the `while` loop so that we can observe the algorithm’s state:

```
while (left <= right) {
    int mid = (left + right) / 2;
    std::cout << "left " << left << " right " << right << " mid " << mid
              << std::endl;
    // ...
}
```

As mentioned, the code in the `main` function above allocates $$2^{30} + 1$$ 32-bit integers between $$0$$ and $$2^{31} - 2$$^{2}, sorts them, then calls `binary_search` with $$2^{31} - 1$$ as a target, which we know isn’t in the array, meaning that the binary search will have to repeatedly move the `left` index to the right until it becomes equal to `right`, which is exactly when we expect to run into an overflow.

Let’s try running it:

```
$ g++ -o binary_search -Ofast binary_search.cpp && ./binary_search
Generating 1073741825 values...
Sorting...
(omitting irrelevant logs to save vertical space)
left 1073741809 right 1073741824 mid 1073741816
left 1073741817 right 1073741824 mid 1073741820
left 1073741821 right 1073741824 mid 1073741822
left 1073741823 right 1073741824 mid 1073741823
left 1073741824 right 1073741824 mid -1073741824
zsh: segmentation fault ./binary_search
```

It runs for quite a long time (sorting 4 GiB worth of data is no joke), then boom! Segmentation fault!

The log we added to the `binary_search` loop is really helpful for understanding what’s going on: we can see how `left` is slowly creeping up in value as we check more and more numbers that are smaller than our target, until finally, `left` and `right` both become $$1\,073\,741\,824$$, which is $$2^{31}/2 = 2^{30}$$. Adding $$2^{30}$$ to itself gives us $$2^{31}$$ -- one bigger than the largest value we can fit in a 32-bit signed integer, which shoots us over to the negative numbers, causing a segfault.

Let’s think about the situation mathematically. We’re trying to calculate the midpoint $$mid$$ of a range defined by $$left$$ and $$right$$. We know that $$left \leq mid \leq right$$, so there must be a way to calculate $$mid$$ without overflow. What we need to do is come up with an alternative formula for $$mid$$ -- one that doesn’t “go through” any large intermediate values to arrive at the final result.

The problem lies in the addition of $$left$$ and $$right$$, so we’d be in business if we could find a way to rewrite the original formula in a way that avoids it. Since we know the final result is at least $$left$$ and at most $$right$$, we should be able to arrive at it by adding some non-negative number $$x$$ to $$left$$. It should be easy to notice that $$x$$ must be sufficiently small so that $$left + x \leq right$$, which in practice means that if we find a way to separate $$left$$ from the fraction, we’ll end up with an overflow-free expression.

Thankfully, a fairly simple transformation gets us exactly what we want (integer division is assumed to avoid the noise of adding floor everywhere):

$$$ mid = \frac{left + right}{2} = \frac{2 \times left - left + right}{2} = left + \frac{right - left}{2} $$$

Have I lost you? No? Good.

So, this is the “math” behind the solution to our overflow problem. But does it work in practice? Only one way to find out! Let’s modify our proof-of-concept program and check if we can still trigger a segfault:

```
while (left <= right) {
    // int mid = (left + right) / 2;
    int mid = left + (right - left) / 2;
    std::cout << "left " << left << " right " << right << " mid " << mid
              << std::endl;
    // ...
}
```

This is the output we get when we try running the program now:

```
$ g++ -o binary_search -Ofast binary_search.cpp && ./binary_search
Generating 1073741825 values...
Sorting...
(omitting irrelevant logs to save vertical space)
left 1073741809 right 1073741824 mid 1073741816
left 1073741817 right 1073741824 mid 1073741820
left 1073741821 right 1073741824 mid 1073741822
left 1073741823 right 1073741824 mid 1073741823
left 1073741824 right 1073741824 mid 1073741824
0
```

Awesome! As you can see, our test program no longer segfaults and correctly returns `false`, as our target isn’t contained in the array. What if we set the last element to our target? Will our algorithm find it? Let’s see!

Let’s set the last element to `target`:

```
// Setting a target that's bigger than everything in our search array
auto target = std::numeric_limits<int>::max();
values.back() = target;
// ...
```

If we re-run, this is the output we see:

```
$ g++ -o binary_search -Ofast binary_search.cpp && ./binary_search
Generating 1073741825 values...
Sorting...
(omitting irrelevant logs to save vertical space)
left 1073741809 right 1073741824 mid 1073741816
left 1073741817 right 1073741824 mid 1073741820
left 1073741821 right 1073741824 mid 1073741822
left 1073741823 right 1073741824 mid 1073741823
left 1073741824 right 1073741824 mid 1073741824
1
```

As expected, our binary search successfully finds the target right at the very end of the array. Problem solved! Well, at least for search spaces up to $$2^{31}$$…

If we want to go even bigger, the `int` data type will no longer cut it. I kept the indices as 32-bit signed integers for simplicity’s sake. Had I used `size_t` (64-bit unsigned integers), the problem would’ve very much still existed (albeit not as spectacularly^{3}), but the reproduction would’ve required a much bigger search space. For completeness though, let’s switch to a bigger data type so that we can handle even bigger inputs:

```
template <class T>
bool binary_search(std::vector<T> const& values, T target) {
    size_t left = 0, right = values.size() - 1;
    while (left <= right) {
        size_t mid = left + (right - left) / 2;
        // ...
    }
}
```

Alternatively, we could've solved this by adding a size check. If for whatever reason you’d prefer that your binary search use 32-bit integers (I suppose performance could be a reason, but binary search is already stupidly fast, so differences would be negligible on modern hardware), checking the size of `values` before proceeding is an option.

Making assumptions like this is perfectly fine if we're solving some specific problem. If you knew you wouldn't be dealing with search spaces as large, fine, but please, verify your assumptions.

Programming is hard. As a project grows in complexity, the surface area for bugs increases drastically. As developers, we're like architects in a digital landscape, and the integrity of our structures (software), relies heavily on the bedrock of thoughtful design and careful execution. Being aware of the implications of our design and implementation decisions is crucial because these choices are like dominoes; a single misstep can trigger a cascade of vulnerabilities, each with the potential to compromise our work and user trust.

To some of you, this error might seem extremely niche and unlikely to happen in practice, but it’s important to look at the bigger picture. Imagine this as some small, harmless-looking utility function in the context of a large project. Back when you wrote that utility function, you might not have seen anything wrong with it, just as our initial binary search implementation looked and worked completely fine. But under just the right conditions, this very function could bring down our entire app. This is why it’s important to “sweat” the details.

Also, to expand on the last paragraph of the previous section about verifying your assumptions, I can't stress enough how important this is. Making assumptions is fine, but not checking them is just asking for trouble. Checking them can save you by catching errors caused by the violation of said assumptions early, before they can cause any harm. As much as you want to convince yourself that the single `if` statement involved in doing so will be the bottleneck of your project, trust me, it won't.

Oh, and speaking of harmless-looking utility functions in the context of large projects, the `Arrays#binarySearch` function in Java had this very issue some 10 years ago! Look it up, I'm not kidding. I guess this is another mini-lesson about not blindly trusting libraries.

Unfortunately, there's no “silver bullet” for all of our problems (sorry, Rust shills, but rewriting it in Rust ain’t it). The only solution is this: awareness. Be aware that sometimes the trickiest errors aren't caused by complexity, but by “simplicity”. Sometimes, a single addition is enough.

You’ve probably heard of FizzBuzz by now. It’s a seemingly simple interview question that can tell you a lot about how a candidate approaches problems and what their coding style is. It is also really good at filtering out most candidates who have absolutely no clue what they are doing.

For those of you who have managed to avoid it all this time, here’s the problem statement:

> Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers that are multiples of both three and five print “FizzBuzz”.

FizzBuzz is an interesting problem because it’s simple yet can be approached in many different ways. There is a compilation of hundreds of different solutions in hundreds of languages on Rosetta Code, some more ridiculous than others. In this post, I’ll show you one of the more outlandish ways to solve FizzBuzz -- with types.

The idea for solving FizzBuzz with types isn’t an original one. It’s just one of the many problems in the collection of type challenges for TypeScript. I’ve been solving type challenges for quite a while now, and they’re great fun. They do a great job at showcasing all the tricks of TypeScript’s absolutely arcane type system, and I’ve already incorporated some of the techniques I’ve learned in production code. All of the type challenges I’ve solved can be found on GitHub.

This blog post assumes some basic knowledge of the TypeScript type system. I’ve done my best to break down the solution into fairly simple pieces, but if you’ve never written or seen a single conditional or recursive type in your life, this might not make much sense to you. If you’re new to all of this and it sounds interesting to you, start here.

Without further ado, let’s talk FizzBuzz:

As advanced as TypeScript’s type system is, there are still some things that are very hard to do. Arithmetic is one of those things. The type challenges that concern arithmetic are classified as “extreme”. This is problematic because most FizzBuzz solutions (particularly in imperative programming languages) rely on the modulo operation to check if the current number is divisible by 3 and/or 5. Attempting to implement FizzBuzz with types this way would be very hard.

Let’s try solving a simpler version of the problem first. This is a great problem-solving technique that can often help you understand the original problem better. Let’s try solving just the “Fizz” part of FizzBuzz, i.e. let’s try to write a type `Fizz<N>` that returns an array containing “Fizz” in all the right places.

We know that we have to print “Fizz” for every number that is divisible by three. At first glance, this seems just as hard for the same reason mentioned above, but there’s a fairly obvious observation to be made: we know that every third number is divisible by three. Instead of going one number at a time and trying to check if it’s divisible, we can directly output three numbers at a time, as we know that the third of them will always be divisible by three:

`[1, 2, *Fizz*, 4, 5, *Fizz*, 7, 8, *Fizz*, 10, 11, *Fizz*, …]`

This is great because we don’t need any arithmetic operations for this. There’s still one thing to figure out though, and it’s when to stop, i.e. how to know if we’ve outputted enough elements. We need some way to subtract from `N` until it becomes zero. Surely this must require some arithmetic, right?

Thankfully, there’s a fairly easy way to deal with positive integers up to about $$10^3$$ (which should be sufficient for this problem), and it involves arrays. As we’ve already mentioned, we can’t directly manipulate numbers, but we can manipulate arrays, and arrays have a length. This allows us to represent any number `T` as an array with `T` elements in it. Here’s a type `NumberToTuple<T>` that takes a number `T` and produces an array with length `T`:

```
type NumberToTuple<
  T extends number,
  Acc extends readonly unknown[] = [],
> = Acc["length"] extends T
  ? Acc
  : NumberToTuple<T, [...Acc, unknown]>;
```

`NumberToTuple<T>` is a fairly simple recursive type with a base case of `Acc["length"] extends T`, in which case `Acc` is returned, and a recursive case in which we append a new element to `Acc`. It should be reasonably obvious how and why this works -- we just keep appending to `Acc` until its length is equal to `T`. The value we append doesn’t matter; it could be anything. In this particular case, I’ve used `unknown`, but it could’ve also been absolutely any other type.
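To see this in action, here’s a quick self-contained check (re-declaring the type from above so the snippet stands alone). Since `NumberToTuple<3>` resolves to a tuple of length three, any three-element array is assignable to it:

```typescript
type NumberToTuple<
  T extends number,
  Acc extends readonly unknown[] = [],
> = Acc["length"] extends T
  ? Acc
  : NumberToTuple<T, [...Acc, unknown]>;

// NumberToTuple<3> is [unknown, unknown, unknown]; the element
// values don't matter, only the tuple's length does.
const three: NumberToTuple<3> = ["a", "b", "c"];
```

Try shortening the array to two elements: the assignment stops type-checking, because the tuple’s length is part of its type.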

Now that we have a way to represent numbers in a form that we can manipulate, we can finally implement subtraction. Some of you might have already guessed that we’ll simply be recursively removing one element at a time from the array representation of our number. How can we know we’ve removed enough? Easy! We just use another array to keep track of that:

```
type Drop<
  T extends readonly unknown[],
  N extends number,
  Acc extends readonly unknown[] = [],
> = Acc["length"] extends N
  ? T
  : T extends [unknown, ...infer Rest]
    ? Drop<Rest, N, [...Acc, unknown]>
    : T;
```

`Drop<T, N>` removes `N` elements from the front of `T`, which in essence is a subtraction operation for our “array numbers”. This is the first time we’ve seen the `infer` keyword in this post. If you’ve never heard of it before, it basically does what the name suggests: it allows us to infer types from the types we compare against. It can be used to infer the return type of a function, the type of the elements of an array, or, as in this case, a specific part of an array. `T extends [unknown, ...infer Rest]` essentially infers the type of every element in the array except the first one, effectively removing it.
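As a quick sanity check (again re-declaring the type so the snippet stands alone), dropping two elements from a five-element tuple leaves a three-element one:

```typescript
type Drop<
  T extends readonly unknown[],
  N extends number,
  Acc extends readonly unknown[] = [],
> = Acc["length"] extends N
  ? T
  : T extends [unknown, ...infer Rest]
    ? Drop<Rest, N, [...Acc, unknown]>
    : T;

// Drop<[1, 2, 3, 4, 5], 2> resolves to [3, 4, 5]:
const rest: Drop<[1, 2, 3, 4, 5], 2> = [3, 4, 5];
```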

If none of this makes sense, this is the time to take a break from this blog post and research this further, as understanding it is crucial for the rest of the solution.

For those who are still following my train of thought, it’s time to put the pieces together and define the `Fizz<N>` type:

```
type Fizz<
  N extends readonly unknown[],
  Acc extends readonly unknown[] = [],
> = N["length"] extends 0
  ? Acc
  : Fizz<Drop<N, 3>, [...Acc, never, never, "Fizz"]>;
```

`Fizz<N>` takes a “number array” `N` and recursively subtracts three from it until it’s empty, each time adding three elements to `Acc`, the third one being “Fizz”. This type should successfully solve the “Fizz” part of the problem, so let’s test it:

```
type Foo = Fizz<NumberToTuple<6>>;
```

We expect `Foo` to be `[never, never, "Fizz", never, never, "Fizz"]`, and it is. Great!

The keen-eyed ones of you have definitely already spotted yet another problem though -- our `Fizz` type always outputs an array with a length that is a multiple of three. In other words, `Fizz<NumberToTuple<5>>` will output the exact same thing as `Fizz<NumberToTuple<6>>`, and so on. This is completely intentional though, as it keeps our `Fizz` type simple and straightforward. We “overshoot” on purpose, as dealing with the edge cases in the `Fizz` type itself is very complicated. It is much simpler to take the exact number of elements we want afterwards, with the help of a type like this:

```
type Take<
  T extends readonly unknown[],
  N extends number,
> = T["length"] extends N
  ? T
  : T extends [...infer Init, unknown]
    ? Take<Init, N>
    : T;
```

`Take<T, N>` uses the same `infer` technique to remove elements from `T`, this time from the right, until the length of `T` is `N`. Now we can simply wrap our result with `Take`, and the overshooting problem is solved. With this, the solution to the simpler version of the problem is complete. Oh yeah, if you’ve forgotten, this isn’t even the original problem we’re trying to solve. Thankfully, we’re pretty close now.
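Here’s a small self-contained check of the trimming behavior (re-declaring `Take` so the snippet stands alone):

```typescript
type Take<
  T extends readonly unknown[],
  N extends number,
> = T["length"] extends N
  ? T
  : T extends [...infer Init, unknown]
    ? Take<Init, N>
    : T;

// Take<["a", "b", "c", "d"], 2> trims from the right down to ["a", "b"]:
const firstTwo: Take<["a", "b", "c", "d"], 2> = ["a", "b"];
```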

It should be trivial to notice that we can implement `Buzz` in an equivalent way:

```
type Buzz<
  N extends readonly unknown[],
  Acc extends readonly unknown[] = [],
> = N["length"] extends 0
  ? Acc
  : Buzz<Drop<N, 5>, [...Acc, never, never, never, never, "Buzz"]>;
```

But why is any of this helpful? Well, let’s look at the outputs of `Fizz` and `Buzz`, and the expected output of `FizzBuzz`, for some value of `N`, e.g. 20:

| N = 20 | Fizz | Buzz | FizzBuzz |
| --- | --- | --- | --- |
| 1 | never | never | 1 |
| 2 | never | never | 2 |
| 3 | Fizz | never | Fizz |
| 4 | never | never | 4 |
| 5 | never | Buzz | Buzz |
| 6 | Fizz | never | Fizz |
| 7 | never | never | 7 |
| 8 | never | never | 8 |
| 9 | Fizz | never | Fizz |
| 10 | never | Buzz | Buzz |
| 11 | never | never | 11 |
| 12 | Fizz | never | Fizz |
| 13 | never | never | 13 |
| 14 | never | never | 14 |
| 15 | Fizz | Buzz | FizzBuzz |
| 16 | never | never | 16 |
| 17 | never | never | 17 |
| 18 | Fizz | never | Fizz |
| 19 | never | never | 19 |
| 20 | never | Buzz | Buzz |

For those of you who already knew where I was going from the beginning: you’re very clever, have a cookie: 🍪. Either way, it is easy to notice that `FizzBuzz<N>` is an element-wise concatenation of `Fizz<N>` and `Buzz<N>`, ignoring `never` values; if both elements are `never`, the index is taken as the value. It should be obvious why this element-wise concatenation satisfies all cases of the problem statement.
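The merge rule is easier to see with a small runtime sketch. To be clear, this is only an illustration -- the actual solution performs the same logic at the type level, with `never` playing the role of `null` here (and without the modulo shortcut, which types don’t have):

```typescript
// Concatenate the non-null parts; if everything was null,
// fall back to the number itself.
function mergeAt(n: number, parts: Array<string | null>): string {
  const joined = parts.filter((part): part is string => part !== null).join("");
  return joined === "" ? String(n) : joined;
}

const fizzBuzzAt = (n: number) =>
  mergeAt(n, [n % 3 === 0 ? "Fizz" : null, n % 5 === 0 ? "Buzz" : null]);

fizzBuzzAt(3);  // "Fizz"
fizzBuzzAt(15); // "FizzBuzz"
fizzBuzzAt(7);  // "7"
```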

We’re still left with implementing this concatenation/merging step though. There are many different ways to approach it, but the one I like the most involves a matrix operation called “transpose”, as this approach can be scaled trivially to other extensions of this problem which involve adding extra divisibility cases.

Here’s a way to implement `Transpose<T>`:

```
type Transpose<T extends ReadonlyArray<readonly unknown[]>> = (
  T["length"] extends 0 ? [] : T[0]
) extends infer First extends readonly unknown[]
  ? {
      [X in keyof First]: {
        [Y in keyof T]: X extends keyof T[Y] ? T[Y][X] : never;
      };
    }
  : never;
```

It may look scary, but all it really does is take a two-dimensional array/matrix `T`, iterate over each column `X` and each row `Y`, and produce a new matrix where each element is `T[Y][X]`, essentially flipping `T` over its diagonal.
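A quick self-contained check on a plain numeric matrix (re-declaring the type so the snippet stands alone):

```typescript
type Transpose<T extends ReadonlyArray<readonly unknown[]>> = (
  T["length"] extends 0 ? [] : T[0]
) extends infer First extends readonly unknown[]
  ? {
      [X in keyof First]: {
        [Y in keyof T]: X extends keyof T[Y] ? T[Y][X] : never;
      };
    }
  : never;

// Transpose<[[1, 2, 3], [4, 5, 6]]> resolves to [[1, 4], [2, 5], [3, 6]]:
const flipped: Transpose<[[1, 2, 3], [4, 5, 6]]> = [
  [1, 4],
  [2, 5],
  [3, 6],
];
```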

`Transpose<T>` allows us to group the elements of the two `Fizz` and `Buzz` arrays, after which we can concatenate them. For example, let’s say we transpose the outputs of `Fizz<6>` and `Buzz<6>`, which look like this:

```
type PreTranspose = [
  [never, never, "Fizz", never, never, "Fizz"],
  [never, never, never, never, "Buzz", never],
];
```

After the transposition, we’ll get something like this:

```
type PostTranspose = [
  [never, never],
  [never, never],
  ["Fizz", never],
  [never, never],
  [never, "Buzz"],
  ["Fizz", never],
];
```
```

This allows us to convert what would’ve been column-wise operations into row-wise operations, which makes the implementation of the `Merge` type easier. Now that we know how to “group” our Fizzes and Buzzes with the help of `Transpose`, let’s first implement a type `MergeRow` that can handle the concatenation of a single grouped row:

```
type MergeRow<
  Row extends readonly string[],
  Acc extends string = ``,
> = Row extends [
  infer Head extends string,
  ...infer Rest extends readonly string[],
]
  ? MergeRow<Rest, [Head] extends [never] ? Acc : `${Acc}${Head}`>
  : Acc;
```

`MergeRow<Row>` takes a row of our new transposed matrix of Fizzes and Buzzes, recurses over each element, and appends it to `Acc` if it’s not `never`. We can now implement a `Merge` type that does the full concatenation/merge process:

```
type Merge<Parts extends ReadonlyArray<readonly string[]>> =
  Transpose<Parts> extends infer Rows extends ReadonlyArray<
    readonly string[]
  >
    ? {
        [Key in keyof Rows]: MergeRow<Rows[Key]> extends infer Result
          ? Result extends ""
            ? Key
            : Result
          : never;
      }
    : never;
```

`Merge<Parts>` takes an array of Fizzes and Buzzes (and potentially more) and returns the final FizzBuzz array. As already mentioned, it first transposes `Parts`, then iterates over each row and merges it individually. If a row merge results in an empty string, it returns the `Key` instead, as that means all elements in the row were `never`.

We now have all the pieces we need to write a final `FizzBuzz<N>` type:

```
type FizzBuzz<N extends number> =
  NumberToTuple<N> extends infer T extends readonly unknown[]
    ? Merge<[Fizz<T>, Buzz<T>]> extends infer Merged extends
        readonly unknown[]
      ? Take<Merged, N>
      : never
    : never;
```

`FizzBuzz<N>` first converts the number `N` to an “array number”, then computes `Fizz` and `Buzz` individually in order to pass them on to `Merge`, and in the end the merged result is passed through `Take` to trim it down to the exact length. Let’s test it:

```
type Bar = FizzBuzz<5>;
```

When we test our `FizzBuzz<N>` type, we notice something obvious: arrays are zero-indexed. Duh! We get the following result: `["0", "1", "Fizz", "3", "Buzz"]`.

Thankfully, this is an easy fix. We can just pad the results of our `Fizz` and `Buzz` types to contain a 0th element, then drop it at the end. Let’s modify the `Fizz` and `Buzz` types like this:

```
type Fizz<
  N extends readonly unknown[],
  Acc extends readonly unknown[] = [never],
> = N["length"] extends 0
  ? Acc
  : Fizz<Drop<N, 3>, [...Acc, never, never, "Fizz"]>;

type Buzz<
  N extends readonly unknown[],
  Acc extends readonly unknown[] = [never],
> = N["length"] extends 0
  ? Acc
  : Buzz<Drop<N, 5>, [...Acc, never, never, never, never, "Buzz"]>;
```

Now, to drop the element we added for padding, let’s modify the `FizzBuzz` type as well:

```
type FizzBuzz<N extends number> =
  NumberToTuple<N> extends infer T extends readonly unknown[]
    ? Merge<[Fizz<T>, Buzz<T>]> extends infer Merged extends
        readonly unknown[]
      ? Take<Drop<Merged, 1>, N>
      : never
    : never;
```

With these changes, our `FizzBuzz` type finally works! For the whole thing, check here.

More specifically, I've been experimenting with replacing glibc with musl and seeing how complicated it would be to get running. And, spoiler alert, considering you're currently reading this, it's safe to assume I've had a fair bit of success with it.

You might be wondering how Funtoo comes into the story, and I've got answers. We recently announced a new Funtoo project called `Evolved Bootstrap`. The idea behind it is to develop a way to bootstrap a Funtoo Linux system completely from scratch, using any Linux-like environment, for any architecture. The end goal is to have a straightforward and automated way to port Funtoo to any currently unsupported architecture.

One cannot jump straight into flying, though; we need to learn how to walk first: we need a way of bootstrapping a minimal cross-compiled environment that we can expand upon later. And this is basically what the concept of CLFS is: building a Linux system from scratch for another architecture. Right now, we're running through the CLFS books and seeing what it takes to get things running for different architectures, with various package upgrades, etc.

The coolest part about the Evolved Bootstrap, in my opinion, is that it will also unlock the doors to alternative init systems, libc implementations, compilers, etc. That's why I decided to experiment with replacing glibc with musl. The CLFS embedded book already used musl, but as the name may suggest, it caters to building Linux systems for embedded devices and doesn't cover our use case.

The musl project has always been super interesting to me, but I've never had a real chance to play around with it on Funtoo, and it has never been a top priority for us to support. The Evolved Bootstrap project gave me a valid excuse to dive right into it, though.

By mixing and matching the standard and embedded CLFS books and through countless workarounds, I've managed to get a chrootable environment going with musl instead of glibc. You can find my instructions on the Funtoo wiki here. They are a more concise and copy-and-paste-ready version of the CLFS book, and I've tested them for `x86_64` and `aarch64` so far. I'm currently working on getting them to work for `powerpc64` as well.

If the Evolved Bootstrap project sounds like fun to you, you can learn more about it here, and you can stop by our Discord if you feel like chatting to us about it and perhaps even getting involved.

This is needed since not everything works with every Node version. And sometimes, you're just stuck with a particular version. My goal is to avoid having to resort to "Node version managers" or anything like that and just have the ability to install any Node version directly via Portage.

To achieve this, we'll need to combine a couple of sources of information. Firstly, we'll need to fetch all releases and group them by major version via the GitHub API, similarly to how we're doing it now. We'll also need to fetch a list of LTS release channels to know which Node versions to unmask by default.

The latter is needed so that we only offer LTS releases to the users by default but still leave a way to install the latest current, or even a discontinued version if that's what you're after, I'm not judging :D

In particular, we can fetch this JSON file from the `Release` repository in the Node organization. It gives us a timeline of all Node releases, and we can use that to figure out whether a release is still supported and, more importantly, whether it's an LTS release.

Let's begin with the fetching of GitHub releases. If we pay close attention to the API reference, we can see that the "list releases" endpoint is in fact paginated. Since we want to fetch all releases, we'll have to fetch all pages.

The most convenient way to do this, in my opinion, is to write an async generator. Its purpose will be to query the GitHub API while there are pages left and yield the releases it receives. We'll later use that information to find the latest release per release channel.

My implementation of this generator looks like this:

```
async def release_generator(hub, github_user, github_repo):
    # GitHub's pagination is 1-indexed.
    page = 1
    while True:
        releases = await hub.pkgtools.fetch.get_page(
            f"https://api.github.com/repos/{github_user}/{github_repo}/releases?page={page}&per_page=100",
            is_json=True,
        )
        if not releases:
            break
        for release in releases:
            yield release
        page += 1
```

In an infinite loop, it queries the GitHub API for the releases on the current page. If we receive nothing, this means we've reached the end of the releases, so we can terminate. Otherwise, it yields all received releases and moves on to the next page. Nothing too fancy.

Note that we're also specifying the `per_page` query parameter. It's set to `100` (the maximum) to minimize the number of requests we make to GitHub.

With that out of the way, we can now iterate all releases like this:

```
async for release in release_generator(hub, github_user, github_repo):
    pass
```

Simple as that! Now, let's define a dictionary in which we'll store the latest release seen so far for every release channel. The high-level algorithm for figuring out the latest release for every major version looks like this:

- Parse the current release's version to get its major version.
- Check the dictionary to see what the latest release for this major version is up to this point.
- If nothing is found or the current version is newer than the version of the stored release, update the dictionary.
- Otherwise, proceed with the next release.

Translated to Python, the algorithm looks like this:

```
latest_release_by_major = {}
async for release in release_generator(hub, github_user, github_repo):
    release_version = release["version"] = version.parse(release["tag_name"])
    release_major = str(release_version.major)
    latest_release = latest_release_by_major.get(release_major)
    if latest_release is None or latest_release["version"] < release_version:
        latest_release_by_major[release_major] = release
```

If we inspect the versions of all the releases in the dictionary after the loop, we get something that looks like this:

```
Release channel: 17; Latest version: 17.4.0
Release channel: 16; Latest version: 16.13.2
Release channel: 14; Latest version: 14.18.3
Release channel: 12; Latest version: 12.22.9
Release channel: 15; Latest version: 15.14.0
Release channel: 10; Latest version: 10.24.1
Release channel: 13; Latest version: 13.14.0
Release channel: 8; Latest version: 8.17.0
Release channel: 11; Latest version: 11.15.0
Release channel: 6; Latest version: 6.17.1
Release channel: 9; Latest version: 9.11.2
Release channel: 4; Latest version: 4.9.1
```

Perfect! All that's left to do is to figure out which of those versions we want to unmask. So, let's fetch the release schedule JSON and parse the `end` dates within it.

To parse the dates, we'll need to import `date` from the `datetime` module:

```
from datetime import date
```

Now, for the actual fetching and parsing of the release schedule:

```
release_schedule = await hub.pkgtools.fetch.get_page(
    f"https://raw.githubusercontent.com/{github_user}/Release/main/schedule.json",
    is_json=True,
)
today = date.today()
for release_channel, schedule in release_schedule.items():
    major_version = release_channel.lstrip("v")
    release = latest_release_by_major.get(major_version)
    if release is None:
        continue
    if "lts" not in schedule:
        continue
    end_date = date.fromisoformat(schedule["end"])
    if today > end_date:
        continue
    release["unmasked"] = True
```

As mentioned above, we fetch the release schedule, iterate through every release channel in it, try to find the corresponding release in our dictionary, then check if it's an LTS release at all, and lastly, we check whether it has reached its EOL.

If all three of those conditions hold, we mark the release as "unmasked" by setting the corresponding key to `True`.

Armed with that information, we can simply loop through the dictionary and generate an ebuild for every single one. To make this a bit nicer, let's extract the actual ebuild generation to a separate function:

```
def generate_for_release(hub, release, **pkginfo):
    release_version = release["tag_name"].lstrip("v")
    tarball_url = release["tarball_url"]
    tarball_artifact = hub.pkgtools.ebuild.Artifact(
        url=tarball_url, final_name=f"{pkginfo['name']}-{release_version}.tar.gz"
    )
    ebuild = hub.pkgtools.ebuild.BreezyBuild(
        **pkginfo,
        version=release_version,
        artifacts=[tarball_artifact],
        unmasked="unmasked" in release,
    )
    ebuild.push()
```

Nothing special here, except for the new "unmasked" parameter in `BreezyBuild`. As you may have noticed, I've removed `github_user` and `github_repo` from the `BreezyBuild`. We can just set those in `pkginfo` to avoid having to pass them as separate arguments every time:

```
github_user = pkginfo["github_user"] = "nodejs"
github_repo = pkginfo["github_repo"] = "node"
```

With that, our autogen already works! The only thing left is to tweak the template to take `unmasked` into account:

```
KEYWORDS="{{ '*' if unmasked else '' }}"
```

And we're pretty much done! Let's run the autogen and check whether the keywords are as we expect:

```
$ grep -r "KEYWORDS="
templates/nodejs.tmpl:KEYWORDS="{{ '*' if unmasked else '' }}"
nodejs-17.4.0.ebuild:KEYWORDS=""
nodejs-8.17.0.ebuild:KEYWORDS=""
nodejs-11.15.0.ebuild:KEYWORDS=""
nodejs-10.24.1.ebuild:KEYWORDS=""
nodejs-12.22.9.ebuild:KEYWORDS="*"
nodejs-16.13.2.ebuild:KEYWORDS="*"
nodejs-6.17.1.ebuild:KEYWORDS=""
nodejs-4.9.1.ebuild:KEYWORDS=""
nodejs-13.14.0.ebuild:KEYWORDS=""
nodejs-15.14.0.ebuild:KEYWORDS=""
nodejs-14.18.3.ebuild:KEYWORDS="*"
nodejs-9.11.2.ebuild:KEYWORDS=""
```

And indeed, they are! We can try to unmask and emerge an older version of Node and see whether our template works for it as well:

```
# mkdir -p /etc/portage/package.accept_keywords
# cat > /etc/portage/package.accept_keywords/nodejs <<EOF
net-libs/nodejs **
EOF
# emerge -a =nodejs-6.17.1
$ node --version
v6.17.1
```

Piece of cake!

*NOTE*: There is actually an issue with Node v4 in particular: it uses Python 2 to build. While we could just drop it, it's just as simple to make it build with Python 2. Since we still have Python 2 at the time of writing, why not?

According to the changelogs, the first version to support Python 3 was Node v6, so let's add a `python_compat` variable to the `BreezyBuild` to fix this:

```
ebuild = hub.pkgtools.ebuild.BreezyBuild(
    **pkginfo,
    version=release_version,
    artifacts=[tarball_artifact],
    unmasked="unmasked" in release,
    # NOTE: First version to support Python 3 was Node 6,
    # use Python 2.7 for anything older.
    python_compat="python3+" if release["version"].major >= 6 else "python2_7",
)
```

Let's tweak the template to use the newly added variable as well:

```
PYTHON_COMPAT=( {{ python_compat }} )
```

And if we try again:

```
$ doit
# emerge -a =nodejs-4.9.1
$ node --version
v4.9.1
```

If this isn't cool, then I don't know what is :D

As always, the full source can be found here.

Our goal for this post is to write an autogen script, which automatically figures out what the latest release of Node.js is and generates an ebuild for it. For simplicity, we'll target the latest current release of Node.js since it will make things a bit easier.

As mentioned previously, autogens are powered by the metatools framework. Our first step should be to install metatools in our development environment. In my case, I'll be using the existing LXD container I made for the last post.

We're interested in installing the latest development sources. At the time of writing this, the metatools ebuild in the Funtoo kits doesn't work, but we can still use it to instruct Portage to pull in the necessary dependencies.

So, let's run `emerge --onlydeps metatools` and give it a moment to finish. After that, we also need to install MongoDB, since it's used by metatools for persisting various things: `emerge mongodb`.

A couple of massive compilations later, we can proceed with the metatools installation. All that's left is to clone the repos and set a few environment variables:

```
$ git clone ssh://git@code.funtoo.org:7999/~drobbins/funtoo-metatools.git
$ git clone ssh://git@code.funtoo.org:7999/~drobbins/subpop.git
$ cat >> .bashrc <<EOF
export PATH=$HOME/funtoo-metatools/bin:$PATH
export PYTHONPATH=$HOME/subpop:$HOME/funtoo-metatools
EOF
$ source .bashrc
$ doit --help
```

This should cover the entire metatools setup. All that's left is to start MongoDB, and we're ready to go: `/etc/init.d/mongodb start`.

Let's go back into `nodejs-overlay/net-libs/nodejs` and turn our ebuild into a template:

```
$ mkdir templates
$ mv nodejs-16.13.2.ebuild templates/nodejs.tmpl
```

We can also wipe the manually generated Manifest file since metatools will automatically generate it for us when we run the autogen.

We'll get back to the template once we flesh out our autogen script, which is precisely what we're going to do now. We need to make an `autogen.py` file and define an async function called `generate` in it like so:

```
#!/usr/bin/env python
async def generate(hub, **pkginfo):
    pass
```

I won't get into the details of how metatools works -- for that, you can refer to the documentation. What we need to know is that `generate` acts as the entry point of the autogen script. I'll also briefly explain what the two arguments we receive in `generate` are:

- `hub`: the core paradigm of Plugin-Oriented Programming; it is used to call metatools code.
- `pkginfo`: a keyword argument dict, which contains info about the package we're generating (e.g. name, category, template directory, etc.).

BTW, running an autogen script is as simple as running `doit` in its directory. We'll be running `doit` quite often to see if things are going as expected.

The first thing our autogen should do is to figure out what the latest version for Node is. Since we're already using GitHub to pull in the sources, the best way to figure this out is via the GitHub API.

For this approach to work, the upstream repo should either have tags or releases. In the case of Node, the upstream uses releases.

By looking at the GitHub API reference, we can figure out that the endpoint we're after is `/repos/{owner}/{repo}/releases`. Let's try to fetch it in the autogen.

First things first, we need to know the owner and repository names. Let's store them in two variables:

```
async def generate(hub, **pkginfo):
    github_user = "nodejs"
    github_repo = "node"
```

Then, we need to build the URL and send a request. The metatools way of doing this is via the `hub.pkgtools.fetch.get_page()` function:

```
releases = await hub.pkgtools.fetch.get_page(
    f"https://api.github.com/repos/{github_user}/{github_repo}/releases",
    is_json=True,
)
```

As you can tell, the only positional argument to `get_page` is a URL. There are also various optional arguments, the most useful of which is `is_json`, which parses the response as JSON for you.

The endpoint is supposed to return a list of objects representing the releases. We can use this list to figure out what the latest release is.

While, intuitively, it might make sense for the first release in the response to be the latest one, it's not always the case, and it's better to be safe than sorry.

There might be draft releases or prereleases at the start of the list, and we definitely don't want to package those. In the case of Node, there might also be an LTS release created after the latest current release.

What we usually do is make use of `packaging.version` to figure out which version is the newest. Let's import it in our autogen:

In addition to that, we also filter out all draft and prerelease releases.

There is also a theoretical case in which we can't find a suitable release to package at all. While it is practically unlikely, it's still a good idea to handle it just in case.

The magic words to do this in Python look like this:

```
try:
    latest_release = max(
        (
            release
            for release in releases
            if not release["prerelease"] and not release["draft"]
        ),
        key=lambda release: version.parse(release["tag_name"]),
    )
except ValueError:
    raise hub.pkgtools.ebuild.BreezyError(
        f"Can't find suitable release of {github_repo}"
    )
```

This fragment does exactly what we described above: it finds the latest release, in terms of version, that isn't a draft or a prerelease. In the case where `max` finds nothing, a `ValueError` is raised, so we catch that and rethrow something with a more meaningful error message.

To verify it works, we can try logging `latest_release["tag_name"]`. As expected, the result is `v17.4.0` (which is the latest at the time of writing).

Now that we have the latest release, we can use it to retrieve the source tarball URL and the actual version we're packaging. As you may have noticed already, the version of a release is its `tag_name`. As for the tarball, its URL is the `tarball_url`.

Also, notice that the tag name has a `v` prefix. Portage wouldn't really like that prefix, so we should strip it before generating an ebuild:

```
latest_version = latest_release["tag_name"].lstrip("v")
latest_tarball = latest_release["tarball_url"]
```

Now that we have the tarball, we also need to create an "artifact" for it. An artifact in metatools is a resource that is used by a `BreezyBuild`, and ultimately referenced in an ebuild.

The artifacts are precisely the info metatools uses to figure out what to put in the Manifest. They are also what we use to set the `SRC_URI` in the template.

Let's create an artifact by specifying the URL and final name for it:

```
tarball_artifact = hub.pkgtools.ebuild.Artifact(
    url=latest_tarball, final_name=f"{pkginfo['name']}-{latest_version}.tar.gz"
)
```

All that's left is to create the `BreezyBuild` and `push` it. `BreezyBuild` basically wraps the context for the ebuild generation. We need to pass it the `pkginfo`, the version, and the artifacts, along with any other values we'd want to access in the template:

```
ebuild = hub.pkgtools.ebuild.BreezyBuild(
    **pkginfo,
    version=latest_version,
    artifacts=[tarball_artifact],
)
ebuild.push()
```

This is basically the whole autogen script! All that's left is to tweak the template and test everything out. Let's begin by replacing the hardcoded `SRC_URI` with the URI from the tarball artifact:

```
SRC_URI="{{ artifacts[0].src_uri }}"
```

It's as simple as accessing the `src_uri` property of the first (and only) element in the `artifacts` list. And since we're not doing anything special in the rest of the template, this should theoretically be good enough. Let's give it a try!

Let's run the autogen and look at the generated ebuild. This is how it looks for me:

```
# Distributed under the terms of the GNU General Public License v2

EAPI=7

PYTHON_COMPAT=( python3+ )

inherit python-any-r1

DESCRIPTION="Node.js JavaScript runtime"
HOMEPAGE="https://nodejs.org"
SRC_URI="https://api.github.com/repos/nodejs/node/tarball/v17.4.0 -> nodejs-17.4.0.tar.gz"

LICENSE="Apache-1.1 Apache-2.0 BSD BSD-2 MIT"
SLOT="0"
KEYWORDS="*"
IUSE=""

DEPEND=""
RDEPEND="${DEPEND}"
BDEPEND="
	${PYTHON_DEPS}
"

post_src_unpack() {
	mv "${WORKDIR}"/node-"${PV}" "${S}" || die
}

src_configure() {
	configure_options=(
		# By default, prefix is /usr/local, which is outside of PATH,
		# set it to /usr instead:
		--prefix="${EPREFIX}"/usr
	)
	# NOTE: `econf` default flags appear to trip up the configure process,
	# directly call the ./configure script instead.
	./configure "${configure_options[@]}"
}
```

It's exactly like the ebuild we wrote manually, with the only exception being the `SRC_URI`. We can also see that a Manifest was generated for us in the process. Neat!

To see if it works, let's `emerge` it! And... it failed. Oh well, it's the source filenames again:

```
mv: cannot stat '/var/tmp/portage/net-libs/nodejs-17.4.0/work/node-17.4.0': No such file or directory
```

Turns out that the tarball we got from the release has a different structure than the one we used before. Let's investigate.

Peeking into `/var/tmp/portage/net-libs/nodejs-17.4.0/work`, we find a directory called `nodejs-node-eeed0bd`. The name of the directory is actually in the format `{github_user}-{github_repo}-{commit_sha}`.

Conveniently, we already have the GitHub user and repo defined in the autogen script. And as I mentioned earlier, we can pass additional arguments to the `BreezyBuild` if we need them for the template:

```
ebuild = hub.pkgtools.ebuild.BreezyBuild(
    **pkginfo,
    version=latest_version,
    github_user=github_user,
    github_repo=github_repo,
    artifacts=[tarball_artifact],
)
```

Note that we don't have the commit SHA, but we don't really need it, since it's fine to just use a glob for it. Now, we can use these two in the template:

```
post_src_unpack() {
    mv "${WORKDIR}"/{{ github_user }}-{{ github_repo }}-* "${S}" || die
}
```
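As a sanity check, the glob behavior is easy to reproduce outside of Portage with plain shell. The directory names below are hypothetical stand-ins for the contents of `${WORKDIR}` and for `${S}`:

```shell
#!/bin/sh
# Simulate the unpacked tarball directory with an unknown commit SHA suffix.
workdir=$(mktemp -d)
mkdir "${workdir}/nodejs-node-eeed0bd"   # stand-in for the unpacked source

# ${S} would normally be ${WORKDIR}/nodejs-17.4.0.
s="${workdir}/nodejs-17.4.0"

# The glob expands to the single matching directory, whatever its SHA suffix.
mv "${workdir}"/nodejs-node-* "${s}" || exit 1
echo "source now lives in: ${s##*/}"
```

Since exactly one directory matches the pattern, the glob expands to it regardless of the commit SHA, and the move behaves just like the hardcoded version did.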

Let's generate the ebuild again. Peeking at the ebuild, we see exactly what we expected in `post_src_unpack`:

```
post_src_unpack() {
    mv "${WORKDIR}"/nodejs-node-* "${S}" || die
}
```

This looks promising so far, so let's `emerge` away! And it should be no surprise that it worked!

```
$ node
Welcome to Node.js v17.4.0.
Type ".help" for more information.
> console.log('Hello, world!')
Hello, world!
undefined
```

We now have a script that will always package the latest version of Node for us! If this sounds cool, in the future we'll look into packaging multiple Node versions at once, specifically the LTS ones, which is even cooler in my opinion. Stay tuned!

As always, the final code can be found somewhere in the commit history of the nodejs-overlay repo.

For those who are unfamiliar with the concept of autogens in Funtoo, they are practically our very own robot package maintainers. With the power of the metatools framework, we can write scripts, which automagically generate ebuilds for us.

Rewriting the ebuild from scratch isn't a prerequisite for writing an autogen, but since the Node.js ebuild is unfamiliar territory to me, I thought I might as well try and strip all the nonsense from it. It was also an opportunity to document how I personally approach packaging software from scratch.

This blog post captures the process of figuring out how Node.js is built and putting together a brand new (and hopefully easier to understand) ebuild for it.

I set up a quick Funtoo LXD container for testing since I don't feel like breaking the Node installation on my host. Now, it's time to set up a local overlay as per the docs. I created a fork of the Skeleton overlay repo and called it “nodejs-overlay”.

After setting up the development environment, it's time for the main dish. At the time of writing this, the latest LTS version of Node is 16.13.2, so it will be the one we're targeting. Also, all the URL references to the Node.js GitHub repo are for a specific commit, so this post should still make sense even if things change in the future.

Let's create a new file in the `nodejs-overlay/net-libs/nodejs` directory, called `nodejs-16.13.2.ebuild`, and fill in the basic stuff:

```
# Distributed under the terms of the GNU General Public License v2
EAPI=7
DESCRIPTION="Node.js JavaScript runtime"
HOMEPAGE="https://nodejs.org"
SRC_URI="https://github.com/nodejs/node/archive/refs/tags/v16.13.2.tar.gz -> ${P}.tar.gz"
LICENSE="Apache-1.1 Apache-2.0 BSD BSD-2 MIT"
SLOT="0"
KEYWORDS="*"
IUSE=""
DEPEND=""
RDEPEND="${DEPEND}"
BDEPEND=""
```

Now, we can run `ebuild nodejs-16.13.2.ebuild digest` to pull in the sources and generate a `Manifest` file.

It's about time to look into how Node.js builds. This document in the repo appears to be a good starting point. Scrolling down to the “building on Unix” section, we can see that Node uses GNU make to build, and additionally requires Python 3 to be installed.

We don't really need to worry about GNU make being present, so we can go ahead and add a dependency on Python 3+ for now. We will do it by inheriting the python-any-r1 eclass:

```
inherit python-any-r1
```

Before the inherit line, we should also define `PYTHON_COMPAT`:

```
PYTHON_COMPAT=( python3+ )
```

Both of these should go under the `EAPI` line, since eclasses usually require that to be set in advance.
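Putting those pieces together, the top of our ebuild now reads:

```
# Distributed under the terms of the GNU General Public License v2

EAPI=7

PYTHON_COMPAT=( python3+ )

inherit python-any-r1
```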

Then, we can add `${PYTHON_DEPS}` to `BDEPEND`:

```
BDEPEND="
    ${PYTHON_DEPS}
"
```

The best way to tell if we've done anything meaningful is to simply try to emerge our newly created ebuild. While we definitely don't expect it to work, the error we get will give us a clue about what to do next.

Doing exactly that, I immediately get hit with the following error:

```
* ERROR: net-libs/nodejs-16.13.2::nodejs-overlay failed (prepare phase):
* The source directory '/var/tmp/portage/net-libs/nodejs-16.13.2/work/nodejs-16.13.2' doesn't exist
```

Seems like our source directory is called something else. Let's investigate!

We can run `ebuild nodejs-16.13.2.ebuild clean unpack` to wipe the temporary files for nodejs and unpack the source code. Then, we can look into `/var/tmp/portage/net-libs/nodejs-16.13.2/` to see what things look like.

The source directory lives in the `work/` subdirectory. Running `ls -la` in there produces the following output:

```
total 0
drwx------ 1 portage portage 24 Jan 24 19:19 .
drwx------ 1 portage portage 146 Jan 24 19:19 ..
drwxr-xr-x 1 portage portage 820 Jan 10 20:02 node-16.13.2
```

Yep, indeed we need to tweak the source directory name. We can either override the `S` (path to the temporary build/source directory) variable in the ebuild, or we can move the directory to `S`. In Funtoo, we tend to prefer the latter.
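For reference, the alternative approach (which we're not using here) would be a one-line override of `S` near the top of the ebuild, pointing it at the directory the tarball actually unpacks to:

```
S="${WORKDIR}/node-${PV}"
```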

To achieve this, we can add `post_src_unpack` to our ebuild and perform a move in it, like so:

```
post_src_unpack() {
    mv "${WORKDIR}"/node-"${PV}" "${S}" || die
}
```

Simple as that! If we try to emerge nodejs again, we get a tiny bit further:

```
* nodejs-16.13.2.tar.gz BLAKE2B SHA512 size ;-) ... [ ok ]
* Using python3.7 to build
>>> Unpacking source...
>>> Unpacking nodejs-16.13.2.tar.gz to /var/tmp/portage/net-libs/nodejs-16.13.2/work
>>> Source unpacked in /var/tmp/portage/net-libs/nodejs-16.13.2/work
>>> Preparing source in /var/tmp/portage/net-libs/nodejs-16.13.2/work/nodejs-16.13.2 ...
>>> Source prepared.
>>> Configuring source in /var/tmp/portage/net-libs/nodejs-16.13.2/work/nodejs-16.13.2 ...
* econf: updating nodejs-16.13.2/deps/cares/config.guess with /usr/share/gnuconfig/config.guess
* econf: updating nodejs-16.13.2/deps/cares/config.sub with /usr/share/gnuconfig/config.sub
./configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --libdir=/usr/lib64
Node.js configure: Found Python 3.7.10...
gyp: --mandir=/usr/share/man not found (cwd: /var/tmp/portage/net-libs/nodejs-16.13.2/work/nodejs-16.13.2) while trying to load --mandir=/usr/share/man
Error running GYP
```

Seems like we hit the configure phase. This is where we need to tweak the options to get things to work. The issue in this case appears to be that `gyp` is failing to find the `mandir`.

This is one of the default flags set by `econf`, and `gyp` for some reason doesn't like it, so I suppose we're better off overriding `src_configure` in our ebuild (we'll have to do it later anyway) and calling the configure script manually:

```
src_configure() {
    ./configure
}
```

Indeed, this gets us past the configuration phase and we're now in business! Node is compiling!

While we're waiting, we might as well go back and add a little comment to note why we chose not to use the `econf` wrapper:

```
src_configure() {
    # NOTE: `econf` default flags appear to trip up the configure process,
    # directly call the `./configure` script instead.
    ./configure
}
```

And after a hefty compile, it indeed was this simple! We now have a functional Node.js installation, compiled from source. It is installed outside of `PATH` though (at least on a standard Funtoo system): in `/usr/local`.

We can still verify that the freshly compiled binary works. Let's run it by specifying the full path:

```
$ /usr/local/bin/node
Welcome to Node.js v16.13.2.
Type ".help" for more information.
> console.log('Hello, world!')
Hello, world!
undefined
```

And indeed, it does! This is certainly good enough for this blog post. As a final touch, let's make it install in `/usr` instead of `/usr/local`. To do this, you usually need to specify some sort of `prefix` while configuring. The way you pass this flag differs from build system to build system, so we'll need to check where Node expects it.
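To illustrate the point, a few common build systems each spell the same idea differently (generic example invocations, not commands from this ebuild):

```
./configure --prefix=/usr            # autotools and autotools-like scripts
cmake -DCMAKE_INSTALL_PREFIX=/usr .  # CMake
meson setup --prefix=/usr build      # Meson
```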

The logical place to start our investigation is the `configure` script. It appears to just be checking our Python version and importing the `configure` Python module if it's supported. Otherwise, it just throws some fancy error we don't care about.

So, the real configuration appears to be happening in `configure.py`. Skimming through it, we can spot the `--prefix` option. It appears to be exactly what we're after!

Let's just override it in `src_configure` then. While we can simply set it to `/usr` and call it a day, it doesn't hurt to also prepend `EPREFIX` to it. Since Portage can technically be running in a “prefix” (i.e. not targeting /), it's “good practice” to account for that in the ebuild.

This is how our `src_configure` looks now:

```
src_configure() {
    configure_options=(
        # By default, prefix is /usr/local, which is outside of PATH,
        # set it to /usr instead:
        --prefix="${EPREFIX}"/usr
    )

    # NOTE: `econf` default flags appear to trip up the configure process,
    # directly call the ./configure script instead.
    ./configure "${configure_options[@]}"
}
```

With that, we have a pretty satisfactory result. We'll look into autogenning Node.js and potentially extending this ebuild in the future.

BTW, the ebuild we wrote in this blog post can be found in this repo.
