Not Fully Supported Post

Parts of this post is not fully translated to my new page. See the old page for the full experience: https://blog.erk.dev/posts/rrust/.

RRust: A Reversible Embedded Language

A happy Ferris crab with inverted colours

Introduction

In this post I plan to give an introduction to reversible programming languages, and some introduction to the programming language Rust. Then I will introduce my amalgamation of those two things, RRust. The overview I will give here is not going to be too deep, In the end, I will provide some articles which go deeper into it all. RRust was produced for my Master's Thesis, so the recounting here goes into some details which may already be known to Rust users. Finally, any comments or questions are welcome and can be sent to \<blog\@erk.dev\>.

Reversible Computing

Introduction

Reversible computing is the idea that you can for a given function f(x) := y you can generate the inverse function that takes the output and generates the input f^-1(y) := x. This of course could be done by hand, but we want to make a machine, here donoted by I, that can make the transformation of the code. That is that I(f) := f^-1. To generalize this a bit we can look at x and y as states and f as a function that goes from state x to state y, this is how we will look at it in the following chapters since we are going to model our functions as mutations to variables. So we want to generate the inverse function f^-1 that takes state y and mutates it to state x.

Which Functions Can Be Reversible?

A good question to ask here is if all functions can be reversible or if there are some limitations here.

So the first thing we can single out here is that we need the functions to be deterministic, that is that for any given input the function will always produce the same output, but we have to go a bit further than that here. We also need backward determinism, that is that any output can only be produced from a single input, which can also be seen as that the inverse function is deterministic.

From this, we can see that a given function will have to be bijective.

Bijective function from the set X to the set Y

There is a small caveat to this, we cannot always know before a computation if it will succeed, so we need to include a third possibility, in the form of an error. So any function is reversible if it is bijective or results in an error.

Historical Background

Reversible computing has long been a topic that has been worked on to some degree for a couple of reasons. Early [1] argued that a computer only had to dissipate heat when erasing information, since you have to keep all information to keep a given computation reversible this was given as an example of a future, more efficient system. Even if it had to do more actual computations to compute the same information.

More recently reversible programming languages have shown up in the field of quantum computers, again because of the need to keep all information at any given point in a computation. Though they are often a lot lower level than the languages I will present here, in theory these languages would be able to run on a powerful enough quantum computer.

Reversible Programming Languages

The reversible languages we will look at in this post are not the ones used for quantum computers, we will look at some that are higher level.

Janus

The first one I have here is Janus, it is made for general invertibility and I will go further into how it works in a later section.

Reversible Erlang

Reversible Erlang is a subset of Erlang that introduces the concept of rollbacks. By using that the subset is reversible it is possible to rollback to a checkpoint by running in reverse until you hit it. [2].

Hermes

Hermes is a small reversible programming language constructed to be used to implement encryption algorithms. It uses the reversibility and type system to defend against certain types of side-channel attacks. [3]

Janus

Janus is a reversible programming language first described at Caltech around 1982 and formalized in 2007. [4], [5]

The main reversible language I will focus on is Janus, this is because I am using the general idea of that to implement the reversibility in my language.

We start with a small example of Janus, just to give a bit of taste of the syntax:

procedure copy(int arr[], int payload[])
    local int i = 0          // Create a local variable
    from i = 0 loop          // Loop from i == 0
        arr[i] += payload[i] // Add the value
        i += 1               // Increment counter
    until i = 2048           // until i == 2048
    delocal i = 2048         // Uninitialize local variable

How does Janus ensure reversibility?

Janus uses specialized structures to ensure reversibility the first we will look at the constructs it uses for control flow. Janus has two types of control-flow: Conditionals and for-until loops. Furthermore, it uses the syntax to ensure reversibility, so we will look at that as well.

Conditionals

To try to understand how conditionals work we will look at a small example and try to explain it. So we want to look at the following example:

if c₁ then
   s₁
else
   s₂
fi c₂

So what does this mean, we can look at it as c₁ as a pre-condition and c₂ as a post-condition. So if c₁ is true then s₁ is run and then c₂ must be true, and conversely, if c₁ is false then s₂ is run and c₂ must be false afterward. Another way we can look at it is by looking at a flow-chart:

A visual description of conditions in Janus

In the flow-chart representation, we can see what we described above. Here we can also see how the inverse is going to look, we can swap c₁ with c₂, and then get the inverse of s₁ and s₂.

Loops

The loops work similarly to the conditionals, though the conditions are slightly different compared to them. We again look at an example:

from c₁
  do s₁
  loop s₂
until c₂

This construct will probably seem a bit more alien to programmers, it is not clear what do and loop does in this context. To reverse the loop we swap c₁ and c₂ then we take the inverse of s₁ and s₂:

from c₂
  do I[s₁]
  loop I[s₂]
until c₁

Now we have an overview of how both directions will look, so we can try and follow the flow of it. For the forward we start with c₁ being true, and then it must be false every time following that until c₂ is true and the loop is terminated. The reason for this is that when we reverse the loop it will terminate too early if c₁ is ever true before what is the end of it when going forward.

We can also try and understand it by looking at a flow-chart as before:

A visual description of loops in Janus

The reason for the last part here is to ensure reversibility. To create the inverse version of this loop we swap c₁ and c₂, then take the inverse of s₁ and s₂.

A visual description of loops in Janus

Then we can see that we start with c₂ being true only once similarly to when it was the exit condition. And c₁ is true only when the loop is finished.

Syntax

The Janus syntax is pretty barebones and not too complicated, so we will go over it here. It should be noted that the Janus version that we target has a few more features notable function arguments and locals, but we will come to them when we present RRust.

The syntax of Janus
Text version of the Janus syntax
# Syntax Domains:
p ∈ Progs[Janus]
x ∈ Vars[Janus]
c ∈ Cons[Janus]
id ∈ Idens[Janus]
s ∈ Stmts[Janus]
e ∈ Exps[Janus]
⨁ ∈ ModOps[Janus]
⨀ ∈ Ops[Janus]

# Grammar
p ::= d* (procedure id s)⁺
d ::= x | x[c]
s ::= x ⨁= e | x[e] ⨁= e |
      if e then s else s fi e |
      from e do s loop s until e |
      call id | uncall id | skip | s s
e ::= c | x | x[e] | e ⨀ e
c ::= 0 | 1 | ... | 4294967295
⨁ ::= + | - | ^
⨀ ::= ⨁ | * | / | % | */ | & | '|' | && | "||" |
      < | > | = | != | <= | >=

The most interesting part of the syntax for me is the statements s, the first two forms it can take both cause mutation of a variable or an array. We can see that the only methods that are allowed to cause mutations are ⊕=, which again expands into +=, -= and ^=. These are the only operators that are allowed because they are the only ones that can be generally reversed. For expressions, all operators are allowed this is because they are not inversed when reversing the procedure. In s we also have call and uncall which allows us to both call a function and call the reverse of that function. Another interesting thing to note here is that we only allow the use of integers, this is because floating point numbers do not have reversible semantics.

Rust

I will not dive too deep into Rust or its syntax in this section as there already exists good walkthroughs of the language which cover it well. Though I will get into the main parts of the language that enables its safety and explain them briefly. If you are an intermediate user of Rust you can probably skip or skim this section.

What is Rust?

The official site explains rust with this short motto:

> A language empowering everyone > to build reliable and efficient software.

Although this does not say too much about the language. Rust is a programming language first designed by Graydon Hoare and later incubated at Mozilla Research to be used in their products. The main feature of the language is to be safer than other languages in the same class such as C and C++. Today it is used widely in loads of companies such as Amazon and Discord.

Why Rust?

So why should a company use Rust instead of the tested and stable languages like C and C++? One of the major reasons to use Rust is that it can rule out a class of issues related to memory safety, which is the cause of a lot of exploits. Microsoft for example has said that around 70% of their vulnerabilities are some form of memory corruption [6]. Another reason to move to Rust is that it is a more "modern" language compared to C and C++ it has a bunch of features from functional languages such as pattern matching. It also has a more strict type system which again allows one to catch issues earlier. Furthermore, it has an integrated package manager and build system that makes it easier to pull in dependencies.

How does Rust ensure memory-safety?

Rust uses a few different constructs to ensure safety here we will go through the most prominent.

Ownership

Ownership is a concept that is central in Rust, it can be seen as a variable that every value has, and it follows a few rules:

So what do these rules tell us? Firstly every value has some /owner/ which must be unique for every value, and when it leaves a scope it is dropped. Dropped here means that a possible deconstructor is run, and it is deallocated.

Let us have a look at a small example of ownership.

let v = vec![1, 2, 3];
{              // start a new scope
    let w = v; // move v into the scope
}              // w goes out of scope and is dropped

v[0]           // Error

So in this example, we create a new value v which here is a Vec, it is a non-copy[^1] type, so ownership is transferred into the new scope. There it is assigned to w which means that w now is the owner of the value. It is then dropped after it goes out of scope. Attempting to compile this code will end out with a compiler error:

error[E0382]: borrow of moved value: `v`
 --> src/main.rs:7:1
  |
2 |     let v = vec![1, 2, 3];
  |         - move occurs because `v` has type `Vec<i32>`, which does not implement the `Copy` trait
3 | { // start a new scope
4 |     let w = v; // move v into the scope
  |             - value moved here
...
7 | v[0] // Error
  | ^ value borrowed here after move

It is important that the error here happens at compile time as otherwise, this could cause a use-after-free memory corruption.

Borrowing

To use the value of a non-copy type in a new scope without moving it you can use borrowing:

let v = vec![1, 2, 3];
{               // start a new scope
    let w = &v; // borrow v in the scope
}               // drop the reference to v

v[0] // No error

Here instead of moving the whole value into the new scope you only move a reference. A reference is similar to a pointer from other languages though it gives better guarantees at compile-time.

There are two types of borrows in Rust:

There are a few rules related to those, immutable references may exist in any number as long as there are no mutable references. Mutable reference must be the only borrow of a value, no other borrows may exist at the same time.

To ensure that these rules are upheld Rust uses a system called the Borrow Checker. It will construct a set of regions called lifetimes in which a borrow is valid, it can then check if the rules are upheld there.

For example, if we try to use both a mutable and an immutable borrow at the same time we will run into an error:

let mut v0 = Vec::new();
v0.push(1);
let b0 = &v0[0];
let b1 = &mut v0[0];
println!("{}, {}", b0, b1);

Here both b0 and b1 are used at the same time leading to the following error:

error[E0502]: cannot borrow `v0` as mutable because it is also borrowed as immutable
 --> src/lib.rs:4:15
  |
3 | let b0 = &v0[0];
  |           -- immutable borrow occurs here
4 | let b1 = &mut v0[0];
  |               ^^ mutable borrow occurs here
5 | println!("{}, {}", b0, b1);
  |                    -- immutable borrow later used here

If either of b0 or b1 is removed from line 5 it will result in a successful compilation since the borrow checker can see that none of the borrowing rules are breached.

Unsafe

In any case, where there is a possibility of breaking memory safety it is necessary to use the unsafe keyword which allows one to use functions and methods that are marked in that way. For example, it is unsafe to read from a raw pointer.

Bound checking

The last thing I will highlight that Rust does to ensure safety is that array indexing is bound checked and will cause a run-time panic if it fails, thus aborting the program[^2].

Meta-Programming

The last feature of Rust I will highlight here does not have much to do with safety, it instead concerns the powerful meta-programming Rust allows. Rust has two major types of meta-programming:

macro_rules!

Procedural Macros

RRust

So now we have looked at reversible computing and the language Janus. We also had a detour into Rust and a few features for how it ensures safety. Now I will give some reasoning as to why I made RRust and how it solves issues with Janus, then I will finally present RRust.

Issues With The Safety Of Janus

Janus has two modes of evaluation a safe interpreter that runs the code and a translation into C++. The translation into C++ gives us some issues when it comes to memory safety. If we for example have a function with the following signature:

procedure copy(int arr[], int payload[])

The C++ will generate the following functions:

void copy_forward(int *arr, int *payload);
void copy_reverse(int *arr, int *payload);

Here we see a few issues, C and C++ allow for mutable aliasing, so there is nothing to stop arr and payload from referencing the same memory, either in full or partially. In both cases, it could cause operations that damage the reversibility of the program. The other main issue is that it allows out-of-bound writing since it uses direct array access.

The Idea

The idea I have is to use the safety features of Rust to ensure that it is memory-safe. Rust checks for out-of-bound access and mutable aliasing is not possible[^3].

To implement the language I will use the meta-programming features of Rust to transform the code into a reversible version.

How could a translation look?

We will start by looking at the example from before:

procedure copy(int arr[], int payload[])
    local int i = 0
    from i = 0 loop
        arr[i] += payload[i]
        i += 1
    until i = 2048
    delocal int i = 2048

This function copies by addition from payload into arr. With the Rust version, we want to make it similar to how Janus translates it. We use a zero-sized type as a kind of namespace for the functions instead of using a prefix:

struct Copy; // Zero Sized Type
impl Copy {
    fn forward(arr: &mut [i32], payload: &mut [i32]) { /* forward code */ }
    fn backwards(arr: &mut [i32], payload: &mut [i32]) { /* backwards code */ }
}

To have it only mutate through a mutable reference instead of taking ownership and returning a value simplifies the code. This also means that it is not possible to alias arr with payload since it would in that case not compile.

I will also show how the forward and backwards code could look, it is very similar to how the same program would look in C++, The biggest difference is that arrays are bound checked[^4]:

struct Copy;
impl Copy {
    fn forward(arr: &mut [i32], payload: &mut [i32]) {
        let mut i = 0;
        assert!(i == 0);
        while !(i == 2048) {
            arr[i] += payload[i];
            i += 1;
            assert!(!(i == 0));
        }
        if i != 2048 {
            panic!("Delocal failed {} != {}", i, 2048);
        }
        drop(i);
    }
    fn backwards(arr: &mut [i32], payload: &mut [i32]) {
        let mut i = 2048;
        assert!(i == 2048);
        while !(i == 0) {
            i -= 1;
            arr[i] -= payload[i];
            assert!(!(i == 2048));
        }
        if i != 0 {
            panic!("Delocal failed {} != {}", i, 0);
        }
        drop(i);
    }
}

How would this program look in RRust?

Now we have a good idea about how the finished program should look, but the bigger question now is how it should look before being transformed into that. We can base the structure upon Janus with a flavor of Rust, so here I present the example from above in RRust:

rfn!(Copy, (arr: &mut [i32], payload: &mut [i32]), {
    let mut i = 0;
    rloop!(i == 0,
           {
               arr[i] += payload[i];
               i += 1;
           },
           i == 2048);
    delocal!(i, 2048);
});

We use a series of macros to construct each part of it, rfn! creates the structure of the code, rloop! creates a reversible loop with a pre-condition and a post-condition and delocal! creates checks if the value is correct and makes it impossible to use the local value afterward.

To ensure that the program was reversible I made a secondary transpiler that compiled the Rust code into Janus which could then also be checked.

Defining a syntax

I created a syntax of the part of Rust I deemed to be compatible with Janus since I was translating to that as well:

The syntax of RRust
Text version of the RRust syntax
program ::= rfn!( name, (args), body);
args ::= arg | args, arg
arg ::= id: &mut type
list ::= [ scalar ]
scalar ::= i8 | i16 | i32 | i64 | u8 | u16 | u32 | u64 | isize | usize
number ::= scalar :: MIN | scalar :: MIN + 1 | ... | 
           scalar :: MAX -1 | scalar :: MAX
name ::= [A-Z|a-z][A-Z|a-z|0-9|_]*
body ::= { stmt }
stmt ::= stmt ; | stmt ; stmt | { stmt } | * stmt | let id = expr |
         id ⊕= expr | l[expr] ⊕= expr |
         rif | rloop | delocal | name::forward(fargs) |
         name::backwards(fargs) | swap(id, id)
expr ::= number | *expr | l[expr] | expr ⊙ expr
fargs ::= farg | fargs, farg
farg ::= expr | id
⊕ ::= + | - | ^
⊙ ::= ⊕ | * | / | & | '|'
⊗ ::= == | >= | <= | < | > | !=
rif ::= rif!(bool, body, bool) | rif!(bool, body, body, bool)
rloop ::= rloop!(bool,body,bool) | rloop!(bool,body,body,bool)
delocal ::= delocal!(id, expr)
bool ::= expr ⊗ expr | true | false

Here the most important parts to observe are the two statements that allow mutation both uses the ⊕= that we saw with Janus as well. As well as only allowing integer types for the same reason as Janus.

Transformation

So to facilitate the transformation of the code we use a few different features of the way that the language is set up. Local Inversibility, local invertibility here means that any syntax segment can be inversed in isolation, and there is no need for full program analysis to figure out how to translate a part, this also makes the second feature fit right in. I use recursive decent to go through the program and transform it. So in each part we do the local inversion, and then we recursively invert every part of that statement.

These two parts make it possible to transform the code. The only thing missing now is the rules we apply to perform these transformations. I want to define two sets of transformations one to Rust that follows the semantics of the translation into C++ and one that translates into Janus.

RRust to Rust

In this section, I will present the transformation rules that go from RRust into Rust. I will be using F[stmt] for forward translation rules and R[stmt] for reverse translation.

Reversible If

The first construct we will look at is reversible if statements, as with Janus, they have a pre and a post condition, here the pre-condition is c₁ and the post-condition is c₂.

rif!(c₁, b₁, b₂, c₂);

This then gets expanded into the following Rust code:

if c₁ {
  F[b₁]
  assert!(c₂);
} else {
  F[b₂]
  assert!(!(c₂));
}

We can see here that it is as with the graph we looked at earlier, if c₁ then c₂ and if !c₁ then !c₂.

This was the forward translation, the reverse translation is similar, but we exchange c₁ and c₂ and recursivly reverse:

if c₂ {
  R[b₁]
  assert!(c₁);
} else {
  R[b₂]
  assert!(!(c₁));
}

Here we can see that the same rules applies as well, e.g. if c₂ then c₁.

Reversible loop

The reversible loop again works as the one in Janus, it has a condition that is only true when entering and a post-condition that similarily is only when exiting the loop.

rloop!(c₁, b₁, b₂, c₂);

This expands in the forward direction to:

assert!(c₁);
F[b₁];
while !(c₂) {
  F[b₂];
  assert!(!(c₁));
  F[b₁];
}

Here we have the form of loop we saw on the graph earlier, c₁ must be true entering and c₂ must be true when exiting. Again the reverse version is very similar. We only swap c₁ and c₂ around and recursively reverse.

assert!(c₂);
R[b₁];
while !(c₁) {
  R[b₂];
  assert!(!(c₂));
  R[b₁];
}

Other constructs

Now we have looked at the biggest constructs, now we look at all the smaller ones and try to put a few words to each of why they are constructed in that way. We only need to go through the reverse translation as the forward translation has no changes. I will start with arguable one of the most important ones:

Operators

The 3 operators we skipped over are not translated in any complicated way, + becomes -, and the reverse is the same. For xor ^ it is similar to swap since it is self-inverse, so it does not change.

Aliasing

The transformations done in the previous sections is enough to write reversible programs, but it does not remove the possibility to write programs that will not be reversible. To ensure this we need to insert checks for aliasing as we discussed in the Janus section. The Janus-generated C++ code will not check for it at any point, so it is possible to give inputs that will cause issues. Rust ensures that we cannot give aliased arguments, but we also need mutation operations not to be aliased. The specific forms that we need to ensure are the following:

// Also the case for +, - and ^.

a += b; // where a == b

l[x] += l[y]; // where x == y

To ensure this we insert a check if they point to the same value:

if core::ptr::eq(&(e₁), &(e₂)) {
    panic!();
}
e₁ += e₂;

Here the references will be implicit cast to pointers which we then can check for equality, and fail if they are equal.

RRust to Janus

As said further up, I also implemented a transformation of RRust into Janus. The transformation was rather mechanical and was mostly done by removing all the extra notation Rust uses. The most complicated part of that was the parser needed for the macros. I am not going to fully present it here, but if you are interested in seeing more about it feel free to contact me about it.

Example RRust programs

Here are a few examples of RRust code, they are reworked examples of Janus code.

rfn!(Fib, (x1: &mut i32, x2: &mut i32, n: &mut i32), {
    rif!(
        *n == 0,
        {
            *x1 += 1;
            *x2 += 1;
        },
        {
            *n -= 1;
            Fib::forward(x1, x2, n);
            *x1 += *x2;
            std::mem::swap(x1, x2);
        },
        *x1 == *x2
    );
});

let mut x1 = 0;
let mut x2 = 0;
let mut n = 10;

Fib::forward(&mut x1, &mut x2, &mut n);

assert_eq!(x1, 89);
assert_eq!(x2, 144);
assert_eq!(n, 0);

Fib::backwards(&mut x1, &mut x2, &mut n);

assert_eq!(x1, 0);
assert_eq!(x2, 0);
assert_eq!(n, 10);
{{< /code >}}

{{< code 
  language="rust" 
  title="Prime Factorization" 
  id="2" 
  expand="Show" 
  collapse="Hide" 
  isCollapsed="true" 
>}}
rfn!(Factor, (num: &mut usize, fact: &mut [usize; 20]), {
    let mut tryf = 0;
    let mut i = 0;
    rloop!(
        tryf == 0 && *num > 1,
        {
            NextTry::forward(&mut tryf);
            rloop!(
                fact[i] != tryf,
                {
                    i += 1;
                    fact[i] += tryf;
                    let mut z = *num / tryf;
                    std::mem::swap(&mut z, num);
                    delocal!(z, *num * tryf);
                },
                *num % tryf != 0
            );
        },
        tryf * tryf > *num
    );

    rif!(
        *num != 1,
        {
            i += 1;
            fact[i] ^= *num;
            *num ^= fact[i];
            fact[i] ^= *num;
        },
        {
            *num -= 1;
        },
        fact[i] != fact[i - 1]
    );

    rif!(
        (fact[i - 1] * fact[i - 1]) < fact[i],
        {
            rloop!(
                tryf * tryf > fact[i],
                {
                    NextTry::backwards(&mut tryf);
                },
                tryf == 0
            );
        },
        {
            tryf -= fact[i - 1];
        },
        (fact[i - 1] * fact[i - 1]) < fact[i]
    );

    ZeroI::forward(&mut i, fact);
    delocal!(i, 0);
    delocal!(tryf, 0);
});

rfn!(ZeroI, (i: &mut usize, fact: &mut [usize; 20]), {
    rloop!(
        fact[*i + 1] == 0,
        {
            *i -= 1;
        },
        *i == 0
    );
});

rfn!(NextTry, (tryf: &mut usize), {
    *tryf += 2;
    rif!(
        *tryf == 4,
        {
            *tryf -= 1;
        },
        *tryf == 3
    );
});

let mut num = 840;
let mut fact = [0; 20];

Factor::forward(&mut num, &mut fact);
print!("Num: {}, Factors: ", num);
for i in 1u64..=6 {
    print!("{}: {}", i, fact[i as usize]);
    if i != 6 {
        print!(", ");
    } else {
        println!(".");
    }
}

// Num: 0, Factors: 1: 2, 2: 2, 3: 2, 4: 3, 5: 5, 6: 7.

Factor::backwards(&mut num, &mut fact);
print!("Num: {}, Factors: ", num);
for i in 1u64..=6 {
    print!("{}: {}", i, fact[i as usize]);
    if i != 6 {
        print!(", ");
    } else {
        println!(".");
    }
}

// Num: 840, Factors: 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0.

Conclusion

So what did I show with this whole blog? And did it reach the goals I had set for myself?

I have shown that it is possible to use the meta-programming of Rust to construct a safe reversible programming language, or at least one safer than the code generated by the language Janus. Following that I have shown that it is possible to use meta-programming in Rust for more advanced program transformations.

But some parts that I wanted to have shown are still missing, I believe it is possible to use the type system of Rust to ensure reversibility better than what I have done, specifically the need to ensure that a type is not Copy since that will break the rules. I have tried various things, but I never found a way that did not get in the way when writing code or was in some sense trivial to misuse.

Looking forward

I have put the code up on GitHub at \<https://github.com/erk-/rrust\>, and I intend to work on it when I have some time. I have a few things that I want to have done with it including finding a way to make better bounds with the help of the type system. Any contributions are appreciated.

The main things that I want to do looking forward are, in no particular order:

Bibliography

Currently does not work, see note at the top of the page.

Footnotes

  1. Copy types include small types such as integers that efficiently can be copied. Rust will implicitly copy the value into a new, therefore it does not transfer ownership. <https://doc.rust-lang.org/std/marker/trait.Copy.html>
  2. It is possible to opt out of such bound-checks with the use of unsafe and specifically the get_unchecked function.
  3. Without the use of unsafe.
  4. Note that there is an important part missing here, I will get back to that later.