Introducing doop

As great as Immutable.js is, especially with a TypeScript declaration included in the package, the Record class leaves me a little disappointed.

In an ordinary class with public properties we’re used to being able to say:

const a = new Animal();
a.hasTail = true;
a.legs = 2;

const tailPrefix = a.hasTail ? "a" : "no";
const desciption = `Has ${a.legs} legs and ${tail} tail.`;

That is, each property is a single named feature that can be used to set and get its value. But immutability makes things a little more complicated, because rather than changing a property of an object, we instead create a whole new object that has the same values on all its properties except for the one we want to change. It’s just a convenient version of “clone and update”. This is how it has to be with immutable data. You can’t change an object, but you can easily make a new object that is modified to your requirements.

Why is this hard to achieve in a statically typed way? This thread gives a nice quick background. In a nutshell, you use TypeScript because you want to statically declare the structure of your data. Immutable.js provides a class called Record that lets you define class-like data types, but at runtime rather than compile time. You can overlay TypeScript interface declarations onto them at compile time, but it’s a bit messy. Inheritance is troublesome, and there’s a stubborn set method that takes the property name as a string, so there’s nothing stopping you at compile-time from specifying the wrong property name or the wrong type of value.

The most complex suggestion in that thread is to use code generation to automatically generate a complete statically typed immutable class, from a simpler declaration in a TS-like syntax. This is certainly an option, but seems like a defeat for something so fundamental to programming as declaring the data structures we’re going to use in memory.

Really this kind of class declaration should be second nature. If we’re going to adopt immutable data as an approach, we’re going to be flinging these things around like there’s no tomorrow.

So I wanted to see if something simpler could be done using the built-in metaprogramming capabilities in TypeScript, namely decorators. And it can! And it’s not as ugly as it might be! And there’s a nice hack hiding under it!

How it looks

This is how to declare an immutable class with some properties and one method that reads the properties.

import { doop } from "../doop";

@doop
class Animal {

    @doop
    get hasTail() { return doop<boolean, this>() }

    @doop
    get legs() { return doop<number, this>(); }

    @doop
    get food() { return doop<string, this>(); }

    constructor() {
        this.hasTail(true).legs(2);
    }

    describe() {
        const tail = this.hasTail() ? "a" : "no";
        return `Has ${this.legs()} legs, ${tail} tail and likes to eat ${this.food()}.`;
    }
}

The library doop exposes a single named feature, doop, and you can see it being used in three ways in the above snippet:

  • As a class decorator, right above the Animal class: this allows it to “finish off” the class definition when the code is loaded into the JS engine.
  • As a property decorator, above each property: this inserts a function that implements both get and set functionality.
  • As a helper function, called inside each property getter

Although not visible in that snippet, there’s also a generic interface, Doop, returned by the helper function, and hence supported by each property:

interface Doop<V, O> {
    (): V;
    (newValue: V): O;
}

That’s a function object with two overloads. So to get the value of a property (as you can see happening in the describe method) you call it like a function with no arguments:

if (a.hasTail()) { ...

It’s a little annoying that you can’t just say:

if (a.hasTail) { ...

But that would rule out being able to “set” (make a modified clone) through the same named feature on the object. If the type of hasTail were boolean, we’d be stuck.

There’s a particular pattern you follow to create a property in an immutable class. You have to define it as a getter function (using the get prefix), and return the result of calling doop as a helper function, which is where you get to specify the type of the property. Note: you only need to define a getter; doop provides getting and pseudo-setting (i.e. cloning) via the same property, with static type checking.

See how the constructor is able to call its properties to supply them with initial values. This looks a lot like mutation, doesn’t it? Well, it is. But it’s okay because we’re in the constructor. doop won’t let this happen on properties of a class that has finished being constructed and therefore is in danger of being seen to change (NB. you can leak a reference to your unfinished object out of your constructor by passing this as an argument to some outside function… so don’t do that).

And in the describe method (which is just here as an example, not part of any mandatory pattern) you can see how we retrieve the values by calling properties as if they were methods, this time passing no parameters.

But what’s not demonstrated in this example is “setting” a value in an already-constructed object. It looks like this:

const a = new Animal();
expect(a.legs()).toEqual(2); // jasmine spec-style assertion

// create a modified clone
const b = a.legs(4);
expect(b.legs()).toEqual(4);

// original object is unaffected
expect(a.legs()).toEqual(2);

Inheritance is supported; a derived class can add more properties, and in its constructor (after calling super() it can mutate the base class’s properties. The runtime performance of a derived class should be identical to that of an equivalent class that declares all the properties itself.

One thing to be wary of is adding ordinary instance properties to a doop class. It would be difficult to effectively block this happening, and in any case there may occasionally be a good reason to do it, as long as you understand one basic limitation of it: ordinary instance properties belong to an instance. When you call a property to set its value, you are returned a new instance, and there is no magic that automatically copies or initialises any instance fields. Only the other doop properties will have the same values as in the original instance. Any plain instance fields in the new instance will have the value undefined.

For simplicity’s sake, just make sure in a doop class that all data is stored in doop properties.

Implementation

The implementation of the cloning is basically the one described here so it’s super-fast.

I mentioned there’s a hack involved, and it’s this: I needed a way to generate, from a single declaration in the library user’s code, something that can perform two completely different operations: a simple get and a pseudo-set that returns a new instance. That means I need each property to be an object with two functions. But if I do that literally, then a get would look like this:

// A bit ugly
const v = a.legs.get();
const a2 = a.legs.set(4);

I don’t like the verbosity, for starters. But there’s a worse problem caused by legs being an extra object in the middle. Think about how this works in JS. Inside the get function this would point to legs, which is just some helper object stored in a property defined on the prototype used by all instances of the Animal class. It’s not associated with an instance. It doesn’t know what instance we’re trying to get a value from. I could fix this by creating a duplicate legs object as an instance property on every Animal instance, and then giving it a back-reference to the owning Animal, but that would entirely defeat the whole point of the really fast implementation, which uses a secret array so it can be rapidly cloned, whereas copying object properties is very much slower.

Or I could make legs, as a property getter, allocate a new middle object on the fly and pass through the this reference. So every time you so much as looked at a property, you’d be allocating an object that needs to be garbage collected. Modern GCs are amazing, but still, let’s not invent work for them.

So what if instead of properties, I made the user declare a function with two overloads for getting and setting? That solves the this problem, but greatly increases the boilerplate code overhead. The user would actually have to write two declarations for the overloads (stating the “property” type twice) and a third for the implementation:

// Ugh
@doop
legs(): number;
legs(v: number): this;
legs(v?: number): any { }

The function body would be empty because the doop decorator replaces it with a working version. But it’s just a big splurge of noise so it’s not good enough. And yet it’s the best usage syntax available. Ho hum.

Lateral thinking to the rescue: in TypeScript we can declare the interface to a function with two overloads. Here it is again:

export interface Doop<V, O> {
    (): V;
    (newValue: V): O;
}

Note that O is the type of the object that owns the property, as that’s what the “setter” overload has to return a new instance of.

Using a getter in the actual doop library looks like this:

const l: number = a.legs();

There are at least two possible interpretations of a.legs():

  • legs is function that returns the number we want.
  • legs is a property backed by a getter function, that returns a function with at least one overload (): number, which when called returns the number we want.

To explain the second one more carefully: the part that says a.legs will actually call the getter function, which returns a second function, so a.legs() would actually make two calls. The returned function would need to be created on-the-fly so it has access to the relevant this, so this is very much like the GC-heavy option I described earlier.

But it’s not possible to tell which it is from the syntax. And that’s quite good. Because if we tell the TypeScript compiler that we’re declaring a getter function that returns a function, it will generate JavaScript such as a.legs(). But at runtime, we can use the simple implementation where legs is just a function. The doop decorator can make that switcheroo, and we get the best of both worlds: a one-liner declaration of a property getter, and a minimal overhead implementation.

Well, it seemed nifty to me when I realised it would work!

So this is what the doop property decorator does: the user has declared a property, and all we care about is its name. All properties are the same at runtime: just a function that can be called to either get or clone-mutate.

doop on GitHub




Not yet regretting the time you've spent here?

Keep reading:

  • Vectors - Intuitions
  • Poorly Structured Notes on AI Part 4
  • Poorly Structured Notes on AI Part 3
  • Poorly Structured Notes on AI Part 2
  • Poorly Structured Notes on AI Part 1