Serialising complex objects in JavaScript - an introduction to tanagra.js
Originally posted on the Toptal blog.
Modern websites typically retrieve data from a number of different locations, including databases and third-party APIs. For example, when authenticating a user, a website might look up the user record from the database, then embellish it with data from some external services via API calls. Minimising expensive calls to these data sources, such as disk access for database queries and internet roundtrips for API calls, is essential to maintaining a fast, responsive site. Data caching is a common optimisation technique used to achieve this.
Processes store their working data in memory. If a web server runs in a single process (such as Node.js/Express), then this data can easily be cached using a memory cache running in the same process. However, load-balanced web servers span multiple processes, and even when working with a single process, you might want the cache to persist when the server is restarted. This necessitates an out-of-process caching solution such as Redis, which means the data needs to be serialised somehow, and deserialised when read from the cache.
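The shape of the problem can be sketched as a cache-aside lookup. This is only an illustration: `loadUser` is a hypothetical stand-in for an expensive database or API call, and `fakeRemoteStore` is a hypothetical stand-in for an out-of-process cache such as Redis (a `Map` of strings here, so the sketch runs without a server). The in-process cache can hold live objects, whereas anything that leaves the process has to pass through a string representation.

```javascript
// Hypothetical stand-in for an expensive database or API call
let sourceCalls = 0
function loadUser (id) {
  sourceCalls++
  return { id, name: 'user-' + id }
}

// In-process cache: holds live object references, but is lost on restart
const memoryCache = new Map()

// Hypothetical stand-in for an out-of-process cache such as Redis:
// it only stores strings, so values must be serialised
const fakeRemoteStore = new Map()

function getUser (id) {
  const key = 'user:' + id
  if (memoryCache.has(key)) return memoryCache.get(key)

  const cached = fakeRemoteStore.get(key)
  const user = cached !== undefined
    ? JSON.parse(cached) // deserialise on read
    : loadUser(id)

  if (cached === undefined) {
    fakeRemoteStore.set(key, JSON.stringify(user)) // serialise on write
  }
  memoryCache.set(key, user)
  return user
}

getUser(1) // miss: goes to the data source
getUser(1) // hit: served from the in-process cache
memoryCache.clear() // simulate a server restart
const afterRestart = getUser(1) // served from the out-of-process store
console.log(sourceCalls, afterRestart)
```

After the simulated restart, the data source is not called again, which is the benefit an out-of-process cache is meant to provide.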
Serialisation and deserialisation are relatively straightforward to achieve in statically typed languages such as C#. However, the dynamic nature of JavaScript makes the problem a little trickier. While ECMAScript 6 (ES6) introduced classes, the fields on these classes (and their types) aren’t defined until they are initialised, which may not be when the class is instantiated, and the return types of fields and functions aren’t defined at all in the schema. What’s more, the structure of the class can easily be changed at runtime: fields can be added or removed, types can be changed, etc. While this is possible using reflection in C#, reflection represents the “dark arts” of that language, and developers expect it to break functionality.
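A small illustration of that dynamism, using a hypothetical `Account` class that isn’t part of any library: the set of fields on an instance depends on what has happened to it so far, so there is no fixed schema for a serialiser to rely on.

```javascript
class Account {
  constructor (owner) {
    this.owner = owner // the only field that exists after construction
  }

  deposit (amount) {
    // `balance` does not exist until the first deposit
    this.balance = (this.balance || 0) + amount
  }
}

const account = new Account('alice')
const fieldsAtConstruction = Object.keys(account)

account.deposit(10)
account.nickname = 'al' // fields can be added from outside the class
delete account.owner // ...or removed
account.balance = '10' // ...or have their type changed

const fieldsLater = Object.keys(account)
console.log(fieldsAtConstruction) // [ 'owner' ]
console.log(fieldsLater) // [ 'balance', 'nickname' ]
console.log(typeof account.balance) // 'string'
```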
I was presented with this problem at work a few years ago when working on the Toptal core team. We were building an agile dashboard for our teams, which needed to be fast; otherwise, developers and product owners wouldn’t use it. We pulled data from a number of sources: our work-tracking system, our project management tool, and a database. The site was built in Node.js/Express, and we had a memory cache to minimise calls to these data sources. However, our rapid, iterative development process meant we deployed (and therefore restarted) several times a day, invalidating the cache and thereby losing many of its benefits.
An obvious solution was an out-of-process cache such as Redis. However, after some research, I found that no good serialisation library existed for JavaScript. The built-in JSON.stringify/JSON.parse methods return data of the object type, losing any functions on the prototypes of the original classes. This meant the deserialised objects couldn’t simply be used “in-place” within our application, which would therefore require considerable refactoring to work with an alternative design.
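The loss of prototypes is easy to reproduce. The `User` class below is a hypothetical example: after a round trip through the built-in methods, the data survives but the getter and the method do not.

```javascript
class User {
  constructor (firstName, lastName) {
    this.firstName = firstName
    this.lastName = lastName
  }

  get fullName () {
    return this.firstName + ' ' + this.lastName
  }

  greet () {
    return 'Hello, ' + this.fullName
  }
}

const original = new User('Jean-Luc', 'Picard')
const roundTripped = JSON.parse(JSON.stringify(original))

console.log(original instanceof User) // true
console.log(roundTripped instanceof User) // false
console.log(roundTripped.firstName) // 'Jean-Luc': the data survives
console.log(typeof roundTripped.greet) // 'undefined': the method is gone
console.log(roundTripped.fullName) // undefined: so is the getter
```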
Requirements for the Library
To support serialisation and deserialisation of arbitrary data in JavaScript, with the deserialised representations and originals usable interchangeably, we needed a serialisation library with the following properties:
- The deserialised representations must have the same prototype (functions, getters, setters) as the original objects
- The library should support nested complex types (including arrays and maps), with the prototypes of the nested objects set correctly
- It should be possible to serialise and deserialise the same objects multiple times—the process should be idempotent
- The serialisation format should be easily transmittable over TCP and storable using Redis or a similar service
- Minimal code changes should be required to mark a class as serialisable
- The library routines should be fast
- Ideally, there should be some way to support deserialisation of old versions of a class, through some sort of mapping/versioning
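The requirement about maps is worth a quick illustration, since the built-in JSON methods silently drop a `Map`’s entries. The sketch below shows the problem and one possible workaround using the standard `replacer` and `reviver` arguments. The `__type`/`entries` tagging convention is my own assumption for this example and isn’t taken from any particular library.

```javascript
const settings = { name: 'demo', limits: new Map([['a', 1], ['b', 2]]) }

// Out of the box, the Map's entries are silently dropped
const naive = JSON.stringify(settings)
console.log(naive) // {"name":"demo","limits":{}}

// Hypothetical tagging convention: write a Map as { __type: 'Map', entries: [...] }
function replacer (key, value) {
  return value instanceof Map
    ? { __type: 'Map', entries: Array.from(value.entries()) }
    : value
}

// Turn tagged objects back into Maps while parsing
function reviver (key, value) {
  return value !== null && typeof value === 'object' && value.__type === 'Map'
    ? new Map(value.entries)
    : value
}

const encoded = JSON.stringify(settings, replacer)
const decoded = JSON.parse(encoded, reviver)

console.log(decoded.limits instanceof Map) // true
console.log(decoded.limits.get('b')) // 2

// Encoding the decoded value again gives the same text, so the round trip is repeatable
console.log(JSON.stringify(decoded, replacer) === encoded) // true
```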
Implementation
To plug this gap, I decided to write Tanagra.js, a general-purpose serialisation library for JavaScript. The name of the library is a reference to one of my favourite episodes of Star Trek: The Next Generation, where the crew of the Enterprise must learn to communicate with a mysterious alien race whose language is unintelligible. This serialisation library supports common data formats to avoid such problems.
Tanagra.js is designed to be simple and lightweight, and it currently supports Node.js (it hasn’t been tested in-browser, but in theory, it should work) and ES6 classes (including Maps). The main implementation supports JSON, and an experimental version supports Google Protocol Buffers. The library requires only standard JavaScript (currently tested with ES6 and Node.js), with no dependency on experimental features, Babel transpiling, or TypeScript.
Serialisable classes are marked as such with a method call when the class is exported:
module.exports = serializable(Foo, myUniqueSerialisationKey)
The method returns a proxy to the class, which intercepts the constructor and injects a unique identifier. (If not specified, this defaults to the class name.) This key is serialised with the rest of the data, and the class also exposes it as a static field. If the class contains any nested types (i.e., members with types that need serialising), they are also specified in the method call:
module.exports = serializable(Foo, [Bar, Baz], myUniqueSerialisationKey)
(Nested types for previous versions of the class can also be specified in a similar way, so that, for example, if you serialise a Foo1, it can be deserialised into a Foo2.)
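For readers unfamiliar with proxies, here is a minimal sketch of the general technique: a `Proxy` with a `construct` trap that tags each new instance, and a `get` trap that exposes the key as if it were a static field. This is a hypothetical re-creation of the idea and not the library’s actual source; the names `markSerializable`, `_serializationKey` and `_nestedClasses` are invented for the example.

```javascript
// Hypothetical re-creation of the idea; not Tanagra.js's actual implementation
function markSerializable (clazz, nestedClasses = [], key = clazz.name) {
  return new Proxy(clazz, {
    // Runs whenever `new` is applied to the proxied class
    construct (target, args, newTarget) {
      const instance = Reflect.construct(target, args, newTarget)
      instance._serializationKey = key // injected into every instance
      return instance
    },
    // Expose the key and nested types as if they were static fields
    get (target, property, receiver) {
      if (property === '_serializationKey') return key
      if (property === '_nestedClasses') return nestedClasses
      return Reflect.get(target, property, receiver)
    }
  })
}

class Bar {
  constructor (label) {
    this.label = label
  }
}

class Foo {
  constructor (bar) {
    this.bar = bar
  }

  describe () {
    return 'Foo containing ' + this.bar.label
  }
}

const SerializableBar = markSerializable(Bar) // key defaults to the class name
const SerializableFoo = markSerializable(Foo, [SerializableBar], 'my-foo-key')

const foo = new SerializableFoo(new SerializableBar('bar'))
console.log(foo._serializationKey) // 'my-foo-key'
console.log(foo.bar._serializationKey) // 'Bar'
console.log(SerializableFoo._serializationKey) // 'my-foo-key'
console.log(foo instanceof Foo, foo.describe()) // true 'Foo containing bar'
```

Because the proxy forwards everything else to the original class, code that constructs and uses instances doesn’t need to change.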
During serialisation, the library recursively builds up a global map of keys to classes, and uses this during deserialisation. (Remember, the key is serialised with the rest of the data.) In order to know the type of the “top-level” class, the library requires that this be specified in the deserialisation call:
const foo = decodeEntity(serializedFoo, Foo)
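The key-to-class lookup can be sketched in a few lines. This is a simplified illustration under my own assumptions, not the library’s code: each class lists its nested types in a hypothetical static `nested` array, the class name doubles as the key, and the helper names (`buildRegistry`, `toPlain`, `fromPlain`, `encode`, `decode`) are invented for the example.

```javascript
class Baz {
  constructor (n) { this.n = n }
  double () { return this.n * 2 }
}
Baz.nested = []

class Foo {
  constructor (bazs) { this.bazs = bazs }
  total () { return this.bazs.reduce((sum, baz) => sum + baz.double(), 0) }
}
Foo.nested = [Baz]

// Recursively collect the top-level class and its nested types into a key -> class map
function buildRegistry (clazz, registry = new Map()) {
  registry.set(clazz.name, clazz)
  for (const nested of clazz.nested || []) {
    if (!registry.has(nested.name)) buildRegistry(nested, registry)
  }
  return registry
}

// Convert instances to plain data, tagging each one with its key
function toPlain (value) {
  if (Array.isArray(value)) return value.map((item) => toPlain(item))
  if (value === null || typeof value !== 'object') return value
  const plain = {}
  if (value.constructor !== Object) plain._key = value.constructor.name
  for (const [field, fieldValue] of Object.entries(value)) {
    plain[field] = toPlain(fieldValue)
  }
  return plain
}

// Rebuild instances by looking each key up in the registry
function fromPlain (value, registry) {
  if (Array.isArray(value)) return value.map((item) => fromPlain(item, registry))
  if (value === null || typeof value !== 'object') return value
  const clazz = registry.get(value._key)
  // Object.create restores the prototype without re-running the constructor
  const target = clazz ? Object.create(clazz.prototype) : {}
  for (const [field, fieldValue] of Object.entries(value)) {
    if (field !== '_key') target[field] = fromPlain(fieldValue, registry)
  }
  return target
}

function encode (entity) {
  return JSON.stringify(toPlain(entity))
}

function decode (text, topLevelClass) {
  return fromPlain(JSON.parse(text), buildRegistry(topLevelClass))
}

const text = encode(new Foo([new Baz(1), new Baz(2)]))
const decodedFoo = decode(text, Foo)
console.log(decodedFoo instanceof Foo, decodedFoo.bazs[0] instanceof Baz) // true true
console.log(decodedFoo.total()) // 6
console.log(encode(decodedFoo) === text) // true: the round trip is repeatable
```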
An experimental auto-mapping library walks the module tree and generates the mappings from the class names, but this only works for uniquely named classes.
Project Layout
The project is divided into a number of modules:
- tanagra-core - common functionality required by the different serialisation formats, including the function for marking classes as serializable
- tanagra-json - serialises the data into JSON format
- tanagra-protobuf - serialises the data into Google Protocol Buffers format (experimental)
- tanagra-protobuf-redis-cache - a helper library for storing serialised protobufs in Redis
- tanagra-auto-mapper - walks the module tree in Node.js to build up a map of classes, meaning the user doesn’t have to specify the type to deserialise to (experimental).
Note that the library uses US spelling.
Example Usage
The following example declares a serialisable class and uses the tanagra-json module to serialise/deserialise it:
const serializable = require('tanagra-core').serializable
// `Bar`, `Baz` and `FooBar` are other serializable classes, assumed to be in scope here
class Foo {
  constructor(bar, baz1, baz2, fooBar1, fooBar2) {
    this.someNumber = 123
    this.someString = 'hello, world!'
    this.bar = bar // a complex object with a prototype
    this.bazArray = [baz1, baz2]
    this.fooBarMap = new Map([
      ['a', fooBar1],
      ['b', fooBar2]
    ])
  }
}
// Mark class `Foo` as serializable and containing sub-types `Bar`, `Baz` and `FooBar`
module.exports = serializable(Foo, [Bar, Baz, FooBar])
...
const json = require('tanagra-json')
json.init()
// or:
// const json = require('tanagra-protobuf')
// await json.init()
// The constructor takes five arguments, so pass all of them
const foo = new Foo(bar, baz1, baz2, fooBar1, fooBar2)
const encoded = json.encodeEntity(foo)
...
const decoded = json.decodeEntity(encoded, Foo)
Performance
I compared the performance of the two serialisers (the JSON serialiser and experimental protobufs serialiser) with a control (native JSON.parse and JSON.stringify). I conducted a total of 10 trials with each.
I tested this on my 2017 Dell XPS 15 laptop with 32 GB of memory, running Ubuntu 17.10.
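For anyone wanting to run a similar comparison, a timing harness along these lines would produce the four statistics reported in the tables. It is a sketch and not the script I used: the payload is a small illustrative object, only the native JSON control is timed, and the use of the sample standard deviation (n - 1 denominator) is an assumption.

```javascript
const { performance } = require('perf_hooks')

function mean (values) {
  return values.reduce((sum, value) => sum + value, 0) / values.length
}

// Sample standard deviation (n - 1 denominator); an assumption for this sketch
function stdDev (values) {
  const average = mean(values)
  const squares = values.map((value) => (value - average) ** 2)
  return Math.sqrt(squares.reduce((sum, value) => sum + value, 0) / (values.length - 1))
}

// Time `trials` calls of `fn`, returning one duration in milliseconds per call
function timeTrials (fn, trials = 10) {
  const durations = []
  for (let i = 0; i < trials; i++) {
    const start = performance.now()
    fn()
    durations.push(performance.now() - start)
  }
  return durations
}

// Report statistics both including and excluding the first (uncached) trial
function summarise (durations) {
  const exFirst = durations.slice(1)
  return {
    meanIncFirst: mean(durations),
    stdDevIncFirst: stdDev(durations),
    meanExFirst: mean(exFirst),
    stdDevExFirst: stdDev(exFirst)
  }
}

// Control group: native JSON on an illustrative payload
const payload = { string: 'Hello foo', number: 123123, bars: [{ string: 'Complex Bar 1' }] }
const payloadText = JSON.stringify(payload)
const writeStats = summarise(timeTrials(() => JSON.stringify(payload)))
const readStats = summarise(timeTrials(() => JSON.parse(payloadText)))
console.log({ writeStats, readStats })
```

Reporting the first trial separately matters because each serialiser caches structural information on first use.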
I serialised the following nested object:
foo: {
  "string": "Hello foo",
  "number": 123123,
  "bars": [
    {
      "string": "Complex Bar 1",
      "date": "2019-01-09T18:22:25.663Z",
      "baz": {
        "string": "Simple Baz",
        "number": 456456,
        "map": Map { 'a' => 1, 'b' => 2, 'c' => 2 }
      }
    },
    {
      "string": "Complex Bar 2",
      "date": "2019-01-09T18:22:25.663Z",
      "baz": {
        "string": "Simple Baz",
        "number": 456456,
        "map": Map { 'a' => 1, 'b' => 2, 'c' => 2 }
      }
    }
  ],
  "bazs": Map {
    'baz1' => Baz {
      string: 'baz1',
      number: 111,
      map: Map { 'a' => 1, 'b' => 2, 'c' => 2 }
    },
    'baz2' => Baz {
      string: 'baz2',
      number: 222,
      map: Map { 'a' => 1, 'b' => 2, 'c' => 2 }
    },
    'baz3' => Baz {
      string: 'baz3',
      number: 333,
      map: Map { 'a' => 1, 'b' => 2, 'c' => 2 }
    }
  },
}
Write performance
Serialisation method | Mean incl. first trial (ms) | Std. dev. incl. first trial (ms) | Mean excl. first trial (ms) | Std. dev. excl. first trial (ms) |
---|---|---|---|---|
JSON | 0.115 | 0.0903 | 0.0879 | 0.0256 |
Google Protobufs | 2.00 | 2.748 | 1.13 | 0.278 |
Control group | 0.0155 | 0.00726 | 0.0139 | 0.00570 |
Read performance
Serialisation method | Mean incl. first trial (ms) | Std. dev. incl. first trial (ms) | Mean excl. first trial (ms) | Std. dev. excl. first trial (ms) |
---|---|---|---|---|
JSON | 0.133 | 0.102 | 0.104 | 0.0429 |
Google Protobufs | 2.62 | 1.12 | 2.28 | 0.364 |
Control group | 0.0135 | 0.00729 | 0.0115 | 0.00390 |
Summary
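On writes, the JSON serialiser is around 6 to 7 times slower than native serialisation; on reads, the figures above put it closer to 9 to 10 times slower. The experimental protobufs serialiser is around 13 times slower than the JSON serialiser on writes (and around 20 times slower on reads), or roughly 100 to 200 times slower than native serialisation.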
Additionally, the internal caching of schema/structural information within each serialiser clearly has an effect on performance. For the JSON serialiser, the first write is about four times slower than the average. For the protobuf serialiser, it’s nine times slower. So writing objects whose metadata has already been cached is much quicker in either library.
The same effect was observed for reads. For the JSON library, the first read is around four times slower than the average, and for the protobuf library, it’s around two and a half times slower.
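These first-trial figures can be recovered from the tables above. With 10 trials, the first trial’s time is 10 times the mean including it, minus 9 times the mean excluding it. The sketch below does that arithmetic, assuming the comparison is against the mean of the remaining nine trials, which is the reading that comes out closest to the figures quoted.

```javascript
// Recover the first trial's time from the two means:
// first = n * meanIncFirst - (n - 1) * meanExFirst
function firstTrialMs (meanIncFirst, meanExFirst, trials = 10) {
  return trials * meanIncFirst - (trials - 1) * meanExFirst
}

// Means in milliseconds, copied from the tables above: [incl. first, excl. first]
const results = {
  'JSON write': [0.115, 0.0879],
  'Protobuf write': [2.00, 1.13],
  'JSON read': [0.133, 0.104],
  'Protobuf read': [2.62, 2.28]
}

const slowdowns = {}
for (const [label, [incFirst, exFirst]] of Object.entries(results)) {
  // Ratio of the first trial to the mean of the remaining trials
  slowdowns[label] = firstTrialMs(incFirst, exFirst) / exFirst
}
console.log(slowdowns)
```

The ratios come out at roughly 4.1 and 8.7 for writes, and roughly 3.8 and 2.5 for reads.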
The performance issues of the protobuf serialiser mean it’s still in the experimental stage, and I would recommend it only if you need the format for some reason. However, it is worth investing some time in, as the format is much terser than JSON, and therefore better for sending over the wire. Stack Exchange uses the format for its internal caching.
The JSON serialiser is clearly much more performant but still significantly slower than the native implementation. For small object trees, this difference is not significant (a few milliseconds on top of a 50 ms request will not destroy the performance of your site), but this could become an issue for extremely large object trees, and is one of my development priorities.
Roadmap
The library is still in the beta stage. The JSON serialiser is reasonably well-tested and stable. Here is the roadmap for the next few months:
- Performance improvements for both serialisers
- Better support for pre-ES6 JavaScript
- Support for ES-Next decorators
I know of no other JavaScript library that supports serialising complex, nested object data, and deserialising to its original type. If you’re implementing functionality that would benefit from the library, please give it a try, get in touch with your feedback, and consider contributing.