Generating JSONB From Typescript
I recently had a need to serialize data from Typescript into Postgresql's JSONB format. I was surprised I couldn't find a package on NPM to do this, so I wanted to share the code I wrote. But first, let me explain why I wanted to generate JSONB from Typescript.
The challenge
I'm working in database schema that is in third normal form. I can't divulge the details because it's for my job, and there's a real possibility you'll get lost in details that aren't significant. Here's a toy schema that exhibits the relevant properties.
I'm inserting entire products into the database at once. How do I know I'm not inserting duplicates? It's easy to prevent duplicate insertion at the level of individual tables, but how do I know the entire product entity, including its pricing, shipping, and discount attributes, is fully unique?
I set out to solve this by hashing a JSONB structure that represents the entity I intend to insert and comparing that to hashes of the existing entities that share the same name and description in the PRODUCTS table.
JSONB vs JSON
Comparing two JSON objects is fraught because key order is insignificant.
{
"one": {
"hello": "world"
},
"two": 2
}
{"two":2,"one":{"hello":"world"}}
These two objects are semantically equal. You'll need to parse them, however, and then recursively compare them in order to determine that programmatically. What if we added some serialization rules that would let us use byte-wise comparison to determine equality, just like we would compare plain, old strings?
JSONB is a data type in Postgresql that adds such serialization rules. The extra serialization rules for JSONB are actually more powerful than just enabling simpler equality checking. They allow binary search of object keys. But I'm not interested in that quality at the moment. I just want to be able to use string equality (by way of hashing in my case) to compare two JSON values.
Here are the added rules for JSONB:
No duplicate keys within an object. This rule doesn't really affect my use case. In my experience, duplicate keys are more of a risk in hand-written JSON than in JSON that comes from serialized key-value data structures because most key-value data structures also don't support multiple instances of the same key.
Keys in objects are sorted, first by length from short to long, then ascending by code point value. This property is what's most important to me. By "ascending by code point value", it means a naive string sort that doesn't take into account factors like locale. Postgresql uses C's
memcmp()
function.The only syntactic whitespace is a single space after colons and commas. This means there are no newlines or indentation, but JSONB also isn't quite as compact as JSON can be.
Just show me the code already
Now that I've briefly explained what JSONB is, and why I'm choosing to use it, let me show you how to generate it from Typescript.
I'm leaning on the JSON.stringify()
function to do most of the heavy lifting. Most of its default behavior is adequate. I only need to change how it orders keys within objects, and how it adds syntactic whitespace.
The second argument to JSON.stringify()
can be a function that replaces values as they're being serialized. I only want to modify objects.
const json = JSON.stringify(
data,
(_key, value) => {
if (typeof value === 'object' && !Array.isArray(value) && value !== null) {
// TODO: Sort the object's keys
}
return data;
},
0
);
Note that I'm checking to make sure value
is not null because typeof null
is 'object'
as well.
Here's my comparison function that can be passed to a sorting function. It assumes an array of key-value tuples are being sorted.
function compare(
[keyA, _valueA]: [string, unknown],
[keyB, _valueB]: [string, unknown]
): number {
if (keyA.length === keyB.length) {
for (let i = 0; i < keyA.length; i++) {
const charCodeA = keyA.charCodeAt(i);
const charCodeB = keyB.charCodeAt(i);
if (charCodeA === charCodeB) {
continue;
}
// Sort the same way as memcmp
return charCodeA - charCodeB;
}
return 0;
}
// Return shorter keys before longer keys
return keyA.length > keyB.length ? 1 : -1;
}
This sorting algorithm matches how Postgresql does it: from short keys to long keys with keys of identical length being sorted ascending byte-wise. The relevant function in the Postgresql source code is lengthCompareJsonbString
.
Unfortunately, I can't sort the entries using my compare
function and then turn the entries back into an object.
❌ Object.fromEntries(Object.entries(value).sort(compare))
I knew that key order of objects in JavaScript can't be relied upon, but I always thought keys were stored in insertion order. It turns out, as the Reflect.ownKeys()
page on MDN will tell you, JavaScript's object key order has a corner case that probably won't affect you... until it does.
Here's how object keys are sorted in JavaScript:
"Non-negative integer indexes in increasing numeric order (but as strings)" This is a really interesting gotcha. I can only speculate as to why this is the case. The ramifications are fascinating. It means if you see something like
x[0] = 'foo';
in isolation, you can't necessarily assume thatx
is an array! In any case, this is the primary pitfall I'm going to have to work around."Other string keys in the order of property creation" This is the rule I already knew about. Keys are usually presented in the order in which they were inserted into the object.
"Symbol keys in the order of property creation." This is referring to the
Symbol
type. I don't need to consider this case because Symbols aren't supported in JSON, and the documentation forJSON.stringify()
explicitly calls out that Symbol keys in objects are ignored.
Here's what I've come up with to work around the special case of non-negative integer keys.
function createOrderedObject(entries: [string, unknown][]): Record<string, unknown> {
const map = new Map<string | symbol, unknown>(entries); // Store keys in insertion order
return new Proxy(
{},
{
get(_target, prop) {
return map.get(prop);
},
set(_target, prop, value) {
map.set(prop, value);
return true;
},
has(_target, prop) {
return map.has(prop);
},
deleteProperty(_target, prop) {
return map.delete(prop);
},
ownKeys(_target) {
return [...map.keys()]; // Preserve ordered keys for Object.keys()
},
getOwnPropertyDescriptor(_target, prop) {
return map.has(prop)
? {
enumerable: true,
configurable: true,
}
: undefined;
},
}
);
}
The only difference between a plain old JavaScript object and the object createOrderedObject
returns is that my version really does return keys in insertion order.
The last thing to fix is how whitespace is handled. JSON.stringify()
's third argument allows you to control syntactic whitespace, but there's no value that will format the output like JSONB. The closest I can do is pass in 0
to minify the JSON and then handle adding a space after colons and commas myself. My final function for serializing to JSONB looks like this:
function stringify(data: unknown): string {
const json = JSON.stringify(
data,
(_key, value) => {
if (typeof value === 'object' && !Array.isArray(value) && value !== null) {
return createOrderedObject(Object.entries(value).sort(compare));
}
return value;
},
0
);
// Safely add spaces after `:` and `,` characters. REGEX CANNOT DO THIS!
let formatted = '';
let inString = false;
for (let i = 0; i < json.length; i++) {
let char = json[i];
let prevChar = i > 0 ? json[i - 1] : '';
if (char === '"' && prevChar !== '\\') {
inString = !inString; // Toggle string mode only if it's NOT an escaped quote
}
formatted += char;
// Add space after `:` and `,` only when outside a string
if (!inString && (char === ':' || char === ',')) {
formatted += ' ';
}
}
return formatted;
}