Creating JSON Values

In the previous post, you could see how to create set-returning functions that returns several rows of a table and we returned rows consisting of the timestamp, the metric, and two JSONB values: one for the tags and one for the fields. There were, however, no coverage of what JSONB is, how it differs from JSON, and how to construct them in your code. This post is going to answer those questions.

JSON and JSONB

JSON is a compact textual format for representing structured data and is described in RFC 7159. The format (and name) comes from the syntax of JavaScript objects and the idea was that the browser could easily process responses in this format by evaluating the text string to read the data, which simplifies the code. (Note, however, that it is a highly dangerous practice to evaluate strings from the internet willy-nilly and it is not a recommended practice.) PostgreSQL has two different types for storing JSON: JSON (type name json) and JSONB (type name jsonb). With the JSON type, the string is stored directly in the column and no parsing takes place. This is a very fast operation, but the drawback is that it has to be parsed each time it is used (for any query that actually looks into the JSON, of course). In contrast, the JSONB type is parsed when it is read and stored in a format that is faster to query, but also support indexing. This additional parsing takes time, which means that it is slower to process on insert. Both types support a lot of different operations and functions, but the JSONB type has a lot more useful operators thanks to the fact that it is already structured.

In this post the focus will be on the JSONB type. It is the most versatile of the types but also the most interesting one. Internally in PostgreSQL, JSONB format is handled in two different ways: one in-memory structure that is easy to work with and one serialized format that is stored on disk and also returned from functions. The in-memory representation build on the structure JsonbValue, which consists of a type tag and a union with the different representations. The serialized format is the structure Jsonb and is essentially just a byte array. The class diagram in Figure 1 illustrates the relation between the types. (A structure with a union is a common way to implement inheritance in C, so inheritance notation is used in this diagram.)

Figure 1. Class diagram for JSON data structures

JSONB Builders

Figure 2. Parsing JSON

If you looked at the previous post, you noted that parser work using a state which is then updated as the parsing proceeds using functions. This design pattern is called a Builder and is a common way to implement parsers. The same approach is taken in the JSONB parser in PostgreSQL and is using a JSON parser state to build the object. As parsing proceeds, the JSONB object will be built in memory and at the end you can extract the final JSONB object from the builder after the parsing is complete. An example of how a the builder is used in parsing can be seen in Figure 2. As tokens are recognized, state changes are pushed to the builder which will construct the object internally and make it available at the end of the parse.

To construct a new JSONB type to return, you need to build the in-memory representation of the JSON and then serialize it to return it from the function, which is what the code below does.

#include <postgres.h>

#include <utils/jsonb.h>
    .
    .
    .
static Jsonb *BuildJsonObject(List *items) {
  ListCell *cell;
  JsonbParseState *state = NULL;
  JsonbValue *value;

  pushJsonbValue(&state, WJB_BEGIN_OBJECT, NULL);
  foreach (cell, items) {
    ParseItem *item = (ParseItem *)lfirst(cell);
    AddJsonField(&state, item->key, item->value);
  }
  value = pushJsonbValue(&state, WJB_END_OBJECT, NULL);

  return JsonbValueToJsonb(value);
}

The parser state is quite easy to construct and is just a NULL pointer. The variable will be updated as the parsing proceeds by pushing WJB_* constants (with values) into the parser state. The function BuildJsonObject added here is used to construct a JSON object and starts the construction by calling pushJsonbValue with WJB_BEGIN_OBJECT and terminates the construction of it by pushing WJB_END_OBJECT into the parser state. Since starting and terminating an object does not require a value, it is sufficient to use NULL as the second parameter for both of these calls. (It is actually an error to provide anything that is not NULL in this case. You will get an assertion failure in a debug build, but in a release build is will just be ignored.) Pushing WJB_END_OBJECT into the parser state will return the constructed objects as a JsonbValue pointer allocated in the current memory context. This value is then transformed into serialized format and returned from the function using the JsonbValueToJsonb.

Next step is to build the internals of the object—the key-value pairs that you saw in Figure 1—and is done using the AddJsonField function.

static void AddJsonField(JsonbParseState **state, char *key, char *value) {
  JsonbValue jb_key, jb_val;
  jb_key.type = jbvString;
  jb_key.val.string.val = pstrdup(key);
  jb_key.val.string.len = strlen(key);

  (void)pushJsonbValue(state, WJB_KEY, &jb_key);

  jb_val.type = jbvString;
  jb_val.val.string.val = pstrdup(value);
  jb_val.val.string.len = strlen(value);
  (void)pushJsonbValue(state, WJB_VALUE, &jb_val);
}

To handle To add a key-value pair, it is necessary to construct two instances of JsonbValue and pushing them into the parser state: first the key and then the value. This will form a JsonbPair and add it to the partially constructed JSON object in the parser state.

The next post will cover the server programming interface that is used to execute SQL statements and insert data into tables. With this done, you will then have a fully functional extension that can receive InfluxDB Line Protocol and insert the data into tables.

Mats

dbmsdrops.kindahl.net

Long time developer with a keen interest in databases, distributed systems, and programming languages. Currently working as Database Architect at Timescale.

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *