One drawback of the current design is that all the tags and all the fields are stored in a JSONB structure. JSON is a great format if you want an open structure where you can add new fields as necessary, and even store more structured data than what fits in plain columns, but this non-relational format is not particularly good for efficient queries. In this post you will see some improvements to the code that allow you to be more selective about what you save in the tables. This gives you some flexibility in how to structure your data for writing efficient queries and, as you will see, it also offers some performance improvements. As always, the code is available in the pg_influx repository on GitHub.
The reason to collect data is that you want to be able to query it to gain insight into what is happening with the system, and for that you need to be able to query it efficiently. But how do queries on JSONB behave? As an example, the explain plan for a query that selects the rows for the tag value “cpu4” would be:
influx=# explain (analyze, costs off) select * from magic.cpu where _tags->>'cpu' = 'cpu4';
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Seq Scan on cpu (actual time=0.020..25.900 rows=12499 loops=1)
Filter: ((_tags ->> 'cpu'::text) = 'cpu4'::text)
Rows Removed by Filter: 111646
Planning Time: 0.079 ms
Execution Time: 26.211 ms
(5 rows)
To compare with the performance when we have the CPU as a separate column, it’s possible to just rewrite the table with the tag as a dedicated column, as sketched below. Running an explain query on the resulting table gives a significantly better result.
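One way to do such a rewrite (a minimal sketch, assuming the table has the _time, _tags, and _fields columns from the earlier posts) could be:
-- Minimal sketch: move the "cpu" tag out of the _tags JSONB into a column.
CREATE TABLE magic.cpu_new AS
SELECT _time,
       _tags->>'cpu' AS cpu,
       _tags - 'cpu' AS _tags,
       _fields
  FROM magic.cpu;

DROP TABLE magic.cpu;
ALTER TABLE magic.cpu_new RENAME TO cpu;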
influx=# explain (analyze, costs off) select * from magic.cpu where cpu = 'cpu4';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Gather (actual time=2.440..12.590 rows=12499 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on cpu (actual time=0.725..6.940 rows=4166 loops=3)
Filter: (cpu = 'cpu4'::text)
Rows Removed by Filter: 37215
Planning Time: 0.072 ms
Execution Time: 12.961 ms
(8 rows)
As you can see, not only is the time significantly smaller, in this case PostgreSQL is also able to use a parallel sequential scan rather than a plain sequential scan. Having to constantly rewrite the data to be able to query it efficiently, however, adds complexity and does not seem very useful. (There are actually good reasons to do this, but they won’t be discussed in this post.) Inserting directly into columns would be really cool: it would save a lot of time when querying the table and it would also be a way to keep the table size down. There is, however, a small catch. The tag values and the fields that are received are strings, and inserting into a text column is straightforward, but what happens if you insert into a column that is not a text column? Fortunately, PostgreSQL types are designed to support this, and it is even possible to leverage this to create a more flexible solution.
Type input and output functions
New types can be created using the CREATE TYPE command and in this particular case the focus will be on base types. As an example, the integer type could be defined in this way (internal types are directly entered into the system, so this code is not used in PostgreSQL to define the integer type):
CREATE FUNCTION int4in(cstring, oid, integer) RETURNS integer LANGUAGE C STRICT AS '$libdir/libpostgres.so';
CREATE FUNCTION int4out(integer) RETURNS cstring LANGUAGE C STRICT AS '$libdir/libpostgres.so';
CREATE TYPE integer (
INPUT = int4in,
OUTPUT = int4out
);
In addition to a type name, types have input and output functions that are used to convert strings into the internal representation and the internal representation back to a string. The output function is straightforward, but the input function takes three parameters: the string to convert, the OID of the target type, and the type modifier for the type. The last two parameters are not something that is going to be covered here, so you will have to be satisfied with knowing that they are needed to call the input function. The input function can be used to transform any string to the proper internal format, and this is what is going to be exploited here.
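To give an idea of how this looks from C code, here is a minimal sketch (not code from the extension) that converts a string to an integer by calling the type’s input function:
/* Minimal sketch: convert the string to an int4 Datum by calling the type's
 * input function. getTypeInputInfo() is declared in utils/lsyscache.h and
 * OidInputFunctionCall() in fmgr.h. */
static Datum string_to_int4(const char *str) {
  Oid typinput;
  Oid typioparam;

  getTypeInputInfo(INT4OID, &typinput, &typioparam);
  return OidInputFunctionCall(typinput, (char *) str, typioparam, -1);
}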
Using this information, the code will be rewritten so that InfluxDB lines are processed in the following steps:
- Parse the input line to get the timestamp, the metric name, the tags, and the fields.
- Open the table given by the metric name. The table needs to stay locked so that the table definition does not change while the tuple is generated.
- Get the tuple descriptor for the table.
- Retrieve the input functions for the types mentioned in the tuple descriptor and all information necessary to call these functions.
- For each tag, check if there is a column with the same name as the tag. If there is, apply the input function, store the value in the correct column, and remove the tag from the list of tags.
- Do the same for the fields, but if something is already stored in the column, do not overwrite that value. This allows you to have tags and fields with the same name, with tags taking precedence.
- If there is a column named _tags, use it to store the remaining tags. It has to be of JSONB type (a sketch of how such a value can be built follows the list).
- If there is a column named _fields, use it to store the remaining fields. It also has to be of JSONB type.
- If there is a column named _time, use it to store the timestamp. It has to be of a timestamp-compatible type.
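To give an idea of what the _tags step involves, here is a minimal sketch (not the actual extension code) that packs the remaining ParseItem entries, as produced by the parser from the earlier posts, into a Jsonb value for the _tags column. The values, nulls, and attnum variables are assumptions that match the code shown later in the post.
/* Minimal sketch: build a Jsonb object from the remaining items and store it
 * at the position of the _tags column. attnum is assumed to be the attribute
 * number of the _tags column, and items the list of leftover ParseItem
 * entries. */
JsonbParseState *state = NULL;
JsonbValue *root;
ListCell *cell;

pushJsonbValue(&state, WJB_BEGIN_OBJECT, NULL);
foreach (cell, items) {
  ParseItem *item = (ParseItem *)lfirst(cell);
  JsonbValue key, val;

  key.type = jbvString;
  key.val.string.val = item->key;
  key.val.string.len = strlen(item->key);

  val.type = jbvString;
  val.val.string.val = item->value;
  val.val.string.len = strlen(item->value);

  pushJsonbValue(&state, WJB_KEY, &key);
  pushJsonbValue(&state, WJB_VALUE, &val);
}
root = pushJsonbValue(&state, WJB_END_OBJECT, NULL);

values[attnum - 1] = JsonbPGetDatum(JsonbValueToJsonb(root));
nulls[attnum - 1] = false;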
In the previous post, steps one and two were already covered. Since steps three and four are quite common, there is already support available in PostgreSQL, in the form of a structure called AttInMetadata and the function BuildTupleFromCStrings. It is not a perfect match for this use case, so it is necessary to tweak it a little to make it work for this code.
Building tuples from strings
To be able to build tuples from strings, PostgreSQL has a structure called AttInMetadata which can be built from a tuple descriptor. It contains all the necessary information to be able to run the input function on a string and store the result in a tuple. As an example, to build a heap tuple using the strings “1”, “3.14”, and “magic”, you can use this code:
AttInMetadata *attinmeta;
HeapTuple tuple;
Relation rel;
/* ... */
/* Open the table by name and keep it locked while the tuple is built. */
rel = table_openrv(makeRangeVar(NULL, "my_table", -1), AccessShareLock);
attinmeta = TupleDescGetAttInMetadata(RelationGetDescr(rel));
/* ... */
/* One string for each column of the table, in attribute number order. */
char *values[] = { "1", "3.14", "magic" };
tuple = BuildTupleFromCStrings(attinmeta, values);
The function RelationGetDescr gets the tuple descriptor from an open relation, so it is necessary to first open the relation and make sure that it is locked for access. The table is locked to prevent the table definition from changing while the tuple is built (here it is assumed that the tuple is later inserted into the table, so the definition cannot be allowed to change).
The function TupleDescGetAttInMetadata builds an AttInMetadata structure from a tuple descriptor. It contains one array for the type identifiers, one array for the input functions, and one array for the type modifiers; the tuple descriptor is stored there as well, so there is no need to save it elsewhere. The three arrays are used inside BuildTupleFromCStrings to convert the strings to internal values by calling the input functions. Since it requires that all the values are passed as strings, it is not a perfect fit for this case: some of the values are already in the correct internal format (the JSONB values). Instead, write a dedicated function for filling in just a single value in the values and nulls arrays.
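/* Convert a single string using the input function for the column given by
 * attnum, store the resulting Datum at the column's position in the values
 * array, and record whether the value was NULL. */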
void BuildFromCString(AttInMetadata *attinmeta, char *value, int attnum,
Datum *values, bool *nulls) {
values[attnum - 1] = InputFunctionCall(
&attinmeta->attinfuncs[attnum - 1], value,
attinmeta->attioparams[attnum - 1], attinmeta->atttypmods[attnum - 1]);
nulls[attnum - 1] = (value == NULL);
}
Internally, BuildTupleFromCStrings uses the function InputFunctionCall to call the input function in the correct way and, as you saw earlier in the post, it needs three parameters: the string to convert, an OID, and the type modifier. These are all stored in the AttInMetadata structure, so they are just passed on together with the string to convert. Also remember that attribute numbers start at 1, so it is necessary to subtract 1 to get the index into the arrays.
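As a hypothetical usage example (timestamp_string, values, and nulls are assumed variables, not names from the extension), the _time column from the step list could be filled in like this, provided the string is in a format that the column type’s input function accepts:
/* Hypothetical example: fill in the _time column, if the table has one.
 * SPI_fnumber() returns the attribute number of the named column, or a
 * non-positive value if there is no such column. */
int attnum = SPI_fnumber(attinmeta->tupdesc, "_time");
if (attnum > 0)
  BuildFromCString(attinmeta, timestamp_string, attnum, values, nulls);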
The next step is to write a function that builds the values and nulls arrays using this function.
void InsertItems(List **pitems, AttInMetadata *attinmeta, Datum *values,
bool *nulls) {
ListCell *cell;
TupleDesc tupdesc = attinmeta->tupdesc;
List *items = *pitems;
foreach (cell, items) {
ParseItem *item = (ParseItem *)lfirst(cell);
const int attnum = SPI_fnumber(tupdesc, item->key);
if (attnum > 0) {
BuildFromCString(attinmeta, item->value, attnum, values, nulls);
items = foreach_delete_current(items, cell);
}
}
*pitems = items;
}
The InsertItems function will extract all values from the list of items and put them in the correct columns based on the key of each item. Most of the contents of this function should be familiar from the previous posts, but there are a few notable new constructions that are worth mentioning.
The function SPI_fnumber allows you to get the attribute number of a column from the name of the column. It returns a positive number if there is a column by that name, the value SPI_ERROR_NOATTRIBUTE (which happens to be -9 in PostgreSQL 13.5) if there is no such column, and other negative numbers for the system columns. The value is only filled in if it is a real column with an attribute number greater than zero; otherwise the item is left in the list. There is a special case where a dropped column can remain in the tuple descriptor (see attisdropped in the pg_attribute table), but SPI_fnumber handles this correctly.
The macro foreach is used to iterate over List types. It takes a ListCell pointer that will point to the current cell in each step of the iteration, and the list to iterate over. If you want to delete from the list while iterating over it, you should use foreach_delete_current. This will delete the current element from the list, but also adjust the loop state so that you do not accidentally skip an element.
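Putting the pieces together, a hypothetical call site could look roughly like this (tags and fields are assumed to be the lists produced by the parser; this is a sketch, not the actual extension code):
/* Hypothetical sketch: size the values/nulls arrays from the tuple
 * descriptor, fill in the columns that match tag and field names, and form
 * a heap tuple. Remaining list items would then be packed into the _tags
 * and _fields JSONB columns as sketched earlier. */
TupleDesc tupdesc = attinmeta->tupdesc;
Datum *values = (Datum *) palloc0(tupdesc->natts * sizeof(Datum));
bool *nulls = (bool *) palloc(tupdesc->natts * sizeof(bool));
HeapTuple tuple;

memset(nulls, true, tupdesc->natts * sizeof(bool));
InsertItems(&tags, attinmeta, values, nulls);
InsertItems(&fields, attinmeta, values, nulls);
tuple = heap_form_tuple(tupdesc, values, nulls);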
Wrapping up
With these changes it is now possible to read the fields that you’re interested in into dedicated columns. For example, if you have a line like this:
diskio,host=fury,name=nvme0n1p2 io_time=2595292i,read_bytes=309936706560i,write_bytes=80954531840i 1574753954000000000
You can use a table with this definition to receive the lines without having to resort to using JSONB:
CREATE TABLE magic.diskio (
_time timestamptz,
host text,
name text,
io_time integer,
read_bytes bigint,
write_bytes bigint
);
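With a table like this, queries can filter directly on the tag columns, for example (a hypothetical query, just to illustrate the layout):
SELECT _time, io_time, read_bytes, write_bytes
  FROM magic.diskio
 WHERE host = 'fury' AND name = 'nvme0n1p2';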
This is not intended to be a performance improvement; the focus is on creating a more flexible interface. However, since it is now possible to select which columns are processed, you can avoid some work by leaving columns that are expensive to build out of the table. In particular, building the JSONB structure is quite expensive compared to just storing selected fields in dedicated columns.
To test this and see what the performance difference is, I changed the table definitions used in influx-faker and ran one test using only JSONB columns and another using dedicated columns and no JSONB columns at all. In Figure 1 you can see the performance increase that results from dropping the _tags column from the tables and creating separate columns for the tags.