HOWTO use JSONB, the binary-format JSON type in PostgreSQL

A previous blog I wrote on JSON functionality in PostgreSQL proved wildly popular, which got me thinking about writing a follow-up HOWTO about JSONB.

JSONB, like JSON, is a native data type in PostgreSQL and was introduced in version 9.4. The major difference between the two types is indicated by the name: the extra ‘B’ stands for binary format. A more detailed comparison is given below:

JSON

  • Stores data in text format
  • Input is fast, as no conversion is required
  • Processing functions must re-parse the data on each execution
  • Indexing is not supported
  • All white space and line feeds in the input are preserved as-is
  • Duplicate keys are retained; processing functions only consider the last value
  • Order of the keys is preserved (see the example below)

JSONB

  • Stores data in decomposed binary format
  • Input is slightly slower, as there is conversion overhead involved
  • Re-parsing is not needed, making data processing significantly faster
  • Indexing is supported
  • Extra white space and line feeds are stripped
  • Duplicate keys are discarded on input; only the last value is stored
  • Order of the keys is not preserved
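
The key-ordering difference is easy to verify for yourself. JSON echoes the input exactly as entered, while JSONB reorders keys into its own internal order, so no particular ordering should be relied upon. A minimal check (the literal below is just an illustration):

-- JSON preserves the order of keys as entered; JSONB does not.
SELECT '{"b": 1, "a": 2}'::JSON  AS json_value,
       '{"b": 1, "a": 2}'::JSONB AS jsonb_value;
-- json_value keeps {"b": 1, "a": 2}; jsonb_value typically comes back as {"a": 2, "b": 1}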

In general, unless there are highly specialized needs (such as legacy applications designed to assume a particular order of keys), it is highly recommended that JSONB be used.

The basic handling PostgreSQL provides for JSON and JSONB is pretty much the same. Because I have already covered JSON in a previous blog, I will focus below on the differences that JSONB brings.
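
For example, the -> and ->> extraction operators behave the same way on a JSONB value as they do on JSON; only the storage format underneath differs. A quick sketch:

-- -> returns the value as jsonb, ->> returns it as text;
-- both operators work identically on the JSON type.
SELECT '{"key1": "value1", "key2": [1, 2]}'::JSONB -> 'key2'  AS key2_jsonb,
       '{"key1": "value1", "key2": [1, 2]}'::JSONB ->> 'key1' AS key1_text;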

Ignoring extra white space

Let’s start with the following valid object:

{
     "key1": "value1",
     "key2": [1, 2],
     "key3": "value3"
 }

The JSON data type shows the following:

json_sample=# SELECT '{
     "key1": "value1",
     "key2": [1, 2],
     "key3": "value3"
 }'::JSON;
         json
-----------------------
 {                    +
     "key1": "value1",+
     "key2": [1, 2],  +
     "key3": "value3" +
 }
(1 row)

With JSONB, however:

json_sample=# SELECT '{
     "key1": "value1",
     "key2": [1, 2],
     "key3": "value3"
 }'::JSONB;
                         jsonb
-------------------------------------------------------
 {"key1": "value1", "key2": [1, 2], "key3": "value3"}
(1 row)

As you can see, in the case of JSONB, all extra white space and line feeds were stripped by the database.

Handling duplicate keys

Behavior with the JSON data type:

json_sample=# SELECT '{"key1": "value1", "key2": [1, 2, 3, 4], "key2":[1,2], "key2":"value2", "key3": "value3"}'::JSON;
                                           json
--------------------------------------------------------------------------------------------
 {"key1": "value1", "key2": [1, 2, 3, 4], "key2":[1,2], "key2":"value2", "key3": "value3"}
(1 row)

Behavior with the JSONB data type:

json_sample=# SELECT '{"key1": "value1", "key2": [1, 2, 3, 4], "key2":[1,2], "key2":"value2", "key3": "value3"}'::JSONB;
                          jsonb
--------------------------------------------------------
 {"key1": "value1", "key2": "value2", "key3": "value3"}
(1 row)

The above demonstrates how duplicate keys are discarded by JSONB at input time, with only the last value being kept.

Indexing

In my opinion, indexing is the biggest differentiating factor between the JSON and JSONB data types in PostgreSQL, and the strongest reason to opt for JSONB wherever possible. Let’s illustrate with an example.

Let’s create two tables with the same structure, one with a GIN index on the JSONB column and one without:

json_sample=# CREATE TABLE sales_jsonb (id INTEGER, sale JSONB);
CREATE TABLE
json_sample=# CREATE TABLE sales_jsonb_index (id INTEGER, sale JSONB);
CREATE TABLE
json_sample=# CREATE INDEX idxgin ON sales_jsonb_index USING gin (sale);
CREATE INDEX
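
The index above uses GIN with the default jsonb_ops operator class. If containment queries with the @> operator are all you need, a GIN index declared with the jsonb_path_ops operator class is typically smaller and faster, at the cost of not supporting the key-existence operators. A sketch of the alternative (the index name idxgin_path is just an example):

-- Alternative GIN index using jsonb_path_ops: optimized for @> containment
-- queries, but it does not support the ?, ?| and ?& existence operators.
CREATE INDEX idxgin_path ON sales_jsonb_index USING gin (sale jsonb_path_ops);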

 

Now let’s add some data to both tables. The data used in this example (512,000 rows) can be downloaded here: JSONB Data.

json_sample=# COPY sales_jsonb(sale) FROM '/Users/Shared/json.txt';
COPY 512000
json_sample=# COPY sales_jsonb_index(sale) FROM '/Users/Shared/json.txt';
COPY 512000
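
If the download is not available, roughly comparable data can be generated directly in the database. This is only a sketch: I am assuming each row is a small object with an "item" key shaped like the value queried below; the exact keys and value distribution of the downloadable file may differ, and jsonb_build_object requires PostgreSQL 9.5 or later.

-- Generate 512,000 synthetic rows of the assumed shape and copy them
-- into the indexed table as well.
INSERT INTO sales_jsonb (id, sale)
SELECT g,
       jsonb_build_object(
           'item',     (100000000 + (random() * 899999999)::int)::text || '-0',
           'quantity', 1 + (random() * 9)::int
       )
FROM generate_series(1, 512000) AS g;

INSERT INTO sales_jsonb_index (id, sale)
SELECT id, sale FROM sales_jsonb;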

 

Running EXPLAIN on both these tables illustrates the cost difference very clearly:

json_sample=# EXPLAIN (FORMAT JSON) select * from sales_jsonb where sale @> '{"item": "205386681-0"}';
                            QUERY PLAN
------------------------------------------------------------------
 [                                                               +
   {                                                             +
     "Plan": {                                                   +
       "Node Type": "Seq Scan",                                  +
       "Relation Name": "sales_jsonb",                           +
       "Alias": "sales_jsonb",                                   +
       "Startup Cost": 0.00,                                     +
       "Total Cost": 14748.00,                                   +
       "Plan Rows": 512,                                         +
       "Plan Width": 103,                                        +
       "Filter": "(sale @> '{\"item\": \"205386681-0\"}'::jsonb)"+
     }                                                           +
   }                                                             +
 ]
(1 row)

json_sample=# EXPLAIN (FORMAT JSON) select * from sales_jsonb_index where sale @> '{"item": "205386681-0"}';
                                QUERY PLAN
--------------------------------------------------------------------------
 [                                                                        +
   {                                                                      +
     "Plan": {                                                            +
       "Node Type": "Bitmap Heap Scan",                                   +
       "Relation Name": "sales_jsonb_index",                              +
       "Alias": "sales_jsonb_index",                                      +
       "Startup Cost": 39.97,                                             +
       "Total Cost": 1670.57,                                             +
       "Plan Rows": 512,                                                  +
       "Plan Width": 103,                                                 +
       "Recheck Cond": "(sale @> '{\"item\": \"205386681-0\"}'::jsonb)",  +
       "Plans": [                                                         +
         {                                                                +
           "Node Type": "Bitmap Index Scan",                              +
           "Parent Relationship": "Outer",                                +
           "Index Name": "idxgin",                                        +
           "Startup Cost": 0.00,                                          +
           "Total Cost": 39.84,                                           +
           "Plan Rows": 512,                                              +
           "Plan Width": 0,                                               +
           "Index Cond": "(sale @> '{\"item\": \"205386681-0\"}'::jsonb)" +
         }                                                                +
       ]                                                                  +
     }                                                                    +
   }                                                                      +
 ]
(1 row)

As you can see, the total cost for the same query on the same table structure for the same data set goes from 14,748 to 1,670.57 when the JSONB column is indexed.
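
The numbers above are planner cost estimates. To compare actual run times on your own machine, EXPLAIN ANALYZE executes the query and reports the real timings, for example:

-- Executes the query and reports actual (not estimated) timings.
EXPLAIN ANALYZE
SELECT * FROM sales_jsonb_index WHERE sale @> '{"item": "205386681-0"}';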

Comments

  • Have you examined how much slower JSONB is in writing data than JSON? JSON may not convert the data, but it does parse it (ensuring it is valid JSON), which I have a feeling is not that much slower than converting it.

    • Umair Shahid says:

      Great point, Matthew. I checked it out with the same data (512,000 rows) that is loaded in the post above, and got the following results:

      json_sample=# copy sales(sale) from '/Users/Shared/json.txt';
      COPY 512000
      Time: 4266.087 ms
      json_sample=# copy sales_jsonb(sale) from '/Users/Shared/json.txt';
      COPY 512000
      Time: 6092.112 ms

      The table 'sales' has the JSON data type, while the table 'sales_jsonb' has JSONB. As you can see, the load into the JSON column took 4,266.087 ms, while the same data loaded into the JSONB column took 6,092.112 ms. That's an increase of over 40%, which is not insignificant by any standard.
