Big Data and PostgreSQL – Scaling out by partitioning tables

Big Data is a buzz word doing the rounds nowadays, and the tool people generally associate with Big Data is Hadoop. While it is true that the sheer volume of data Hadoop can efficiently and cost effectively manage is unparalleled, not all businesses actually need Hadoop. What if you don’t want to adopt yet another fairly complex technology (Hadoop) that you would need to maintain and have in-house expertise in? What if you could do it all in PostgreSQL?

Due to many rich and diverse features, I have found PostgreSQL to be able to do a good enough job for most business cases. I have already talked about handling unstructured data with JSON, JSONB, and HSTORE. Another feature that you can use for scaling is table partitioning.

Let me walk you through a very simple tutorial on how to partition tables in PostgreSQL to enable massive scale-out.

Before starting off, let’s create a new database and then a table that tracks events in it. The table simply stores the type of event, its time, and comments associated with it.

postgres=# CREATE DATABASE partition_sample;
CREATE DATABASE
partition_sample =# CREATE TABLE events (id SERIAL, type INTEGER, time TIMESTAMP WITH TIME ZONE, comments TEXT);
CREATE TABLE

 

1. Tables with inheritance and CHECK CONSTRAINT

First step is to create the partitions, which will essentially be tables that inherit from ‘events’. Each of these tables will have a CHECK CONSTRAINT specifying exactly what range of values are allowed in it. Below, I am setting up 9 such tables.

Table to contain events of type ‘1’ occurring in January 2015:

partition_sample =# create table events_1_2015_01 
                           (CONSTRAINT events_type_time_check 
                           CHECK (time >= '2015-01-01' 
                             AND time < '2015-02-01' 
                             AND type = 1)) 
                           inherits(events);
CREATE TABLE

 

Table to contain events of type ‘2’ occurring in January 2015:

partition_sample=# create table events_2_2015_01 
                          (CONSTRAINT events_type_time_check 
                          CHECK (time >= '2015-01-01' 
                            AND time < '2015-02-01' 
                            AND type = 2)) 
                          inherits(events);
CREATE TABLE

 

And so on forth …

partition_sample=# create table events_3_2015_01 
                          (CONSTRAINT events_type_time_check 
                          CHECK (time >= '2015-01-01' 
                            AND time < '2015-02-01' 
                            AND type = 3)) 
                          inherits(events);
CREATE TABLE
partition_sample=# create table events_1_2015_02 
                          (CONSTRAINT events_type_time_check 
                          CHECK (time >= '2015-02-01' 
                            AND time < '2015-03-01' 
                            AND type = 1)) 
                          inherits(events);
CREATE TABLE
partition_sample=# create table events_2_2015_02 
                          (CONSTRAINT events_type_time_check 
                          CHECK (time >= '2015-02-01' 
                            AND time < '2015-03-01' 
                            AND type = 2)) 
                          inherits(events);
CREATE TABLE
partition_sample=# create table events_3_2015_02 
                          (CONSTRAINT events_type_time_check 
                          CHECK (time >= '2015-02-01' 
                            AND time < '2015-03-01' 
                            AND type = 3)) 
                          inherits(events);
CREATE TABLE
partition_sample=# create table events_1_2015_03 
                          (CONSTRAINT events_type_time_check 
                          CHECK (time >= '2015-03-01' 
                            AND time < '2015-04-01' 
                            AND type = 1)) 
                          inherits(events);
CREATE TABLE
partition_sample=# create table events_2_2015_03 
                          (CONSTRAINT events_type_time_check 
                          CHECK (time >= '2015-03-01' 
                            AND time < '2015-04-01' 
                            AND type = 2)) 
                          inherits(events);
CREATE TABLE
partition_sample=# create table events_3_2015_03 
                          (CONSTRAINT events_type_time_check 
                          CHECK (time >= '2015-03-01' 
                            AND time < '2015-04-01' 
                            AND type = 3)) 
                          inherits(events);
CREATE TABLE

 

A description of the ‘event’s table shows how these child tables are now associated with it:

partition_sample=# \d+ events
                                                       Table "public.events"
Column |           Type          |                     Modifiers                     | Storage | Stats target | Description
----------+--------------------------+-----------------------------------------------------+----------+--------------+-------------
id       | integer                  | not null default nextval('events_id_seq'::regclass) | plain   |             |
type     | integer                 |                                                     | plain   |             |
time     | timestamp with time zone |                                                    | plain   |             |
comments | text                     |                                                     | extended |             |
Child tables: events_1_2015_01,
             events_1_2015_02,
             events_1_2015_03,
             events_2_2015_01,
             events_2_2015_02,
             events_2_2015_03,
             events_3_2015_01,
             events_3_2015_02,
             events_3_2015_03

2. Trigger function to route data appropriately

Next step is to create a trigger function that will fire before an insert and will route data to the appropriate partition. Below is the PL/pgSQL code for this. Notice how, before insertion, table name is determined on the fly by appending ‘type’ and ‘time’ of new incoming data.

CREATE FUNCTION events_partition_function() RETURNS TRIGGER
LANGUAGE plpgsql
AS $_$
DECLARE
v_Table   VARCHAR;
sql       VARCHAR;

 

BEGIN
v_Table = 'events' || '_' || NEW.type || '_' || to_char(NEW.time, 'YYYY_MM');

 

sql := 'INSERT INTO ' || v_Table || ' VALUES ( ($1).*)';

 

EXECUTE sql USING NEW;

 

return NULL;

 

END $_$;

 

Next step is to create a trigger on the ‘events’ table.

CREATE TRIGGER trg_events_insert
BEFORE INSERT ON events
FOR EACH ROW
EXECUTE PROCEDURE events_partition_function();

3. Inserting data

Now let’s insert data that will demonstrate how partitioning works. The data below is designed to populate 1 row in each of the partitions. The ‘comments’ column contains text that mentions where to expect this data.

partition_sample=# insert into events (type, time, comments) values (1, '2015-01-15', 'This should be in events_1_2015_01'),                                                                                                         (2, '2015-01-15', 'This should be in events_2_2015_01'),                                                                                                                             (3, '2015-01-15', 'This should be in events_3_2015_01'),                                                                                                                            (1, '2015-02-15', 'This should be in events_1_2015_02'),                                                                                                                            (2, '2015-02-15', 'This should be in events_2_2015_02'),                                                                                                                             (3, '2015-02-15', 'This should be in events_3_2015_02'),                                                                                                                            (1, '2015-03-15', 'This should be in events_1_2015_03'),                                                                                                                            (2, '2015-03-15', 'This should be in events_2_2015_03'),                                                                                                                             (3, '2015-03-15', 'This should be in events_3_2015_03');
INSERT 0 0

4. Verifying the data

Even though the data got inserted into respective partitions, because these partitions inherit from the master table ‘events’, querying ‘events’ will retrieve all data.

partition_sample=# SELECT * FROM events;
id | type |         time         |             comments
----+------+------------------------+------------------------------------
4 |   1 | 2015-01-15 00:00:00+05 | This should be in events_1_2015_01
5 |   2 | 2015-01-15 00:00:00+05 | This should be in events_2_2015_01
6 |   3 | 2015-01-15 00:00:00+05 | This should be in events_3_2015_01
7 |   1 | 2015-02-15 00:00:00+05 | This should be in events_1_2015_02
8 |   2 | 2015-02-15 00:00:00+05 | This should be in events_2_2015_02
9 |   3 | 2015-02-15 00:00:00+05 | This should be in events_3_2015_02
10 |   1 | 2015-03-15 00:00:00+05 | This should be in events_1_2015_03
11 |   2 | 2015-03-15 00:00:00+05 | This should be in events_2_2015_03
12 |   3 | 2015-03-15 00:00:00+05 | This should be in events_3_2015_03
(9 rows)

 

Partitions need to be designed according to the expected workload. In this case, it is expected that extract criteria from queries will generally containing the ‘type’ and ‘time’ components. An example is:

partition_sample=# select * from events where type = 3 and time > '2015-02-01' and time < '2015-03-01';
id | type |         time         |             comments
----+------+------------------------+------------------------------------
9 |   3 | 2015-02-15 00:00:00+05 | This should be in events_3_2015_02
(1 row)

5. The scale-out advantage

To illustrate how this enables you to scale-out PostgreSQL, simply run the EXPLAIN command with the last query run above.

partition_sample=# explain select * from events where type = 3 and time > '2015-02-01' and time < '2015-03-01';
                                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Append (cost=0.00..28.20 rows=2 width=48)
   -> Seq Scan on events (cost=0.00..0.00 rows=1 width=48)
         Filter: (("time" > '2015-02-01 00:00:00+05'::timestamp with time zone) AND ("time" < '2015-03-01 00:00:00+05'::timestamp with time zone) AND (type = 3))
   -> Seq Scan on events_3_2015_02 (cost=0.00..28.20 rows=1 width=48)
         Filter: (("time" > '2015-02-01 00:00:00+05'::timestamp with time zone) AND ("time" < '2015-03-01 00:00:00+05'::timestamp with time zone) AND (type = 3))
(5 rows)

 

As you can see, this query runs a sequential scan on the related partition only, leaving the other partitions alone. This is what allows you to scale out PostgreSQL … you can create partitions on tables with huge quantities of rows and still be able to very quickly retrieve the data. You can actually create indexes on these partitions as well to speed things up further.

You gotta love PostgreSQL! :-)

Have questions? Contact us NOW!

Join the discussion 6 Comments

  • What about if you run “select * from events where type = 3″ only? This type of query would not only run at type 3 tables, but sequencially all the tables, right?

    • Umair Shahid says:

      Rafael, I am glad that you brought this up. This query will actually run only on type 3 tables as demonstrated below:

      partition_sample=# explain select * from events where type = 3;
      QUERY PLAN
      ————————————————————————
      Append (cost=0.00..47.01 rows=12 width=48)
      -> Seq Scan on events (cost=0.00..0.00 rows=1 width=48)
      Filter: (type = 3)
      -> Seq Scan on events_3_2015_01 (cost=0.00..23.00 rows=5 width=48)
      Filter: (type = 3)
      -> Seq Scan on events_3_2015_02 (cost=0.00..1.01 rows=1 width=48)
      Filter: (type = 3)
      -> Seq Scan on events_3_2015_03 (cost=0.00..23.00 rows=5 width=48)
      Filter: (type = 3)
      (9 rows)

      Beautiful, isn’t it? :-)

  • whocares says:

    the internal counter for rows modified in postgres is still a uint , which will warp kind of ugly like for a lot of ppl at 4B and change rows modified.

    using postgres for big data is fun till you run into things where the rows modified by your query can’t be checked right with the standard tools.

    • Umair Shahid says:

      Thanks buddy! You are right, limitations are certainly there. The intent of this post is not to present PostgreSQL as an alternative to Hadoop for all workloads. I simply mean to state out that for many workloads, PostgreSQL is good enough and you don’t necessarily have to go to Hadoop to handle your data.

Leave a Reply


2 × = 6

Fortune 500 company uses PostgreSQL to implement BI Download the case study