转载

PostgreSQL 9.5 new feature highlight: BRIN indexes

前几天PostgreSQL 9.5 Alpha 1版本刚刚发布，在新版本中吸引我注意的是 BRIN index 。为什么引人注意？因为这就是活脱脱的Oracle Exadata中的Storage Index和Oracle Database 12.1.0.2中的新功能Zone Maps。

Exadata的Storage Index不说了，因为那并非数据库范畴的解决方案，而Oracle数据库12.1.0.2中的新功能Zone Maps曾让我非常激动，但是最终发现该功能也只能在运行于Exadata上的Oracle中才能启用，略失望。

Zone Maps的解释如下：

Zone maps in an Oracle Database store minimum and maximum values of columns for a range of blocks (known as a zone). In addition to performing I/O pruning based on predicates of clustered fact tables, zone maps prune on predicates of dimension tables provided the fact tables are attribute-clustered by the dimension attributes though outer joins with the dimension tables.

BRIN index的解释如下：

BRIN stands for Block Range INdexes, and store metadata on a range of pages. At the moment this means the minimum and maximum values per block…So if a 10GB table of order contained rows that were generally in order of order date, a BRIN index on the order_date column would allow the majority of the table to be skipped rather than performing a full sequential scan.

同样的思路，在一个类索引结构中存储一定范围的数据块中某个列的最小和最大值，当查询语句中包含该列的过滤条件时，就会自动忽略那些肯定不包含符合条件的列值的数据块，从而减少IO读取量，提升查询速度。

以下借用Pg wiki中的例子解释BRIN indexes的强大。

-- 创建测试表orders  CREATE TABLE orders (      id INT,      order_date timestamptz,      item text);    -- 在表中插入大量记录，Pg的函数generate_series非常好用。  INSERT INTO orders (order_date, item)  SELECT x, 'dfiojdso'   FROM generate_series('2000-01-01 00:00:00'::timestamptz, '2015-03-01 00:00:00'::timestamptz,'2 seconds'::INTERVAL) a(x);    -- 该表目前有13GB大小，算是大表了。  # /dt+ orders                     List OF relations   Schema |  Name  | TYPE  | Owner | SIZE  | Description   --------+--------+-------+-------+-------+-------------   public | orders | TABLE | thom  | 13 GB |   (1 ROW)    -- 以全表扫描的方式查询两天内的记录，注意这里预计需要30s，这是一个存储在SSD上Pg数据库，因此速度已经很理想了。  # EXPLAIN ANALYSE SELECT COUNT(*) FROM orders WHERE order_date BETWEEN '2012-01-04 09:00:00' AND '2014-01-04 14:30:00';                                                                           QUERY PLAN                                                                            -------------------------------------------------------------------------------------------------------------------------------------------------------------   Aggregate  (cost=5425021.80..5425021.81 ROWS=1 width=0) (actual TIME=30172.428..30172.429 ROWS=1 loops=1)     ->  Seq Scan ON orders  (cost=0.00..5347754.00 ROWS=30907121 width=0) (actual TIME=6050.015..28552.976 ROWS=31589101 loops=1)           FILTER: ((order_date >= '2012-01-04 09:00:00+00'::TIMESTAMP WITH TIME zone) AND (order_date <= '2014-01-04 14:30:00+00'::TIMESTAMP WITH TIME zone))           ROWS Removed BY FILTER: 207652500   Planning TIME: 0.140 ms   Execution TIME: 30172.482 ms   (6 ROWS)    -- 接下来在order_date列上创建一个BRIN index  CREATE INDEX idx_order_date_brin    ON orders    USING BRIN (order_date);    -- 查看这个BRIN index占多少物理空间，13GB的表，而BRIN index只有504KB大小，非常精简。  # /di+ idx_order_date_brin                                List OF relations   Schema |        Name         | TYPE  | Owner | TABLE  |  SIZE  | Description   --------+---------------------+-------+-------+--------+--------+-------------   public | idx_order_date_brin | INDEX | thom  | orders | 504 kB |   (1 ROW)    -- 再次执行相同的SQL，看看性能提升多少。速度上升到只需要6秒钟，提升了5倍。如果这是存储在HDD上的Pg库，这个效果还能更明显。  # EXPLAIN ANALYSE SELECT COUNT(*) FROM orders WHERE order_date BETWEEN '2012-01-04 09:00:00' AND '2014-01-04 14:30:00';                                                                                QUERY PLAN                                                                                 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------   Aggregate  (cost=2616868.60..2616868.61 ROWS=1 width=0) (actual TIME=6347.651..6347.651 ROWS=1 loops=1)     ->  Bitmap Heap Scan ON orders  (cost=316863.99..2539600.80 ROWS=30907121 width=0) (actual TIME=36.366..4686.634 ROWS=31589101 loops=1)           Recheck Cond: ((order_date >= '2012-01-04 09:00:00+00'::TIMESTAMP WITH TIME zone) AND (order_date <= '2014-01-04 14:30:00+00'::TIMESTAMP WITH TIME zone))           ROWS Removed BY INDEX Recheck: 6419           Heap Blocks: lossy=232320           ->  Bitmap INDEX Scan ON idx_order_date_brin  (cost=0.00..309137.21 ROWS=30907121 width=0) (actual TIME=35.567..35.567 ROWS=2323200 loops=1)                 INDEX Cond: ((order_date >= '2012-01-04 09:00:00+00'::TIMESTAMP WITH TIME zone) AND (order_date <= '2014-01-04 14:30:00+00'::TIMESTAMP WITH TIME zone))   Planning TIME: 0.108 ms   Execution TIME: 6347.701 ms  (9 ROWS)    --能够让用户自行设定一个range中可以包含的数据块数，也是很体贴的设计。默认情况下一个range包含128个page，我们可以修改为更小或者更大，包含的page越少则精度越细，相应的BRIN index也就会越大；反之则精度粗，BRIN index小。  -- 创建一个每个range包含32 pages的索引。  CREATE INDEX idx_order_date_brin_32  ON orders  USING BRIN (order_date) WITH (pages_per_range = 32);    -- 再创建一个每个range包含512 pages的索引。  CREATE INDEX idx_order_date_brin_512  ON orders  USING BRIN (order_date) WITH (pages_per_range = 512);   --比较一下各个索引的大小。  # /di+ idx_order_date_brin*                                   List OF relations   Schema |          Name           | TYPE  | Owner | TABLE  |  SIZE   | Description   --------+-------------------------+-------+-------+--------+---------+-------------   public | idx_order_date_brin     | INDEX | thom  | orders | 504 kB  |    public | idx_order_date_brin_32  | INDEX | thom  | orders | 1872 kB |    public | idx_order_date_brin_512 | INDEX | thom  | orders | 152 kB  |   (3 ROWS)