INDEXES

Postgresql Index :
HEAP :
Storage area for storing the whole row of the table. This is divided into multiple pages (as shown in the above picture) and each page size is by default 8KB. Within each page, each item pointer (e.g. 0,1, 2, ….) points to data within the page.

Index Storage :
This storage stores only key values i.e. columns value contained by index. This is also divided into multiple pages and each page size is by default 8KB.

Tuple Identifier (TID) :
TID is 6 bytes number which consists of two parts. The first part is 4-byte page number and remaining 2 bytes tuple index inside the page. The combination of these two numbers uniquely points to the storage location for a particular tuple. ========================================================================
    postgresql index types:
    =======================
    1.	B-tree 
    2.	Hash 
    3.	GiST 
    4.	sp-Gist 
    5.	GIN 
    6.	BRIN
=============================================

B-Tree Index :
B-Tree is the default and the most commonly used index type. Specifying a primary key or a unique within a CREATE TABLE statement causes PostgreSQL to create B-Tree indexes. CREATE INDEX statements without the USING clause will also create B-Tree indexes:
Example:
Before Index :
    product=# explain analyze select * from t1 where id=5;
                                                QUERY PLAN
    ------------------------------------------------------------------------------------------------------
    Seq Scan on t1  (cost=0.00..179057.19 rows=1 width=11) (actual time=47.515..6802.694 rows=1 loops=1)
    Filter: (id = 5)
    Rows Removed by Filter: 9999999
    Planning Time: 0.491 ms
    Execution Time: 6802.792 ms
    (5 rows)
After Index :
    product=# create index in_t1_id on t1 using btree(id);
    CREATE INDEX
    product=# explain analyze select * from t1 where id=5;
                                                    QUERY PLAN
    ----------------------------------------------------------------------------------------------------------------
    Index Scan using in_t1_id on t1  (cost=0.43..8.45 rows=1 width=11) (actual time=25.031..25.039 rows=1 loops=1)
    Index Cond: (id = 5)
    Planning Time: 5.143 ms
    Execution Time: 25.102 ms
    (4 rows)
B-tree index features introduced in v12 :
  • Reduce locking overhead for B-tree index inserts for improved performance.
  • Introduce REINDEX CONCURRENTLY to make it easier to rebuild an index without down-time.
  • Improve performance for index-only scans on indexes with many attributes.
  • Add a view pg_stat_progress_create_index to report progress for CREATE INDEX and REINDEX.
  • Create index concurrently idx_tree on using btree(id);
        SELECT * FROM pg_stat_progress_create_index;
    
    Hash Index :
    Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash if there were unwritten changes. Also, changes to hash indexes are not replicated over streaming or file-based replication after the initial base backup, so they give wrong answers to queries that subsequently use them. For these reasons, hash index use is presently discouraged.
    Example:
        create index in_t1_id on t1 using hash(id);
    
    Brin Index :
    BRIN stands for Block Range Index. BRIN is designed for handling very large tables. Example:
        create index in_t1_id on t1 using brin(dt);
    
        Explain ana;lyze select * from 
    where dt > ‘28-DEC-2023’ and dt < ‘28-DEC-2024’;

    GIN Index :
    GIN stands for Generalized Inverted Index, commonly referred to as GIN, are most useful when you have data types that contain multiple values in a single column. GIN is designed for handling cases where the items to be indexed are composite values, and the queries to be handled by the index need to search for element values that appear within the composite items.
    The most common data types that fall into this bucket are:
  • hStore
  • Arrays
  • Range types
  • JSONB

  • tsvector :
    Postgres has a data type called tsvector that is used for full text search. A tsvector value merges different variants of the same word and removes duplicates to create a sorted list of distinct words called lexemes.

    tsquery :
    which supports indexed queries using these operators:
       <@
        @>
        =
        && 
    
    Example:
        db=# create table ts(doc text, doc_tsv tsvector);
        CREATE TABLE
        db=# insert into ts(doc) values
        db-#   ('Can a sheet slitter slit sheets?'),
        db-#   ('How many sheets could a sheet slitter slit?'),
        db-#   ('I slit a sheet, a sheet I slit.'),
        db-#   ('Upon a slitted sheet I sit.'),
        db-#   ('Whoever slit the sheets is a good sheet slitter.'),
        db-#   ('I am a sheet slitter.'),
        db-#   ('I slit sheets.'),
        db-#   ('I am the sleekest sheet slitter that ever slit sheets.'),
        db-#   ('She slits the sheet she sits on.');
        INSERT 0 9
        db=# update ts set doc_tsv = to_tsvector(doc);
        UPDATE 9
    

    postgres-index

        db=# explain analyze select doc from ts where doc_tsv @@ to_tsquery('many & slitter');
                                                        QUERY PLAN
        -----------------------------------------------------------------------------------------------------------------
        Seq Scan on ts  (cost=10000000000.00..10000000241.00 rows=1 width=32) (actual time=0.104..0.161 rows=1 loops=1)
        Filter: (doc_tsv @@ to_tsquery('many & slitter'::text))
        Rows Removed by Filter: 8
        Planning Time: 0.136 ms
        Execution Time: 0.180 ms
        (5 rows)
    
        db=# create index on ts using gin(doc_tsv);
        CREATE INDEX
        db=# explain analyze select doc from ts where doc_tsv @@ to_tsquery('many & slitter');
                  QUERY PLAN                  -----------------------------------------------------------Bitmap Heap Scan on 
        ts  (cost=12.25..16.51 rows=1 width=32) (actual time=0.066..0.067 rows=1 loops=1)
        Recheck Cond: (doc_tsv @@ to_tsquery('many & slitter'::text))
        Heap Blocks: exact=1
        ->  Bitmap Index Scan on ts_doc_tsv_idx  (cost=0.00..12.25 rows=1 width=0) (actual time=0.057..0.057 rows=1 loops=1)
                Index Cond: (doc_tsv @@ to_tsquery('many & slitter'::text))
        Planning Time: 2.390 ms
        Execution Time: 0.125 ms
        (7 rows) 
    
    Gist Index : (postgresql - version 14)
  • GiST stands for Generalized Search Tree.
  • GiST indexes allow a building of general tree structures.
  • GiST indexes are useful in indexing geometric data types and full-text search.
  • Example:point,line,box,circle
    which support indexed queries using these operators:
        <<
        &<
        &>
        >>
        <<|
        &<|
        |&>
        |>>
        @>
        <@
        ~=
        &&
    
    Example::
        db=# create table points(p point);
        CREATE TABLE
        db=# insert into points(p) values
            (point '(1,1)'), (point '(3,2)'), (point '(6,3)'),
            (point '(5,5)'), (point '(7,8)'), (point '(8,6)');
        INSERT 0 6
        db=# explain analyze select * from points where p <@ box '(2,1),(7,4)';
                                                    QUERY PLAN
        --------------------------------------------------------------------------------------------------
        Seq Scan on points  (cost=0.00..33.13 rows=2 width=16) (actual time=0.047..0.049 rows=2 loops=1)
        Filter: (p <@ '(7,4),(2,1)'::box)
        Rows Removed by Filter: 4
        Planning Time: 1.621 ms
        Execution Time: 0.082 ms
        (5 rows)
    
        db=# create index on points using gist(p);
        CREATE INDEX
        db=# explain analyze select * from points where p <@ box '(2,1),(7,4)';
                                                QUERY PLAN
        -------------------------------------------------------------------------------------------------
        Seq Scan on points  (cost=0.00..1.07 rows=1 width=16) (actual time=0.021..0.023 rows=2 loops=1)
        Filter: (p <@ '(7,4),(2,1)'::box)
        Rows Removed by Filter: 4
        Planning Time: 3.361 ms
        Execution Time: 0.048 ms
        (5 rows)
    
        db=# set enable_seqscan=off; (fullscan)
        SET
        db=# explain analyze select * from points where p <@ box '(2,1),(7,4)';
                                                                QUERY PLAN                                                    
        ---------------------------------------------------------------------------------------------------------------------------
        Index Only Scan using points_p_idx on points  (cost=0.13..8.15 rows=1 width=16) (actual time=0.269..0.271 rows=2 loops=1)
        Index Cond: (p <@ '(7,4),(2,1)'::box)
        Heap Fetches: 2
        Planning Time: 0.145 ms
        Execution Time: 0.313 ms
        (5 rows)
    
        db=# explain (costs off) select * from points where p <@ box '(2,1),(7,4)';
                        QUERY PLAN
        ----------------------------------------------
        Index Only Scan using points_p_idx on points
        Index Cond: (p <@ '(7,4),(2,1)'::box)
        (2 rows)
        Example:point,line,box,circle
    
    sp-gist :
    SP-GiST is an abbreviation for space-partitioned GiST. SP-GiST supports partitioned search trees, which facilitate development of a wide range of different non-balanced data structures, such as quad-trees, k-d trees, and radix trees (tries).
    which support indexed queries using these operators:
        <<
        >>
        ~=
        <@
        <^
        >^
    
    Example::
    ==========
        postgres=# create table points(p point);
        CREATE TABLE
        postgres=# insert into points(p) values
        postgres-#   (point '(1,1)'), (point '(3,2)'), (point '(6,3)'),
        postgres-#   (point '(5,5)'), (point '(7,8)'), (point '(8,6)');
        INSERT 0 6
        postgres=# select amop.amopopr::regoperator, amop.amopstrategy
                        from pg_opclass opc, pg_opfamily opf, pg_am am, pg_amop amop
                            where opc.opcname = 'quad_point_ops'
                        and opf.oid = opc.opcfamily
                            and am.oid = opf.opfmethod
                        and amop.amopfamily = opc.opcfamily
                        and am.amname = 'spgist'
                        and amop.amoplefttype = opc.opcintype;
            
        amopopr     | amopstrategy
        -----------------+--------------
        <<(point,point) |            1
        >>(point,point) |            5
        ~=(point,point) |            6
        <^(point,point) |           10
        >^(point,point) |           11
        <@(poin  t,box)   |            8
        (6 rows)
    
        postgres=# explain (costs off) select * from points where p >^ point '(2,7)';
                QUERY PLAN
        ---------------------------------
        Seq Scan on points
        Filter: (p >^ '(2,7)'::point)
        (2 rows)
    
        postgres=# set enable_seqscan = off;
        SET
        postgres=# explain (costs off) select * from points where p >^ point '(2,7)';
                QUERY PLAN
        ---------------------------------
        Seq Scan on points
        Filter: (p >^ '(2,7)'::point)
        (2 rows)
    
        postgres=# create index points_quad_idx on points using spgist(p);
        CREATE INDEX
        postgres=# explain (costs off) select * from points where p >^ point '(2,7)';
                        QUERY PLAN
        -------------------------------------------------
        Index Only Scan using points_quad_idx on points
        Index Cond: (p >^ '(2,7)'::point)
        (2 rows)