Skip to content

Some of the partition Query Plan incorrectly Seq Scan on parent table #164

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kenzan100 opened this issue Jun 28, 2018 · 24 comments
Closed

Comments

@kenzan100
Copy link

Problem description

We have a table called customers, and it has about 4K partition.

For some of the newer partitions, EXPLAIN UPDATE customers SET ... WHERE ...
results in Seq Scan on the parent table, making it significantly slower.

# EXPLAIN UPDATE customers SET ... WHERE account_id = 1234 and customer_uuid = ...;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Update on customers  (cost=0.00..2.29 rows=2 width=369)
   Update on customers
   Update on customers_4813
   ->  Seq Scan on customers  (cost=0.00..0.00 rows=1 width=371)
         Filter: ((account_id = 1234) AND (customer_uuid = '....'::uuid))
   ->  Index Scan using customers_4813_customer_uuid_key on customers_4813  (cost=0.28..2.29 rows=1 width=367)
         Index Cond: (customer_uuid = '....'::uuid)
         Filter: (account_id = 1234)
(8 rows)

Time: 31561.109 ms

For other cases, it produces the correct query plan

# EXPLAIN UPDATE customers SET ... WHERE account_id = 1 and customer_uuid = '....';
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Update on customers_5  (cost=0.29..2.31 rows=1 width=1371)
   ->  Index Scan using customers_5_customer_uuid_key on customers_5  (cost=0.29..2.31 rows=1 width=1371)
         Index Cond: (customer_uuid = '....'::uuid)
         Filter: (account_id = 1)
(4 rows)

Time: 0.721 ms

Expected Result

No matter which partition it is, it should produce an optimized query plan.

Environment

select * from pathman_config_params;
         partrel          | enable_parent | auto | init_callback | spawn_using_bgw
--------------------------+---------------+------+---------------+-----------------
 customers                | f             | t    |               | f
      extname       | extowner | extnamespace | extrelocatable | extversion |   extconfig   | extcondition
--------------------+----------+--------------+----------------+------------+---------------+--------------
 plpgsql            |       10 |           11 | f              | 1.0        |               |
 btree_gist         |       10 |         2200 | t              | 1.2        |               |
 pg_stat_statements |       10 |         2200 | t              | 1.4        |               |
 pg_pathman         |       10 |         2200 | f              | 1.4        | {16966,16977} | {"",""}
                                               version
--------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.4 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4, 64-bit
(1 row)
get_pathman_lib_version
-------------------------
 10402
(1 row)
@funbringer
Copy link
Collaborator

Hi @kenzan100,

First of all, thanks for using pg_pathman. I see that your build of pg_pathman is out of date. Could you install the fresh release and retry?

@funbringer
Copy link
Collaborator

funbringer commented Jun 29, 2018

I'd like to notice that the slowdown isn't caused by SeqScan on parent. Presumably, the real reason is that optimizations are turned off for some of your queries, which is why standard partition pruning (so-called constraint elimination mechanism) takes place and adds parent table to the plan.

@kenzan100
Copy link
Author

@funbringer

thank you for your quick reachout.

I see that your build of pg_pathman is out of date. Could you install the fresh release and retry?

We followed your advice, and now it's

 get_pathman_lib_version
-------------------------
 1.4.12

but still, the query planner returns the same result.

@funbringer
Copy link
Collaborator

Ok, could you show the contents of pathman_config? I'd like to know the partitioning expression. Also, can you reproduce slow queries consistently (the same query is slow N times in a row)? Is it the same query that's slow? If so, could you show the list of partitions and their bounds (if it's range partitioning) that should be updated by those queries?

@jamatthews
Copy link

partrel          |    expr    | parttype | range_interval |                                                      cooked_expr                                                      
--------------------------+------------+----------+----------------+-----------------------------------------------------------------------------------------------------------------------
customers                | account_id |        2 | 1              | {VAR :varno 1 :varattno 3 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 3 :location 8

We use a range of 2 so each partition covers only one value. We can consistently reproduce older partitions having fast pathman planning and the new partitions having normal PG planning. It only seems to happen with one table and two other tables partitioned the same are unaffected.

@funbringer
Copy link
Collaborator

funbringer commented Jun 29, 2018

@jamatthews,

I see. Just to make sure this isn't a cache bug: could you execute the following in the same session (e.g. in psql):

  • a good query,
  • a bad (slow) query,
  • select * from pathman_partition_list where partition = $NAME::regclass where $NAME is the name of a new partition.

The last one would allow us to see if the cache is working correctly. I suspect that we might not see new partitions.

@kenzan100
Copy link
Author

@funbringer

I followed

a good query,
a bad (slow) query,
select * from pathman_partition_list where partition = $NAME::regclass where $NAME is the name of a new partition.

and the last SELECT does show the new partition.

cc: @jamatthews

@funbringer
Copy link
Collaborator

@kenzan100

And during this test the bad one was still slow, right?

@kenzan100
Copy link
Author

@funbringer yes

@funbringer
Copy link
Collaborator

Ok, have you ever dropped any columns before creating new partitions?

@kenzan100
Copy link
Author

@funbringer yes, we have dropped a column for this table recently.

@funbringer
Copy link
Collaborator

Finally, now it makes more sense. I guess that's the reason. You see, at the moment this optimization doesn't work if the number of attributes in partition and parent doesn't match. It's checked here (see the code).

Here's how it looks like:

create table test (a int, b int not null);
select create_range_partitions('test', 'b', 0, 100, 3);

select count(*) from pg_attribute where attrelid = 'test'::regclass;
 count 
-------
     8
(1 row)

select count(*) from pg_attribute where attrelid = 'test_1'::regclass;
 count 
-------
     8
(1 row)

/* drop the column and spawn a new partition */
alter table test drop column a;
select prepend_range_partition('test');

select count(*) from pg_attribute where attrelid = 'test_4'::regclass;
 count 
-------
     7
(1 row)

explain update test set b = 5 where b = 50;
                          QUERY PLAN                           
---------------------------------------------------------------
 Update on test_1  (cost=0.00..41.88 rows=13 width=14)
   ->  Seq Scan on test_1  (cost=0.00..41.88 rows=13 width=14)
         Filter: (b = 50)
(3 rows)

explain update test set b = 5 where b = -50;
                          QUERY PLAN                           
---------------------------------------------------------------
 Update on test  (cost=0.00..41.88 rows=14 width=10)
   Update on test
   Update on test_4
   ->  Seq Scan on test  (cost=0.00..0.00 rows=1 width=14)
         Filter: (b = '-50'::integer)
   ->  Seq Scan on test_4  (cost=0.00..41.88 rows=13 width=10)
         Filter: (b = '-50'::integer)
(7 rows)

However, this is something we definitely can fix. I'm going to push a bugfix shortly.

Thank you for the report.

@kenzan100
Copy link
Author

ohhh, nice input! that's awesome to know.
We'll confirm if we can reproduce the same query performance observation in another environment.
At the same time, we're looking forward to that version bump!!

@kenzan100
Copy link
Author

I'm going to push a bugfix shortly.

@funbringer when do you think this will happen? We would like to apply this fix, preferably really soon.

@funbringer
Copy link
Collaborator

I've almost finished the patch, but you have to wait a little longer (a couple of days, probably).

@kenzan100
Copy link
Author

kenzan100 commented Jun 29, 2018 via email

@kenzan100
Copy link
Author

@funbringer I'm looking forward to hearing from you and the team! sorry to bother you many times...

@funbringer
Copy link
Collaborator

funbringer commented Jul 2, 2018

Hi,

I've pushed a branch called master_quickfix_164, you're welcome to try it. Be wary of possible bugs, though. I'm going to rewrite a few more things before pushing this to master.

Just make && make install, no need to drop/create/update extension.

@kenzan100
Copy link
Author

@funbringer thanks for the quick updates!
we'll certainly try and report back

@kenzan100
Copy link
Author

@funbringer Right now, it's going really well, that all the partitions are having optimized query plans!!
looking forward to the version bump

@funbringer
Copy link
Collaborator

Hi again,

Could you test the master branch? If it works, I'm going to roll out a new release

@kenzan100
Copy link
Author

kenzan100 commented Jul 5, 2018

@funbringer We are now using 79a1b89
looking good so far!

@funbringer
Copy link
Collaborator

1.4.13 is now available.

@kenzan100
Copy link
Author

cool

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants