A few days ago I discovered some (to my feeling) buggy behavior during a COMPLETE refresh of a partitioned materialized view. It was an unfortunate situation, because it was discovered the day after going into production. It was not noticed during the weeks of development and pre-production execution, so it seems we have some improvement to do there. It had to do with the index statistics on GLOBAL indexes of a partitioned materialized view. The first time our ETL ran with the new materialized views in place, it did its job and COMPLETE refreshed a new MV. So far, so good, that was the intention. However, countries started complaining about huge performance problems with some specific reports, and another ETL part (only run on Production, which is one of those things I did not know about and need to improve the next time we test a new implementation) was taking hours and hours where it normally takes a few minutes.
First discovery
We soon found out that the ETL and report problems all had to do with this new materialized view, which was an ‘exact’ replacement of a manually refreshed aggregate table with the exact same indexes and columns. The statistics had also been gathered on the MV. I looked at the execution plans of the badly behaving ETL first and saw that Oracle was using a GLOBAL index defined on the new MV. The cardinality estimates in the execution plan were 1, while the number of records to be retrieved was in the millions. So it had to be a statistics issue. I checked and saw that all the indexes were last analyzed somewhere during the normal ETL run, and to be exact: they were analyzed as part of the COMPLETE refresh. At first I wondered: who did an analyze of that table’s indexes during the ETL, is it defined in the ETL somewhere? It took me a few moments to realize that the way we do the COMPLETE refresh is a so-called ‘non-atomic’ one:
begin
dbms_mview.REFRESH('SOME_JOIN_MV','COMPLETE',atomic_refresh=>FALSE);
end;
/
This effectively does a COMPLETE refresh by doing a TRUNCATE, followed by a direct path insert and an efficient rebuild of all the indexes. And it was this rebuild of the indexes that led to the analyze of those indexes: Oracle 10g and above automatically (re-)analyzes new or rebuilt indexes on a table when that table already has statistics gathered. So OK, I found out where the analyze came from. But why was the cardinality so wrong? I checked and to my surprise the num_rows statistic was set to zero (not NULL) for the GLOBAL indexes only. The partitioned indexes however were analyzed fine: their statistics were accurate. So I went on to test this behavior (after first correcting the statistics on those indexes in Production to have performance return to normal) and I saw what Oracle did (among other things by taking a SQL trace during the COMPLETE refresh). And this was the situation on HP-UX Oracle 10.2.0.4 64bit:
- Oracle did a TRUNCATE of the MV
- Oracle set the indexes to unusable AND set the statistics of the GLOBAL indexes to zero!
- Oracle did the Direct Path Insert
- Oracle rebuilt the indexes -> automatic analyze of the LOCAL partitioned indexes, but not of the GLOBAL ones
So after the refresh the statistics on the GLOBAL indexes were zero and the LAST_ANALYZED time was from the moment just after the TRUNCATE, but the LOCAL index statistics were fine: they seemed to have been gathered normally during the rebuild. To me that sounds like a bug. I cannot explain why this behavior would be like that. And that was the cause of the performance problems: with index statistics set to zero, Oracle considered these GLOBAL indexes extremely cheap and used them in just about every query. But that was completely misleading information. I adjusted the materialized view refresh stored procedure by adding an explicit dbms_stats.gather_index_stats on those GLOBAL indexes and everything went back to normal.
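For reference, a minimal sketch of that fix. The GLOBAL index name below is borrowed from the test case later in this post; in the real procedure we loop over the actual GLOBAL index names of the production MV:
begin
   -- non-atomic COMPLETE refresh: TRUNCATE + direct path insert + index rebuild
   dbms_mview.REFRESH('SOME_JOIN_MV', 'COMPLETE', atomic_refresh => FALSE);
   -- explicitly gather statistics on each GLOBAL index, because on our
   -- HP-UX 10.2.0.4 system the refresh left them at num_rows = 0
   dbms_stats.gather_index_stats(ownname          => USER
                               , indname          => 'IDX_SOME_JOIN_MV_P1'
                               , estimate_percent => 100);
end;
/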
But then…
…I wanted to write this post and started retesting this on my Windows XP 64bit 10.2.0.4. The same version (I may be off by a minor patchlevel, I need to check) and the same kind of MV (not the same one, but also a join MV). To my surprise, during the test Oracle did a good job on the GLOBAL indexes of the COMPLETE refresh: their statistics were now gathered as well, and to be specific: as one of the last steps of the refresh. Seems like the thing that was forgotten on the HP-UX version. But now another thing seems wrong: the TABLE level statistics of the LOCAL partitioned indexes are not gathered. The example I prepared was:
DROP TABLE some_fact;
DROP TABLE some_domain;

CREATE TABLE some_fact AS SELECT * FROM all_objects;
CREATE TABLE some_domain AS SELECT * FROM all_tables;

CREATE UNIQUE INDEX pk_some_domain ON some_domain (owner, table_name);

CREATE MATERIALIZED VIEW some_join_mv
PARTITION BY LIST (object_type)
( PARTITION part_table     VALUES ('TABLE')
, PARTITION part_index     VALUES ('INDEX')
, PARTITION part_proc      VALUES ('PROCEDURE','PACKAGE','FUNCTION')
, PARTITION part_remaining VALUES (DEFAULT))
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT f.object_name
     , f.object_type
     , f.owner
     , f.status
     , f.last_ddl_time
     , d.avg_row_len
     , d.blocks
     , d.empty_blocks
     , d.partitioned
     , d.tablespace_name
  FROM some_fact f, some_domain d
 WHERE f.object_name = d.table_name(+)
   AND f.owner = d.owner(+);

BEGIN
   DBMS_STATS.gather_table_stats (ownname          => USER
                                , tabname          => 'SOME_JOIN_MV'
                                , estimate_percent => 100
                                , granularity      => 'ALL'
                                , method_opt       => 'FOR ALL COLUMNS SIZE 1'
                                , cascade          => TRUE);
END;
/

CREATE INDEX idx_some_join_mv_p1 ON some_join_mv (object_name, object_type, owner) GLOBAL;
CREATE INDEX idx_some_join_mv_type_local ON some_join_mv (object_type) LOCAL;

SELECT index_name
     , table_name
     , partition_name
     , leaf_blocks
     , distinct_keys
     , clustering_factor
     , num_rows
     , last_analyzed
  FROM all_ind_statistics
 WHERE table_name = 'SOME_JOIN_MV'
 ORDER BY index_name, partition_name NULLS FIRST;
Results were (the TABLE level statistics of the LOCAL partitioned index in bold):
| index_name | table_name | partition_name | leaf_blocks | distinct_keys | clustering_factor | num_rows | last_analyzed |
|---|---|---|---|---|---|---|---|
| IDX_SOME_JOIN_MV_P1 | SOME_JOIN_MV | | 589 | 50675 | 23994 | 77427 | 14-3-2009 21:13:36 |
| **IDX_SOME_JOIN_MV_TYPE_LOCAL** | **SOME_JOIN_MV** | | **242** | **29** | **1061** | **77427** | **14-3-2009 21:13:36** |
| IDX_SOME_JOIN_MV_TYPE_LOCAL | SOME_JOIN_MV | PART_INDEX | 7 | 1 | 20 | 2804 | 14-3-2009 21:13:36 |
| IDX_SOME_JOIN_MV_TYPE_LOCAL | SOME_JOIN_MV | PART_PROC | 4 | 3 | 27 | 1234 | 14-3-2009 21:13:36 |
| IDX_SOME_JOIN_MV_TYPE_LOCAL | SOME_JOIN_MV | PART_REMAINING | 218 | 24 | 971 | 68115 | 14-3-2009 21:13:36 |
| IDX_SOME_JOIN_MV_TYPE_LOCAL | SOME_JOIN_MV | PART_TABLE | 13 | 1 | 43 | 5274 | 14-3-2009 21:13:36 |
Then I deleted almost all records in the fact table, did a COMPLETE refresh and looked at what the index statistics were after that:
DELETE FROM some_fact WHERE ROWNUM <= 77000;
COMMIT;

BEGIN
   dbms_mview.REFRESH ('SOME_JOIN_MV', 'COMPLETE', atomic_refresh => FALSE);
END;
/
See the inaccuracy:
| index_name | table_name | partition_name | leaf_blocks | distinct_keys | clustering_factor | num_rows | last_analyzed |
|---|---|---|---|---|---|---|---|
| IDX_SOME_JOIN_MV_P1 | SOME_JOIN_MV | | 3 | 421 | 18 | 427 | 14-3-2009 21:20:50 |
| **IDX_SOME_JOIN_MV_TYPE_LOCAL** | **SOME_JOIN_MV** | | **242** | **29** | **1061** | **77427** | **14-3-2009 21:13:36** |
| IDX_SOME_JOIN_MV_TYPE_LOCAL | SOME_JOIN_MV | PART_INDEX | 1 | 1 | 1 | 2 | 14-3-2009 21:20:50 |
| IDX_SOME_JOIN_MV_TYPE_LOCAL | SOME_JOIN_MV | PART_PROC | 1 | 3 | 1 | 8 | 14-3-2009 21:20:50 |
| IDX_SOME_JOIN_MV_TYPE_LOCAL | SOME_JOIN_MV | PART_REMAINING | 1 | 4 | 1 | 18 | 14-3-2009 21:20:50 |
| IDX_SOME_JOIN_MV_TYPE_LOCAL | SOME_JOIN_MV | PART_TABLE | 1 | 1 | 4 | 399 | 14-3-2009 21:20:50 |
The GLOBAL index and all the index partitions have accurate, freshly gathered statistics, but the TABLE level statistics on IDX_SOME_JOIN_MV_TYPE_LOCAL are still the old ones (num_rows 77427, last analyzed at 21:13:36).
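If I had to correct that by hand, a minimal sketch would look something like the following, assuming a gather at 'GLOBAL' granularity on just this index is good enough (this is my own illustration, not something the refresh runs itself):
begin
   -- re-gather only the table (GLOBAL) level statistics of the LOCAL index;
   -- the partition level statistics were already gathered during the refresh
   dbms_stats.gather_index_stats(ownname          => USER
                               , indname          => 'IDX_SOME_JOIN_MV_TYPE_LOCAL'
                               , estimate_percent => 100
                               , granularity      => 'GLOBAL');
end;
/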
So I witnessed two strange things in INDEX statistics with COMPLETE refreshed materialized views. Can anyone tell me the reasons…? At least I learned to pay careful attention to the statistics when dealing with materialized view refreshes, that is for sure!
Did you compare behaviour with an atomic refresh?
The atomic refresh will do a delete rather than a truncate.
Was the non-atomic refresh option chosen deliberately, e.g. for performance?
Hi Dominic, thanks for the comment!
Yes, it is certainly intentional to do the TRUNCATE version (atomic_refresh=>FALSE), because of the huge number of records in the table (73 million currently), the existence of about 15 indexes on the table, and the fact that the ETL runs during the night, so the TRUNCATE is accepted by everyone. (Accidental atomic COMPLETE refreshes during tests of other MVs were dramatic: extremely long refresh times. So if you can do the TRUNCATE version, I would advise it, but be aware of other side effects.) I wanted to do it with a FAST refresh, but as it turned out one of the underlying MV definitions has a construction in it that led to data errors during testing; we still have to find out exactly what happens there. Since a COMPLETE refresh (with atomic_refresh=>FALSE) was still by far superior to the previous implementation with the manual “refresh” of a table using an ETL tool, we chose the COMPLETE refresh to at least gain about 2-3 hours. As an atomic refresh is purely DML (indeed with a DELETE instead of a TRUNCATE), I expected that there would be no statistics calculation during the refresh, and I just tested it: the statistics are completely untouched by an atomic refresh, as expected. That is also certainly something to take into account: when there are huge data loads or huge deletes/updates happening, you may want to gather stats after the atomic refresh to keep them in sync with the real data.
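For completeness, a minimal sketch of that last point against the example MV from the post (the 100 percent estimate and the cascade are just illustrative choices):
begin
   -- atomic COMPLETE refresh: pure DML (DELETE + conventional insert),
   -- so the refresh itself leaves all statistics untouched
   dbms_mview.REFRESH('SOME_JOIN_MV', 'COMPLETE', atomic_refresh => TRUE);
   -- gather table, partition and index statistics afterwards to keep them
   -- in sync with the (possibly very different) new data
   dbms_stats.gather_table_stats(ownname          => USER
                               , tabname          => 'SOME_JOIN_MV'
                               , estimate_percent => 100
                               , granularity      => 'ALL'
                               , cascade          => TRUE);
end;
/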
FYI – Oracle acknowledged this as a bug (#5632128). I found your post very helpful in my initial research into a similar issue we faced here at my company.