My current project has me working for a client consolidating a bunch of SQL Server databases into a properly licensed and managed SQL Server cluster. It's been a while since I did a lot in SQL Server, and it seemed like a good way to refresh some skills I hadn't used for a while. One of the big differences is that SQL Server best practice recommends running a maintenance job to rebuild fragmented indexes, whereas current thinking in Oracle is that frequent (and most of the time any) index rebuilds are not required.

Index fragmentation in SQL Server has two specific meanings. Internal fragmentation is wasted space within an index, the kind of thing you get after heavy deletes of almost all the data in a range of values. The other is external fragmentation, where pages are stored out of order on disk (a more in-depth explanation is here http://www.sql-server-performance.com/articles/per/index_fragmentation_p1.aspx). Most of the discussion about Oracle indexes covers the first type of fragmentation, but I've seen the second type discussed far less, so I figured I'd see if the same thing happens in Oracle. First of all, build some test tables and indexes:
drop table TEST_FRAG;
create table TEST_FRAG (id number not null, value varchar2(10));
insert into TEST_FRAG select rownum*2-1, 'Stuff' from dual connect by level <=200000;
create unique index TEST_FRAG_IDX on TEST_FRAG(id);
insert into TEST_FRAG select rownum*2, 'TEST_FRAG' from dual connect by level <=200000;
drop table TEST_UNFRAG;
create table TEST_UNFRAG (id number not null, value varchar2(10));
insert into TEST_UNFRAG select rownum, 'Stuff' from dual connect by level <=400000;
create unique index TEST_UNFRAG_IDX on TEST_UNFRAG(id);
commit;
begin
  DBMS_STATS.GATHER_TABLE_STATS (ownname => 'CHRIS', tabname => 'TEST_FRAG', estimate_percent => 100, method_opt => 'FOR ALL COLUMNS SIZE 1');
  DBMS_STATS.GATHER_TABLE_STATS (ownname => 'CHRIS', tabname => 'TEST_UNFRAG', estimate_percent => 100, method_opt => 'FOR ALL COLUMNS SIZE 1');
end;
/
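With the statistics gathered, it's worth checking what the optimizer will actually see for the two indexes. This is a quick sanity-check query, not part of the original test run; it uses the standard USER_INDEXES dictionary view:

```sql
-- Compare the statistics of the two indexes after the DBMS_STATS gather.
-- CLUSTERING_FACTOR becomes important in the second test below.
select index_name, blevel, leaf_blocks, clustering_factor
from   user_indexes
where  index_name in ('TEST_FRAG_IDX', 'TEST_UNFRAG_IDX');
```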
This creates a couple of simple tables and indexes which we can then run some tests against. First of all, this:
alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';
select count(id) from test_frag where id >=1000 and id <=20000;
select count(id) from test_unfrag where id >=1000 and id <=20000;
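If you want to follow along, the trace file for the session can be located as follows. This is a sketch for 11g and later, where V$DIAG_INFO exposes the path directly (on older releases you would look under user_dump_dest instead):

```sql
-- Find the 10046 trace file written by the current session (11g+).
select value
from   v$diag_info
where  name = 'Default Trace File';
```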
Both take next to no time, but taking a look at the traces we can see a small difference. For the first statement we get the following wait events in the trace file:
WAIT #4: nam='db file sequential read' ela= 58 file#=4 block#=19812 blocks=1 obj#=67986 tim=1236296568417821
WAIT #4: nam='db file sequential read' ela= 57 file#=4 block#=18928 blocks=1 obj#=67986 tim=1236296568418122
WAIT #4: nam='db file sequential read' ela= 59 file#=4 block#=19816 blocks=1 obj#=67986 tim=1236296568418396
WAIT #4: nam='db file sequential read' ela= 57 file#=4 block#=18929 blocks=1 obj#=67986 tim=1236296568418700
WAIT #4: nam='db file sequential read' ela= 60 file#=4 block#=19820 blocks=1 obj#=67986 tim=1236296568418972
WAIT #4: nam='db file sequential read' ela= 56 file#=4 block#=18930 blocks=1 obj#=67986 tim=1236296568419259
We can see the range scan is reading blocks from different extents as it progresses through the index. For the second table the waits look like this:
WAIT #3: nam='db file sequential read' ela= 46 file#=4 block#=20847 blocks=1 obj#=67988 tim=1236296568430710
WAIT #3: nam='db file sequential read' ela= 48 file#=4 block#=20848 blocks=1 obj#=67988 tim=1236296568430986
WAIT #3: nam='db file sequential read' ela= 58 file#=4 block#=20849 blocks=1 obj#=67988 tim=1236296568431264
WAIT #3: nam='db file sequential read' ela= 58 file#=4 block#=20850 blocks=1 obj#=67988 tim=1236296568431545
WAIT #3: nam='db file sequential read' ela= 56 file#=4 block#=20851 blocks=1 obj#=67988 tim=1236296568431803
WAIT #3: nam='db file sequential read' ela= 56 file#=4 block#=20852 blocks=1 obj#=67988 tim=1236296568432065
Here we are scanning the blocks in order. Overall, however, the total work done is much the same (I got a total of 44 blocks read from the "fragmented" index and 42 from the "unfragmented" one), and we are not seeing a huge difference in access time, as the elapsed times are pretty much the same. The next test gives some different results:
alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';
select count(value) from test_frag where id >=1000 and id <=20000;
select count(value) from test_unfrag where id >=1000 and id <=20000;
Looking in the trace file here shows that the first query actually has a completely different execution plan; it does this:
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=1005 pr=1001 pw=1001 time=0 us)
  19001   TABLE ACCESS FULL TEST_FRAG (cr=1005 pr=1001 pw=1001 time=509 us cost=277 size=228024 card=19002)
where the second query does this:
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=83 pr=83 pw=83 time=0 us)
  19001   TABLE ACCESS BY INDEX ROWID TEST_UNFRAG (cr=83 pr=83 pw=83 time=1460 us cost=84 size=190020 card=19002)
  19001    INDEX RANGE SCAN TEST_UNFRAG_IDX (cr=42 pr=42 pw=42 time=497 us cost=42 size=0 card=19002)(object id 67988)
Why is this? Well, the answer lies in the clustering factor. Because of the way the data was loaded, the clustering factor for the first table is 400,000, while for the second it is 880 (the second index is slightly smaller as well). This has nothing to do with the placement of the index blocks on disk and everything to do with the slightly bizarre way the data was loaded: rebuilding the index doesn't change the clustering factor, so it won't alter the execution plan unless we manually manipulate the statistics or use hints to force a range scan.
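To see the cost of the high clustering factor rather than just the optimizer avoiding it, we can hint the first query down the index path. This is a sketch using the standard index hint with the table and index names from the examples above:

```sql
-- Force the range scan on the "fragmented" table; with a clustering factor
-- of 400,000 the table-block visits make this far more expensive than the
-- full scan the optimizer chose on its own.
select /*+ index(test_frag test_frag_idx) */ count(value)
from   test_frag
where  id >= 1000 and id <= 20000;
```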
So do we have a similar type of fragmentation to the SQL Server definition of external fragmentation? Well, we appear to, but the impact is not particularly large, and in this case we could have avoided it by creating the index with a pctfree of 50. The organisation of the table data is actually the larger factor here: we can't fix the clustering factor by rebuilding the index, only by changing the way we load the data.
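For completeness, fixing the clustering factor means rewriting the table itself in index key order, something like the sketch below (table and index names from the examples above; space, downtime and dependent-object implications are ignored here):

```sql
-- Rebuild the table with rows physically ordered by id, so the table order
-- matches the index order and the clustering factor drops accordingly.
create table TEST_FRAG_SORTED as
  select * from TEST_FRAG order by id;

drop table TEST_FRAG;
rename TEST_FRAG_SORTED to TEST_FRAG;
create unique index TEST_FRAG_IDX on TEST_FRAG(id);
```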