Making life hard for yourself.

April 21, 2013

Sometimes it seems like a developer has gone out of their way to make life difficult for themselves (and us poor long-suffering DBAs). For example, we have an audit trail of user actions in the database that contains XML. This almost makes sense, except that the XML is stored in a BLOB, something like this…

3C446F63756D656E743E0D0A20203C6669656C64204E616D653D2249223E444F3C2
F6669656C643E0D0A20203C6669656C64204E616D653D224E4F54223E4C494B453C
2F6669656C643E0D0A20203C6669656C64204E616D653D2244415441223E5045527
6593356745A5735305067304B494341385A6D6C6C62475167546D46745A54306951
6B564A546B6369506B466962475567644738676358566C636E6B6762586B675A474
630595477765A6D6C6C6247512B44516F384C305276593356745A57353050673D3D
3C2F66696C65643E0D0A3C2F446F63756D656E743E

As we have limited space, these records are removed after 90 days, but they can be restored from backups up to ten years old if required. Getting the data back is only part of the problem, however, as we then need to read it. Being a helpful sort of chap, I wrote a simple BLOB-to-CLOB function that lets us get the XML out of the BLOB; for the record above we get this:-

<Document>
<field Name="I">DO</field>
<field Name="NOT">LIKE</field>
<field Name="DATA">PERvY3VtZW50Pg0KICA8ZmllbGQgTmFtZT0iQkVJTkciPkFibGUgdG8gcXVlcnkgbXkgZGF0YTwvZmllbGQ+DQo8L0RvY3VtZW50Pg==</filed>
</Document>

So what is that DATA field in the middle?
It's a base64-encoded XML document. At this point I gave up and advised the developer that they would have to sort this out themselves: we have an XML document encoded in base64, put in a field of another XML document, with the whole thing stored in a BLOB. Getting meaning out of this data via SQL is going to be difficult and frustrating even when we know exactly which record is required; if someone wants to search the XML for a specific field, we are heading down the rabbit hole into madness, especially as we have only attempted to decode a couple of entries, so who knows what else is hiding in there.
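For the curious, unwrapping the layers is at least mechanical. A sketch in Python (`unwrap_audit_blob` is my own name for it, and the sample document below is made up to match the shape of the real records rather than copied from one):

```python
import base64
import binascii
import re

def unwrap_audit_blob(hex_blob: str) -> str:
    """Peel the three layers: hex dump of the BLOB -> outer XML ->
    base64 DATA field -> inner XML document."""
    outer_xml = binascii.unhexlify(hex_blob).decode("ascii")
    match = re.search(r'Name="DATA">([^<]+)<', outer_xml)
    if match is None:
        raise ValueError("no DATA field in outer document")
    return base64.b64decode(match.group(1)).decode("ascii")

# Round trip with made-up data shaped like the audit records above.
inner = ('<Document>\r\n  <field Name="BEING">'
         'Able to query my data</field>\r\n</Document>')
outer = ('<Document>\r\n  <field Name="DATA">'
         + base64.b64encode(inner.encode("ascii")).decode("ascii")
         + '</field>\r\n</Document>')
hex_blob = binascii.hexlify(outer.encode("ascii")).decode("ascii").upper()
print(unwrap_audit_blob(hex_blob) == inner)  # True
```

Doing the same unwrap per row inside a SQL query, across millions of audit records, is exactly the work we would rather not sign up for.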
So instead of shipping this data into the data warehouse, we are stuck with backing it up to tape and restoring it on request, along with the front-end tool that can read and display it. Hopefully the development team won't mind supporting ten years' worth of front ends as well.


Remember to check the obvious.

December 14, 2012

It's easy to forget to check the obvious. A user complains that performance of a RAC cluster is appalling; a quick dive into OEM shows everything spending forever in cluster waits, and it's tempting to dive straight into SQL tuning mode, or to start looking for hot blocks or small tables being pinged back and forth across the interconnect. Luckily I noticed something odd in sar first: the interconnect was only transferring 1.2MB/s, yet everything was still hung up on cluster waits. A quick check of the network card:-

racnode1:~ # ethtool eth4
Settings for eth4:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 10Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: g
        Current message level: 0x00000003 (3)
        Link detected: yes

10Mb/s? So much for the fancy switch management and alerting software. Failing the bond over to the other network link fixed things until a network engineer could get to it and move the cable.
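A check like this is easy to script. A minimal Python sketch (the sample text is trimmed from the ethtool output above, and the 1000Mb/s threshold is my assumption about what the interconnect should negotiate):

```python
import re

# Trimmed sample of ethtool output for the interconnect NIC.
ETHTOOL_OUTPUT = """\
Settings for eth4:
        Speed: 10Mb/s
        Duplex: Full
        Auto-negotiation: on
Link detected: yes
"""

def link_speed_mbps(ethtool_text: str) -> int:
    """Pull the negotiated speed (in Mb/s) out of ethtool output."""
    match = re.search(r"Speed:\s*(\d+)Mb/s", ethtool_text)
    if match is None:
        raise ValueError("no Speed line found")
    return int(match.group(1))

speed = link_speed_mbps(ETHTOOL_OUTPUT)
if speed < 1000:
    print(f"WARNING: interconnect negotiated at {speed}Mb/s")
```

Run from cron against each node's interconnect interface, this would have flagged the problem long before the cluster waits did.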

Sometimes it's not a complex SQL tuning task: sometimes it really is a database performance issue, and not the fault of the developers.


Bind Variable mismatch, nulls and oci

November 24, 2011

A short post on a frustrating issue that took a while to find the solution to. The developers made a small change to a table allowing nulls in 23 columns previously configured as not null. This resulted in a large number of bind mismatches and child cursors being generated, eventually giving rise to a serious performance issue.

After much discussion of the issue, test cases, being told to tune the database, and flushing of the shared pool, a developer finally opened the code, took a look at the problem, and found the reason for the bind mismatch: the OCI null indicator. When the column value was set to NULL, an indicator of -1 was passed; when a value was present, a null pointer was passed instead. The optimiser sees these as bind mismatches, giving up to 2^23 combinations of bind settings. Patching the code to pass an indicator of zero when a value is present, rather than a null pointer, prevents the bind mismatch and returned everything to normal.
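The blow-up is easy to quantify, which explains why the shared pool filled with child cursors so quickly:

```python
# Two bind styles per nullable column (indicator -1 for NULL, or a null
# value pointer when a value is present) across 23 columns gives the
# worst-case number of distinct bind-metadata combinations, and hence
# potential child cursors, for the single statement.
nullable_columns = 23
print(2 ** nullable_columns)  # 8388608
```

Even a small fraction of those combinations appearing in production is enough to cause serious library cache contention.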

One other issue was the test configuration: the data distribution in test didn't generate the same variance of nulls and values in the binds, so the problem wasn't found until the production release, making the assorted workarounds necessary.


How expensive are literals?

June 26, 2011

As a follow-up to my last post, I finally got around to writing a small program to demonstrate the difference between bind variables and literals. The test is a simple C++ program that reads a text file and updates 27,000 rows in a single table. The code runs the updates in three different ways.

Method 1: Bulk bind 200 rows at a time into the update statement.

Method 2: Bind 1 row at a time

Method 3: Use literals in the updates.
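As a rough illustration of the three methods, here is a stand-in sketch using Python's built-in sqlite3 module. This is an analogy on my part, not the original C++/OCI test, and SQLite's parse cost is far smaller than Oracle's hard parse, but the shape of each method is the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, val text)")
conn.executemany("insert into t values (?, ?)",
                 [(i, "old") for i in range(27_000)])
rows = [("new", i) for i in range(27_000)]

# Method 1: one bound statement, rows supplied in batches of 200.
for i in range(0, len(rows), 200):
    conn.executemany("update t set val = ? where id = ?", rows[i:i + 200])

# Method 2: one bound statement, executed a row at a time.
for val, rid in rows:
    conn.execute("update t set val = ? where id = ?", (val, rid))

# Method 3: a fresh literal statement per row, parsed every time.
for val, rid in rows:
    conn.execute(f"update t set val = '{val}' where id = {rid}")

print(conn.execute(
    "select count(*) from t where val = 'new'").fetchone()[0])  # 27000
```

On Oracle the difference is dominated by the hard parse per literal statement, as the tkprof figures that follow show.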

Tkprof output from the test looks like this.

Method 1
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.01       0.01          0          0          0           0
Execute    271      5.44       6.05         23      28661      27706       27001
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      273      5.46       6.07         23      28661      27706       27001
 
Method 2
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute  27001     12.19      14.18          0      54015      27826       27001
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    27003     12.19      14.18          0      54015      27826       27001
 
Method 3
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    27001     19.06      25.33          0          0          0           0
Execute  27001     54.99      66.41          0      54002      27706       27001
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    54002     74.06      91.75          0      54002      27706       27001

From this simple test, the literal version uses roughly 13.5 times as much CPU as the first method. In cost terms, if I need 2 x86 cores of CPU power for my database with method 1, I need 27 cores for method 3 (assuming it would scale): that's 1 × £31,839 versus 14 × £31,839 in licences, or £445,746. Aside from the cost, there is no way to improve the speed of method 3, and the code is harder to write as well.
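The licence arithmetic can be laid out explicitly. The £31,839 list price and the 0.5 core factor for x86 (i.e. two cores per processor licence) are the assumptions behind the 1-versus-14 licence count:

```python
import math

cpu_bulk, cpu_literals = 5.46, 74.06   # total CPU seconds from the tkprof output
ratio = cpu_literals / cpu_bulk        # ~13.6x as much CPU for literals
cores = round(2 * ratio)               # 2 cores for method 1 -> 27 for method 3
licences = math.ceil(cores * 0.5)      # x86 core factor 0.5: 14 processor licences
print(round(ratio, 1), cores, licences * 31_839)  # 13.6 27 445746
```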


Bind Variables, almost right.

June 1, 2011

Sometimes I get a bit sarcastic with, and about, developers. This is unfair, as for the most part the ones I have worked closely with have been either very good Oracle developers or very good C++/Java/VB6/whatever developers with a real interest in doing things the right way, and they have been receptive to constructive (but grumpy) feedback.

This gem in a vendor's application, however, brought out the sarcasm.

Update
  bigwidetable
set
  column_1 = :1, column2 = :2, ..............
  column_125 = :125, column_126 = :126
where
   column_127 = 'GotBoredWithBindVariablesDidWe?';

I can't work out the logic here. The code is already written to use bind variables, so it took extra code to stick a literal in there, and as the application is written in C using the OCI libraries these guys are supposed to be rocket surgeons. There isn't even a justification based on skewed data: column_127 is the primary key, so every update updates one row and only one plan is ever used. Added to that, the design of the process is such that the application caches updates and then sends in 400 at a time, committing after each update just to make sure performance really suffers. The exception is end-of-day processing, where all 27,000 rows are updated; at least there is only a single commit during that specific piece.

So far, so bad. However, the worst impact we see is because we are using Automatic Memory Management: the shared pool grows larger over time as it tries to cache all the execution plans. After three weeks we get to the point of other processes suffering from either not enough buffers or increases in mutex and library cache latch waits. The solution? Restart the database every two weeks and edit the spfile to reset the values of __db_cache_size and __shared_pool_size. Yes, we could use manual memory management, but the vendor's application compute guide specifies AMM; changing to manual memory management is still being assessed by their performance team.


SQL Injection

February 22, 2011

Sometimes I wonder why I still find stuff like this. There are so many articles/blogs/documents about SQL injection; how does a developer come up with the following…

create or replace function row_count (p_table varchar2)
return number
is
v_str VARCHAR2(1000);
v_row NUMBER;
BEGIN
v_str := 'select count(*) from '||p_table;
execute immediate v_str into v_row;
return v_row;
end;
/

My simple explanation of why this is bad was that passing in the parameter 'all_objects,all_objects,all_objects' will result in a fairly nasty Cartesian join, and an attacker who knew the names of the database tables could pick a really large table. Run enough copies and it starts to look like a denial-of-service attack. I'm pretty sure someone smarter than me could come up with something worse, but I would rather not have to think about how to defend against SQL injection when simply doing it right (with technology from the 90s, possibly the 80s) is so much easier.
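For contrast, a minimal sketch of doing it right, using Python with sqlite3 as a stand-in: a table name cannot be a bind variable, so the identifier has to be validated before it is interpolated (in Oracle the equivalent whitelist check is dbms_assert.sql_object_name):

```python
import re
import sqlite3

# Strict identifier pattern: letter, then letters/digits/_/$/# only,
# so commas, quotes and whitespace can never reach the statement text.
IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")

def row_count(conn, table: str) -> int:
    """Count rows in a table, validating the name before interpolating it."""
    if not IDENTIFIER.match(table):
        raise ValueError(f"invalid table name: {table!r}")
    return conn.execute(f"select count(*) from {table}").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("create table widgets (id integer)")
conn.execute("insert into widgets values (1)")
print(row_count(conn, "widgets"))            # 1
# row_count(conn, "all_objects,all_objects") # raises ValueError
```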


A single procedure to “hang” the database.

February 14, 2011

I really enjoy it when people don't understand the features they are using; it makes life really interesting. Automatic parallelism is a great new feature of 11.2, but we found a new and interesting way to break things. As part of the application we are implementing, customers may re-request fairly large volumes of data. To limit the impact on the system as a whole, the design is such that a cursor is opened and the data returned in chunks of 500 rows: if two customers execute re-requests at the same time, two cursors are opened, customer1 gets 500 rows, then customer2 gets 500 rows, then back to customer1 and so on. If ten customers issue re-requests, results are sent back in chunks of 500 to each customer in turn.

The intention is to prevent customers complaining that they are waiting too long to receive data at the same time as preventing re-requests flooding the database and making the other critical parts of the system becoming unresponsive.

Nice idea, but someone didn't read enough about automatic parallelism. Here is a simple demo; first create a table, e.g. (this code has been shamelessly adapted from Jonathan Lewis' site):

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from    dual
        connect by
                rownum <= 10000
)
select
        rownum                              id,
        trunc(dbms_random.value(1,1000))    n1,
        trunc(dbms_random.value(1,1000))    n2,
        lpad(rownum,30,'0')                 small_vc
from
        generator v1,
        generator v2
where
        rownum <= 100000000;

begin
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'T1',
                method_opt => 'for all columns size 1'
        );
end;
/

Then a nice simple piece of PL/SQL to demonstrate the issue (database version 11.2.0.1, 8 cores, parallel_servers_target set to 64, parallel_min_time_threshold=1; you may need different values to force the behaviour):

declare
cursor cursor1 is select * from t1  order by id;
cursor cursor2 is select * from t1  order by id;
cursor cursor3 is select * from t1  order by id;
cursor cursor4 is select * from t1  order by id;
cursor cursor5 is select * from t1  order by id;
cursor cursor6 is select * from t1  order by id;
cursor cursor7 is select * from t1  order by id;
cursor cursor8 is select * from t1  order by id;
cursor cursor9 is select * from t1  order by id;
cursor cursor10 is select * from t1  order by id;
id dbms_sql.number_table;
n1 dbms_sql.number_table;
n2 dbms_sql.number_table;
small_vc dbms_sql.varchar2_table;
BEGIN
open cursor1;
open cursor2;
open cursor3;
open cursor4;
open cursor5;
open cursor6;
open cursor7;
open cursor8;
open cursor9;
open cursor10;

loop
fetch cursor1 bulk collect into id,n1,n2,small_vc limit 500;
fetch cursor2 bulk collect into id,n1,n2,small_vc limit 500;
fetch cursor3 bulk collect into id,n1,n2,small_vc limit 500;
fetch cursor4 bulk collect into id,n1,n2,small_vc limit 500;
fetch cursor5 bulk collect into id,n1,n2,small_vc limit 500;
fetch cursor6 bulk collect into id,n1,n2,small_vc limit 500;
fetch cursor7 bulk collect into id,n1,n2,small_vc limit 500;
fetch cursor8 bulk collect into id,n1,n2,small_vc limit 500;
fetch cursor9 bulk collect into id,n1,n2,small_vc limit 500;
fetch cursor10 bulk collect into id,n1,n2,small_vc limit 500;
exit when cursor1%notfound;
end loop;

close cursor1;
close cursor2;
close cursor3;
close cursor4;
close cursor5;
close cursor6;
close cursor7;
close cursor8;
close cursor9;
close cursor10;
End;
/

Now execute the PL/SQL block and wait, and wait, and continue waiting forever…

The problem is simple (once we knew what was going on): auto parallelism will queue a statement if it can't execute with the requested degree of parallelism. By about cursor 8 our database was unable to assign the required parallel slaves, so it queued the statement, which means no one ever gets any more rows until we kill the process. Typically we saw one session waiting on the “PX Queuing: statement queue” event. Unfortunately, getting the code fixed is going to be a long slow process, so for the moment automatic parallelism and statement queuing are switched off.


Flashback logging only has a 2% performance decrease…or not.

February 17, 2010

Firstly, apologies to both people who have previously read my blog; things have been a little hectic over the past 12 months. Working away from home in the Netherlands has taken its toll on my spare time, and weekends are to be spent either with the family or working. Things have settled into a more stable pattern now, so here goes…

Some background: I've been assigned to a project to configure Data Guard on a very busy production system running a mixed bag of batch processing, messaging and DSS loads. Performance and availability of the system are critical for the business, so everyone was happy that when flashback was proposed as an addition to the system they could quote this bit of marketing: http://www.oracle.com/technology/deploy/availability/htdocs/Flashback_Overview.htm (search for 2% and you will see the bit I mean). Great, everyone was happy. Then I get moved onto the project to help with configuration and testing, and start upsetting people…

Prior to starting the formal tests we had a fairly informal period where we built the test databases, configured the standby, got the team used to Data Guard broker's foibles, failed over, rebuilt the primary using flashback database, performed some switchovers and switchbacks, broke the configuration with an ill-advised flashback on the primary, fixed the configuration, and switched over a couple more times. Then we hit a bug (5937899, handily unpublished) involving flashback, Data Guard and switchovers. Fortunately this was recoverable, as we had set guaranteed restore points, which is where our problems really got started.

As part of the test plan we have a set of processes to run. The plan is to run everything, measure performance of the network, host O/S and SAN, generate AWR reports and so on, then flash back to our guaranteed restore point, alter the configuration and re-run the same processes. This gives us data we can compare: no Data Guard, archive log shipping, LGWR ASYNC and finally LGWR SYNC. The plan was good right up until we tried some Mickey Mouse testing with simple scripts creating tables by selecting from dba_objects, delete/insert/commit, really simple stuff. First run through, with no Data Guard and no flashback, everything is good; enable Data Guard, configure flashback database, set a guaranteed restore point, and performance nosedives. The top 5 wait events look like this:

Top 5 Timed Events

Event                          Waits  Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
log file sync                  2,055      557           271               35.7  Commit
CPU time                                  497                             31.8
db file parallel write        33,547      437            13               28.0  System I/O
flashback buf free by RVWR       430      402           935               25.7  Other
log file parallel write          777       96           124                6.2  System I/O

This is not good. The log file sync time isn't unexpected (in fact it is not much greater than without Data Guard or flashback on), but the flashback buf free by RVWR wait is not something we expected to see at this level. Finding this really doesn't bode well for the rest of the testing, or for the impact on production, so we start digging around: switch off flashback, configure a restore point with flashback logging not enabled, and so on. Eventually I'm looking at the session wait events and the flash recovery area at the same time, and notice that all the sessions go into waiting on flashback buf free by RVWR at the same moment, and that this coincides with a new flashback log being created. Because of the load we have generated, flashback logs are created at ~500MB in size, and every session we are running to generate load waits while this happens.

Right, so what is the real problem? Well, it's the way we have set things up. With flashback logging enabled and no restore points configured, flashback logs get reused (how soon depends on db_flashback_retention_target); once the system hits a steady state it shouldn't be creating new flashback logs, just reusing old ones. In our test environment we enable flashback logging and configure a guaranteed restore point, so we never reuse a flashback log, we just create new ones. This isn't representative of the live system, as we will not be setting guaranteed restore points there. So our next step is to find a way to enable flashback to a guaranteed point in test without inflicting the waits we have seen in our quick and dirty testing. On a small system the following seems to get around the problem:

Configure the test database and standby server.
Set flashback database on, and set db_flashback_retention_target to a big number (even the default of 1440 will work).
Execute one of our scripts to generate a lot of changes, and wait until we have enough flashback logs (yes, I know: how many is enough?).
Set a guaranteed restore point.
Set db_flashback_retention_target to a small number (<60).

What will then happen is that the existing flashback logs will be reused, removing the extreme waits on new flashback log creation, while still allowing us to flash back and re-run the production-like process tests against the same data.

The real point here isn't about flashback logging or a specific process; it's about testing. In our case we have delayed the start of the formal tests by a day, but have saved time overall, as we will not end up having to re-run tests that take 4-5 hours to complete (well, at least not for this reason :) ). In summary:-

Validating your environment before you start time-consuming and possibly expensive tests is a good idea.

Small, fairly crude tests can show up problems in the configuration, shortening the test cycle.

Having time to mess about and quickly retry things is invaluable for DBAs implementing features that haven't been used by the team before.

Never, ever quote a performance figure pulled from marketing material, no matter what the source is; always verify in your own environment.


Index Fragmentation – more thoughts.

March 18, 2009

Looking at my last post, I'm somewhat unsatisfied with the script used to create the “fragmented” index, so I thought I'd have another go, modifying it to be like this:

drop table TEST_FRAG2;
create table TEST_FRAG2 (id number not null, value varchar2(10));
insert into TEST_FRAG2 select id, 'stuff' from
(select rownum as id from dual connect by level <= 400000)
where mod(trunc(id/400),2) = 0;
create unique index TEST_FRAG2_IDX on TEST_FRAG2(id);
insert into TEST_FRAG2 select id, 'stuff' from
(select rownum as id from dual connect by level <= 400000)
where mod(trunc(id/400),2) = 1;
commit;

begin
DBMS_STATS.GATHER_TABLE_STATS (ownname => 'CHRIS', tabname => 'TEST_FRAG2',
estimate_percent => 100, method_opt => 'FOR ALL COLUMNS SIZE 1');
end;
/

This creates an index with a similar type of physical fragmentation to the previous example, but a better clustering factor. Running the first test again (I'm seeing a faster response from disk today, but I wouldn't read too much into that, as my test database is 11g on Oracle E-Linux 5.2 on Oracle VM, on the cheapest Dell server I could get, with one disk and a bunch of third-party RAM):

alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';
select count(id) from test_frag2 where id >=1000 and id <=20000;
alter session set events '10046 trace name context off';

Taking a look at the trace file shows the following type of activity (this is a small sample): we are reading blocks that are not sequentially placed on disk.

WAIT #11: nam='db file sequential read' ela= 42 file#=4 block#=25070 blocks=1 obj#=68308 tim=1236636891337307
WAIT #11: nam='db file sequential read' ela= 27 file#=4 block#=25956 blocks=1 obj#=68308 tim=1236636891337422
WAIT #11: nam='db file sequential read' ela= 27 file#=4 block#=25960 blocks=1 obj#=68308 tim=1236636891337539
WAIT #11: nam='db file sequential read' ela= 26 file#=4 block#=25071 blocks=1 obj#=68308 tim=1236636891337667
WAIT #11: nam='db file sequential read' ela= 29 file#=4 block#=25968 blocks=1 obj#=68308 tim=1236636891337783
WAIT #11: nam='db file sequential read' ela= 26 file#=4 block#=25964 blocks=1 obj#=68308 tim=1236636891337911
WAIT #11: nam='db file sequential read' ela= 26 file#=4 block#=25072 blocks=1 obj#=68308 tim=1236636891338034
WAIT #11: nam='db file sequential read' ela= 28 file#=4 block#=25976 blocks=1 obj#=68308 tim=1236636891338151

Then running the second test:

alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';
select count(value) from test_frag2 where id >=1000 and id <=20000;
select count(value) from test_unfrag where id >=1000 and id <=20000;
alter session set events '10046 trace name context off';

now uses the same execution plan as before; looking at the tkprof output shows the query on test_frag2 doing more work.

select count(value)
from
test_frag2 where id >=1000 and id <=20000

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.01       0.01        108        155          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.01       0.01        108        155          0           1

select count(value)
from
test_unfrag where id >=1000 and id <=20000

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.01       0.01         83         83          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.01       0.01         83         83          0           1

A coalesce on the index TEST_FRAG2_IDX improves things, but we are still doing slightly more work, as shown below; a rebuild gives exactly the same result.

select count(value)
from
test_frag2 where id >=1000 and id <=20000

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.02       0.02         85        132          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.02       0.02         85        132          0           1

I'll need to take a look at what's going on here, but I suspect the extra work is because the table data is scattered across more blocks, rather than anything to do with the index structure.


Index fragmentation – Just like SQL Server

March 6, 2009

My current project is working for a client consolidating a bunch of SQL Server databases into a properly licensed and managed SQL Server cluster. It's been a while since I did a lot in SQL Server, and it seemed like a good way to refresh some skills I hadn't used for a while. One of the big differences is that SQL Server best practice recommends running a maintenance job to rebuild fragmented indexes, whereas current thinking in Oracle is that frequent (and, most of the time, any) index rebuilds are not required. Index fragmentation in SQL Server has two specific meanings: internal fragmentation is wasted space in an index, the kind of thing you get after heavy deletes of almost all the data in a range of values; external fragmentation is where pages are stored out of order on disk (a more in-depth explanation is here: http://www.sql-server-performance.com/articles/per/index_fragmentation_p1.aspx). Most of the discussion about Oracle indexes covers the first type of fragmentation, but I've seen the second type discussed far less, so I figured I'd see if the same thing happens. First of all, build some test tables and indexes:

drop table TEST_FRAG;
create table TEST_FRAG (id number not null, value varchar2(10));
insert into TEST_FRAG select rownum*2-1, 'Stuff' from dual connect by level <= 200000;
create unique index TEST_FRAG_IDX on TEST_FRAG(id);
insert into TEST_FRAG select rownum*2, 'TEST_FRAG' from dual connect by level <= 200000;

drop table TEST_UNFRAG;
create table TEST_UNFRAG (id number not null, value varchar2(10));
insert into TEST_UNFRAG select rownum, 'Stuff' from dual connect by level <= 400000;
create unique index TEST_UNFRAG_IDX on TEST_UNFRAG(id);
commit;

begin
DBMS_STATS.GATHER_TABLE_STATS (ownname => 'CHRIS', tabname => 'TEST_FRAG', estimate_percent => 100, method_opt => 'FOR ALL COLUMNS SIZE 1');
DBMS_STATS.GATHER_TABLE_STATS (ownname => 'CHRIS', tabname => 'TEST_UNFRAG', estimate_percent => 100, method_opt => 'FOR ALL COLUMNS SIZE 1');
end;
/

This creates a couple of simple tables and indexes which we can then run some tests on. First of all, this:-

alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';
select count(id) from test_frag where id >=1000 and id <=20000;
select count(id) from test_unfrag where id >=1000 and id <=20000;

Both take next to no time. Taking a look at the traces we can see a small difference; for the first statement we get the following wait events in the trace file:

WAIT #4: nam='db file sequential read' ela= 58 file#=4 block#=19812 blocks=1 obj#=67986 tim=1236296568417821
WAIT #4: nam='db file sequential read' ela= 57 file#=4 block#=18928 blocks=1 obj#=67986 tim=1236296568418122
WAIT #4: nam='db file sequential read' ela= 59 file#=4 block#=19816 blocks=1 obj#=67986 tim=1236296568418396
WAIT #4: nam='db file sequential read' ela= 57 file#=4 block#=18929 blocks=1 obj#=67986 tim=1236296568418700
WAIT #4: nam='db file sequential read' ela= 60 file#=4 block#=19820 blocks=1 obj#=67986 tim=1236296568418972
WAIT #4: nam='db file sequential read' ela= 56 file#=4 block#=18930 blocks=1 obj#=67986 tim=1236296568419259

We can see the range scan is reading blocks from different extents as it progresses through the index. For the second table the waits look like this:

WAIT #3: nam='db file sequential read' ela= 46 file#=4 block#=20847 blocks=1 obj#=67988 tim=1236296568430710
WAIT #3: nam='db file sequential read' ela= 48 file#=4 block#=20848 blocks=1 obj#=67988 tim=1236296568430986
WAIT #3: nam='db file sequential read' ela= 58 file#=4 block#=20849 blocks=1 obj#=67988 tim=1236296568431264
WAIT #3: nam='db file sequential read' ela= 58 file#=4 block#=20850 blocks=1 obj#=67988 tim=1236296568431545
WAIT #3: nam='db file sequential read' ela= 56 file#=4 block#=20851 blocks=1 obj#=67988 tim=1236296568431803
WAIT #3: nam='db file sequential read' ela= 56 file#=4 block#=20852 blocks=1 obj#=67988 tim=1236296568432065

Here we are scanning the blocks in order. Overall, however, the total work done is much the same (I got a total of 44 blocks read from the “fragmented” index and 42 from the “unfragmented”), and we are not seeing a huge difference in access time, as the elapsed time is pretty much the same. The next test gives some different results:

alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';
select count(value) from test_frag where id >=1000 and id <=20000;
select count(value) from test_unfrag where id >=1000 and id <=20000;

Looking in the trace file here shows that the first query actually has a completely different execution plan; it does this:

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=1005 pr=1001 pw=1001 time=0 us)
  19001   TABLE ACCESS FULL TEST_FRAG (cr=1005 pr=1001 pw=1001 time=509 us cost=277 size=228024 card=19002)

where the second query does this

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=83 pr=83 pw=83 time=0 us)
  19001   TABLE ACCESS BY INDEX ROWID TEST_UNFRAG (cr=83 pr=83 pw=83 time=1460 us cost=84 size=190020 card=19002)
  19001    INDEX RANGE SCAN TEST_UNFRAG_IDX (cr=42 pr=42 pw=42 time=497 us cost=42 size=0 card=19002)(object id 67988)

Why is this? Well, the answer lies in the clustering factor. Because of the way the data was loaded, the clustering factor for the first table is 400,000 and for the second 880 (the second index is slightly smaller as well). This has nothing to do with the placement of the index blocks on disk and everything to do with the slightly bizarre way the data was loaded; rebuilding the index doesn't change the clustering factor, so it won't alter the execution plan unless we manually manipulate the statistics or use hints to force a range scan.
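The clustering factor itself is easy to simulate: walk the keys in index order and count how often consecutive rows sit in different table blocks. A small Python sketch (the 100 rows per block is an assumption, so the second figure differs from the measured 880, but the shape of the two numbers is reproduced; the first matches 400,000 exactly because every key alternates between the two load batches):

```python
def clustering_factor(block_of_key, n_keys=400_000):
    """Walk the index in key order; count each time the next row's
    table block differs from the previous row's block."""
    cf, prev = 0, None
    for key in range(1, n_keys + 1):
        blk = block_of_key(key)
        if blk != prev:
            cf += 1
            prev = blk
    return cf

ROWS_PER_BLOCK = 100  # assumed; the real figure depends on row and block size

# TEST_FRAG load order: all odd ids first, then all even ids.
def frag_block(key):
    pos = (key - 1) // 2 if key % 2 else 200_000 + (key - 2) // 2
    return pos // ROWS_PER_BLOCK

# TEST_UNFRAG load order: ids 1..400000 inserted in sequence.
def unfrag_block(key):
    return (key - 1) // ROWS_PER_BLOCK

print(clustering_factor(frag_block))    # 400000: every key changes block
print(clustering_factor(unfrag_block))  # 4000: one change per table block
```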

So do we have a similar type of fragmentation to the SQL Server definition of external? Well, we appear to; however, the impact is not particularly large, and in this case we could avoid it by creating the index with a pctfree of 50. The organisation of the table data is actually the larger factor here: we can't fix the clustering factor by rebuilding the index, only by changing the way we load the data.