    
    Loading costs can vary widely depending on what mechanisms and
    gateways you are using, as well as what is serving as source
    and destination.
    
    For example, if you are loading an imported Rdb table directly from
    an application that is providing the data, you should expect
    performance very close to that of loading the Rdb table directly.
    The path lengths and overhead for DBI should be very low, since
    DBI is acting as a dispatcher and there is little DBI processing
    involved. If you are loading an updatable horizontal partition view
    from an application, there will be slightly more overhead because
    DBI needs to look at the partitioning column and pick the right
    place to do the insert.  Again, this overhead is small.
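
    For illustration only -- the table and view names below are invented,
    and the exact DBI syntax for defining an updatable horizontal partition
    view may differ -- the idea is roughly:

        -- Hypothetical partition view spanning two tables, split on REGION
        CREATE VIEW all_orders AS
            SELECT * FROM orders_east WHERE region = 'EAST'
            UNION ALL
            SELECT * FROM orders_west WHERE region = 'WEST';

        -- On insert, DBI examines the partitioning column (REGION) and
        -- routes the row to the matching underlying table.
        INSERT INTO all_orders (order_id, region, amount)
            VALUES (1001, 'EAST', 49.95);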
    
    Inserts from a SELECT clause can be slower, depending on the
    nature of the SELECT clause and the total amount of data
    selected.  The data may need to accumulate in DBI's workspace
    before the insertion part of the work starts.  The next release
    of DBI has a much better dataflow model (via improved fast-first
    features), which means less buffering.  INSERT statements whose
    select clause contains an ORDER BY or implicit sorting (e.g. a
    GROUP BY) will always force DBI, or any engine, to run the SELECT
    in its entirety (and store the interim results in virtual memory
    or an on-disk workspace) before the insert work can proceed.
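
    As a made-up example (the table names here are purely illustrative),
    something like the following forces the whole SELECT to complete
    before any row is inserted:

        -- The GROUP BY (or an ORDER BY) means the full aggregate result
        -- must be computed and stored before the first insert happens.
        INSERT INTO order_totals (cust_id, total_amount)
            SELECT cust_id, SUM(amount)
            FROM   orders
            GROUP  BY cust_id;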
    
    
    
    Hi Stefan,
    
    Here is some raw data gathered by Frank Lee.  The bottom line, though,
    is that we don't believe you will see a large deviation from directly
    inserting into an Rdb database.  Note that these tests were run on our
    cluster (which is in active use), although Frank did multiple runs of
    each test to try to get accurate numbers.
    
	Tables added to the catalog for this analysis:
lineitem0: in TEST$DATABASE:R51_UDB_1.RDB (row size: 201, populated with 64K rows)
lineitemr: in TEST$DATABASE:R51_UDB_1.RDB (same table definition as lineitem0)
lineitemra: in TEST$DATABASE:R51_UDB_2.RDB (same table definition as lineitem0)
Test results are as follows.
I. lineitem0 to lineitemr using Rdb only:
SQL> attach 'f TEST$DATABASE:R51_UDB_1.RDB';
--- Test case A:
SQL> insert into lineitemr (select * from lineitem0 where l_orderkey < 10000);
9965 rows inserted
....
--- Test case B:
SQL> insert into lineitemr (select * from lineitem0 where l_orderkey<20000);
20058 rows inserted
....
 Test      Average             Average           Average          Average
 Case      elapsed time (sec)  CPU time (sec)    Page Faults      DIO
  A           52                 10.5                57            11908
  B           74                 24.7               116            14272
A memory leak of ~100 bytes per record was observed. This is an Rdb problem
and a QAR was filed against Rdb V7.0.
II. lineitem0 to lineitemra using DBI (DBI_OPTIMIZATION_LEVEL is TOTAL TIME):
$DEF SQL$DATABASE /TYPE=DBI/DBNAME=DOC21:[F_LEE.LOCAL_TEST.CATALOG]CAT_4_2_515
$dbisql
SQL> attach 'f sql$database';
--- Test case A:
SQL> insert into lineitemra (select * from lineitem0 where l_orderkey < 10000);
9965 rows inserted
....
--- Test case B:
SQL> insert into lineitemra (select * from lineitem0 where l_orderkey<20000);
20058 rows inserted
....
 Test      Average             Average           Average          Average
 Case      elapsed time (sec)  CPU time (sec)    Page Faults      DIO
  A           52                 15.6                57            11885
  B           76                 25.0               116            14278
 
	In this mode of operation, DBI materialized the data collection
from lineitem0 before inserting into the target table. Each tuple used
about 500 bytes: for case A, virtual memory grew by an additional 8924
pages; for case B, by 19920 pages. The memory leak per record is ~100 bytes
(the same as in test I).
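
	As a rough cross-check, assuming the reported virtual pages are
512-byte VMS pages:
	    Case A:  8924 pages * 512 bytes / 9965 rows  ~  460 bytes/row
	    Case B: 19920 pages * 512 bytes / 20058 rows ~  508 bytes/row
which is consistent with the ~500 bytes per tuple figure above.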
    
III. lineitem0 to lineitemra using DBI (DBI_OPTIMIZATION_LEVEL is FAST FIRST):
Similar setup as in II.
 Test      Average             Average           Average          Average
 Case      elapsed time (sec)  CPU time (sec)    Page Faults      DIO
  A           53                 15.7                57            11886
  B           73                 24.3               116            14278
	In this mode of operation, DBI repeatedly fetched a tuple from
lineitem0 and inserted it into lineitemra, without buffering the source
tuples. No additional virtual memory was used for buffering. However, the
memory leak per record is about the same (~100 bytes/record).
    
IV. Summary:
For the insert statements tested with TOTAL TIME and FAST FIRST optimization,
   the results in II and III do not show a significant difference in
   performance. With only two data points, it is hard to say which is
   better. However, even though the amount of I/O is the same, buffering a
   large number of source tuples will certainly affect performance under
   the TOTAL TIME optimization.
   Until we have more performance data, we recommend that users
   set the DBI optimization level to FAST FIRST for this type of
   SQL operation, to avoid the risk of running out of virtual memory.
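
   For example, assuming the optimization level is picked up through a
   logical name in the same way SQL$DATABASE is defined in II above (this
   is an assumption -- check the DBI documentation for the actual
   mechanism), the recommendation would translate to something like:

       $ DEFINE DBI_OPTIMIZATION_LEVEL "FAST FIRST"   ! assumed: DBI reads this logical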
   