| Title: | DEC Rdb against the World |
| Moderator: | HERON::GODFRIND |
| Created: | Fri Jun 12 1987 |
| Last Modified: | Thu Feb 23 1995 |
| Last Successful Update: | Fri Jun 06 1997 |
| Number of topics: | 1348 |
| Total number of notes: | 5438 |
From: NOVA::FARRELL "I'M ON NOVA NOW" 15-MAR-1991 15:37:29.42
To: ROZWAT,@BPM_BIG,NM%POBOX::MUTH,NM%TPSYS::POWELL,NM%TPS::KOHLER,NM%COOKIE::RAAB,NM%COOKIE::REICH,HORN,HAGAN,COUGHLAN,NM%CSCOAC::GYLLSTROM_H
CC:
Subj: positioning of Rdb vs. ORACLE in regards to the ORACLE V6.2 announcement

Positioning of Rdb vs. ORACLE in regards to the ORACLE V6.2 announcement
-------------------------------------------------------------------------

On March 15, ORACLE Corp. announced ORACLE V6.2, a version which will run on Digital VAXclusters, and the intention to run it on Sequent, Pyramid and several parallel vendors' systems in the future. The announcement included performance results of ORACLE V6.2 running the TPC-B(tm) benchmark on a VAXcluster configuration.

ORACLE calls this new point release the "Parallel Server," and will try to use it to sell on nCUBE, Sequent and Pyramid systems, but it is clear that there is nothing "parallel" about the release. Its obvious purpose is to fix ORACLE'S long-standing lack of support for the VAXcluster, and it is their attempt to bolster sagging sales in the VAX environment, where ORACLE has been steadily losing market share to Rdb for 3 years.

ORACLE'S material to the press and consultants, and therefore to our customers, makes some bold statements, as well as some misstatements about Rdb. This note from Digital's database marketing group should help you to properly position Rdb vs. ORACLE, in light of this recent release.

The announcement is notable in that it demonstrates the market importance of the VAXcluster configuration. However, ORACLE touts this capability as a database breakthrough. While it is a breakthrough for ORACLE, it is hardly a breakthrough in database technology. Both Rdb and Ingres have provided full VAXcluster support for several years; Rdb since Version 2.0, introduced in 1985.
In the press release, ORACLE states that "their benchmarks demonstrate that with this new technology, customers can now add computers into their environments and get incremental performance gains without changing existing applications." VAXcluster customers using Rdb or Ingres have been doing this for several years. It is ONLY a new capability for ORACLE customers.

While ORACLE was busy catching up on VAXcluster support, Rdb recently shipped V4.0 with two-phase commit (2PC), Level C2 security, and a patented dynamic query optimizer which processes complex queries several times faster than the previous version. These, as well as many other Rdb capabilities are not available from ORACLE.

In the press release, ORACLE also publishes TPC-B(tm) results of 425.7 transactions per second (tpsB) on a 24-processor VAX6000-Model 500 VAXcluster system. In January, Digital released TPC-B(tm) results for Rdb of 300.1 tpsB on a 16-processor VAX6000-Model 500 VAXcluster system. Rdb's 18.8 tpsB per processor and ORACLE'S 17.7 tpsB per processor indicate comparable scalability.

What is not comparable is the level of read consistency provided. As indicated in Oracle's full disclosure report, the serializable parameter is set to false. This means that the RDBMS will restrict itself to the minimum amount of row level locking necessary to ensure write consistency. It will not, however, provide repeatable read or phantom record prevention. Read consistency was sacrificed to gain performance improvement. It should be noted that the Rdb performance reported was achieved while providing repeatable read and phantom record prevention.

In the announcement, ORACLE implies that their TPC-B(tm) results are an indication of their product's strength as an OLTP database. In fact, the TPC-B(tm) benchmark does not measure OLTP performance.
According to the TPC Benchmark(tm) B Standard Specification from the TPC, "This benchmark is not OLTP in that it does not require any terminals, networking, or think time." It is the TPC-A(tm) benchmark that measures performance of a full OLTP system. TPC-B(tm) is designed to measure database performance only.

ORACLE has criticized Digital's implementation of our TPC-B(tm) benchmark test because we partitioned the database. The Digital test was done with the intent to demonstrate the good performance of our recently released two-phase commit capability (which is still lacking in the ORACLE product). Given that only 15% of the transactions were to remote nodes, as specified by the TPC Benchmark(tm) B Standard Specification, partitioning the database was a reasonable implementation, and provides a higher level of data availability than a centralized configuration would. Had the remote requests been 25% or greater, a centralized database would have been more appropriate. Rdb offers both options. ORACLE does not allow use of a partitioned database, due to lack of 2PC.

There are several references, in ORACLE'S presentation and press kit, to competitive products. They have listed the following capabilities as being unavailable with Rdb:

o Row-level Locking
o Group Commit
o Online Backup

In fact, ALL of these capabilities are available in Rdb and have been available for several versions.

"OLTP running on all VAXcluster nodes" was cited as available with Rdb, but "poor." It is not apparent what criteria were used to arrive at that evaluation. Rdb in conjunction with VAX ACMS, Digital's TP monitor, provides an OLTP system that runs well not only in a VAXcluster, where VAX ACMS provides an additional level of security and recoverability, but also across LANs and wide area networks. Digital has many customers who are running production OLTP applications using Rdb and VAX ACMS on a VAXcluster. ORACLE has not run a TPC-A(tm) benchmark test.
Thus, there is no proof that ORACLE, which does not have a TP monitor like VAX ACMS, can run OLTP even on a single node system. In ORACLE'S own presentation, they recommend appropriate applications for the parallel server which are very unlike OLTP. For example, applications where "shared data is accessed mainly for read, recently read data is unlikely to be re-used soon."

Finally, for those ORACLE customers who were running all their ORACLE applications on a single node of their VAXcluster and who now want to take advantage of the fact that they can load balance those applications across the VAXcluster, it is not quite true that "current ORACLE VAXcluster customers will receive this technology at no additional charge." It will cost them 80% of the cost of the initial ORACLE license for EACH of the other nodes. Moreover, many ORACLE VAXcluster customers have been forced to stay on ORACLE V5.1 to get the VAXcluster support unavailable with V6.0. The upgrade will cost those customers the full license price for each node.

Distribution:
| T.R | Title | User | Personal Name | Date | Lines |
|---|---|---|---|---|---|
| 895.1 | eatin' their words! | MBALDY::LANGSTON | assimpleaspossiblebutnotsimplr | Wed Mar 20 1991 22:35 | 62 |
In an article on page 1 of the March 18, 1991 Digital Review, continued on page 6, they eat their words:

"DEC also takes issue with other claims about Rdb that Oracle officials made last week, according to [Vicki] Farrell. 'Oracle made a big deal about Rdb not having row-level locking, and we do offer row-level locking. Oracle said that we don't have group commits and we do, and we have on-line backup, although they said we don't,' Farrell said. When questioned about DEC's assertions, Oracle officials modified their stance. 'Yes, Rdb does row-level locking: We may[!] have made a mistake on that point. But they also have limits on the number of locks that can be supported,' said Ken Jacobs, Oracle's director of database marketing. '...We had no intent to misrepresent [what Rdb can do],' Jacobs said. '[DEC] also [does] support on-line backup, but it incurs a substantial amount of overhead,' an Oracle spokesman said."

The "[!]" in the above is mine.

Does anybody have the text of Oracle's announcement?

Also, the article was accompanied by a pie chart from Computer Intelligence that has no date. Does anybody know the date of this graphic? It has the following percentages:

"DBMS MARKET SHARE AT VAX SITES"

DEC*         38%
Oracle       22%
Other        17%
Ingres       15%
IBI           5%
Compushare    3%

* Includes both Rdb and DBMS. Does not include run-time Rdb.

The Computer Intelligence numbers for June 1990, according to Michael Booth's Sybase Competitive Fact Sheet, were

Rdb          18.4%
Oracle       17.7%
Other        21.6%
Ingres       12.6%
In-House      6.5%
Focus         3.8%
System 1032   2.7%
Cincom        1.2%
Adabas        1%
Sybase        0.6%

Note that CI did not include a separate entry for Sybase in the most recent numbers and that they did not break out Rdb and DBMS separately. Anybody know how Rdb and DBMS break out? I would guess that the DBMS numbers have not changed, gone down a little maybe, so Rdb is probably 24%! That's a larger percentage than I would have guessed for Rdb and Oracle.

Bruce
| |||||
| 895.2 | Rdb and group commit | TAV02::ARIE_L | Arie Levy | Sun Mar 24 1991 12:16 | 7 |
>......... Oracle said that we don't have group commits and we do, ....
What exactly does Rdb do to justify this claim?
Thanks,
Arie Levy
| |||||
| 895.3 | | NOVA::FEENAN | Jay Feenan, Rdb/VMS Engineering | Mon Mar 25 1991 02:21 | 7 |
Rdb has had a group commit capability since version 2.3 of the product.
If you want to go back and find a copy of the release notes, there
should be something in there. Are you asking this, or how it works
or???
-Jay
| |||||
| 895.4 | How it works ? | TAV02::ARIE_L | Arie Levy | Mon Mar 25 1991 07:09 | 7 |
Jay,
I meant to ask how it works, and where it is described in the documentation.
Thanks,
Arie Levy
| |||||
| 895.5 | Hopefully some insight... | NOVA::FEENAN | Jay Feenan, Rdb/VMS Engineering | Mon Mar 25 1991 17:05 | 354 |
Below is some mail that I wrote up to answer some questions about
Rdb/VMS; I believe there is a section on group commit (or group write)
in there. I don't believe that the actual mechanism is in the normal
doc set, unless there is a one-liner saying something like "Rdb/VMS
uses an efficient mechanism to group AIJ writes together..." It is
documented in the "Rdb/VMS Technical Handbook", which is an orderable
item; you'll have to look thru the rdb_40 notes file for the order
number. Hopefully this helps a bit.
-Jay
SYNCHRONOUS WRITE
- Database (i.e. data records as compared to log
records) I/O is done synchronously (at request
time as compared to commit time) with writes by
the application or user.
o At the I/O level we use both synchronous and
asynchronous writes for data, which I believe
may have been a confusing point during the course
of the conversations that you have had with DEC
people. During a write operation you will update
the tuple on the database page and we will defer
the write until one of three events:
1. Another user wants to update another tuple on
the database page. In this case the user is
signaled that someone wants the database page
and writes that page back to the disk....this
write is a synchronous write (before this
write can happen the one record updated is
journaled in the RUJ file...but that will be
discussed later).
2. Commit Time. - In the current release, all
database pages that have been updated will
be written to disk. The sequence of writes to
the journal will be discussed later, but the
write of updated database pages is done as a
synchronous operation, BUT using asynchronous
I/O's. How this is done is that we mark the
start of the buffer flush, issue all the
I/O's asynchronously and then continue in
the commit sequence when all data is written.
Thus although the operation is synchronous the
I/O is done asynchronously for each database
buffer (applications use say 100+ each). And
the total time of the operation is decreased
significantly. Tests have shown a graphed
result of time vs I/Os to level off so that
the total time of say a complete flush of a
buffer pool of three buffers is approx. the
same as ten buffers.
3. Buffer Pool Overflow - a group of pages needs
to be read into a buffer and there are no
empty buffers, thus a buffer has to be selected
for a flush to make a buffer available.
We have an internal method of partitioning
our buffer usage to select buffers containing
database pages that have been updated vs
only read. Thus we can defer the flushing of
updated pages, which defers the corresponding
journal writes.
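The commit-time flush in item 2 above (a synchronous operation built from asynchronous I/Os) can be sketched roughly as follows. This is only an illustration in Python, not Rdb/VMS code: `write_page`, `flush_buffers`, the in-memory `disk` dictionary and the thread pool are all hypothetical stand-ins for the real asynchronous I/O requests.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def write_page(disk, page_no, data):
    """Hypothetical stand-in for one page write to disk."""
    disk[page_no] = data
    return page_no

def flush_buffers(disk, dirty_pages, max_in_flight=8):
    """Issue every page write without waiting, then block until all finish.

    The operation as a whole is synchronous (the caller does not continue
    the commit sequence until every page is written), but the individual
    writes are allowed to overlap.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # Start all writes first ...
        pending = [pool.submit(write_page, disk, no, data)
                   for no, data in dirty_pages.items()]
        # ... then wait once for the whole batch.
        done, not_done = wait(pending)
    assert not not_done
    return sorted(f.result() for f in done)

disk = {}
written = flush_buffers(disk, {3: b"aaa", 7: b"bbb", 9: b"ccc"})
```

Because the writes overlap, the elapsed time of the flush is closer to that of the slowest single write than to the sum of all of them, which is consistent with the levelling-off described above.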
ASYNCHRONOUS WRITE
- Data records are written when it is convenient for
the database engine, possibly at commit time.
o Explained above
DEFERRED WRITE
- Database (ie. data records) are written to disk
asynchronously from application processing. This
means that data records need not be written to the
database at commit time and implies that a "log
I/O" protocol is followed. During recovery this
means that uncommitted transactions need only be
removed from the recovery log and that committed
transactions which have not been applied to the
database are then applied.
o I wouldn't have used this terminology, so maybe
this is where some confusion arose. I would
describe the above scenario as an undo/redo
recovery scheme with some type of checkpoint interval.
The checkpoint interval would define the interval
of the write operation to synchronize both
the undo/redo log operations and the committed
data. In the current release of Rdb/VMS in the
field we do not have this recovery scenario.
If I used the term deferred write I would be
referring to our cache scheme of deferring
the write to the commit time, using the I/O
scheme described above.
GROUP COMMIT
- Log I/O operations are minimized by grouping
together log data of multiple committed transactions
and writing the data to the recovery log in a single
I/O operation. The number of commits handled in this
way is usually dependent on the number of commits
ready for processing within some small time window.
This circumvents the need for sequential processing
for the individual commits, but does not
significantly delay commit processing if multiple commits
are not ready.
o This has been implemented in Rdb/VMS since V2.3
and has been improved over the years. But yes,
this is implemented as you describe; the only
difference is that we commonly call this our group
write capability VS the group commit capability.
The reason is that the grouping of writes to the
after image log is always done. The COMMIT point
is just a special case. The two cases are:
1. A user writes a number of log records (as he
updates data records) to an after image log buffer;
when the buffer becomes full (dependent on the
size of the data records and number being
updated) the buffer then needs to be flushed. (This
is a rare event.)
2. The commit point is reached
and the after image log record buffer is flushed.
At either point the group write mechanism is
used.
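The grouping idea can be illustrated with a small simulation. It is a sketch under stated assumptions, not the Rdb/VMS mechanism: the function name `group_write`, the single log device, and the rule that commits arriving while an I/O is in progress join the next I/O are all assumptions made for the example.

```python
def group_write(commit_times, io_time):
    """Simulate grouping of log writes on one log device.

    commit_times: sorted times at which transactions are ready to commit.
    io_time: duration of one log I/O.
    Returns a list of groups; each group is written by a single I/O.
    """
    groups = []
    i = 0
    device_free_at = 0.0
    while i < len(commit_times):
        # The next I/O starts as soon as a commit is ready and the
        # device is idle; a lone commit is therefore not held back.
        start = max(commit_times[i], device_free_at)
        group = []
        # Everything that became ready by then rides in the same I/O.
        while i < len(commit_times) and commit_times[i] <= start:
            group.append(commit_times[i])
            i += 1
        groups.append(group)
        device_free_at = start + io_time
    return groups

def ios_per_transaction(commit_times, io_time):
    return len(group_write(commit_times, io_time)) / len(commit_times)
```

With ten commits ready at the same moment, `ios_per_transaction([0.0] * 10, 1.0)` gives 0.1, i.e. one log I/O shared by ten transactions; a single isolated commit still gets its own I/O immediately.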
SYNCH POINT
- A synch point is specific to a particular user's
transaction space. All database and log buffers are
flushed to their appropriate destination.
o I would use this description as our commit point
in terms of recovery log behavior. However, I
would also equate a synch point to the start of a
transaction. This is what sets up the synchronization
of a database user in terms of the database
environment. The most important part of this is
the assignment of our internal TSN number. This
is used in terms of our recovery mechanisms for
synchronizing transactions (applying after image
logs to a database restored from a backup),
automatic space reclamation and our snapshot
versioning mechanism. Over the past few years
we have optimized this mechanism in a number of
different ways.
The first method is what we call "pre-start" of
a transaction. This happens at the commit point
of a writer within the database. When the I/O
is done to the root file to "mark" the user as
committed we optimistically believe that his
next operation will be the start of another write
operation. Thus at that time we 'pre-allocate'
the next TSN number for the user. So at the start
of the transaction the synchronous I/O that would
be needed to the root file (the coordinating
'accounting file' in our database environment) is
saved.
The last point I'd like to make is that Rdb/VMS
has an 'optimistic commit strategy'. What this
means is that traditionally the I/O to the root
file would be defined as the 'commit point' or
synch point in this context. 'Commit point' being
defined as the final point in the transaction
where a 'rollback' would not happen. Anyway,
the optimistic commit strategy is that the AIJ
record is written to the log before the synch
point (I/O to the root). If there is a crash
before the root I/O, recovery would scan
backwards through the AIJ looking for the commit
record for that transaction. If it finds it, it
will commit the transaction; if not, it will roll
it back. This way the total wait time of the I/Os
at the commit operation is streamlined.
All of this (write to the root) is done with our
'group commit' optimization (vs. the group write
to the AIJ file described above...but using the
same concept).
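The recovery decision under the optimistic commit strategy can be sketched as below. The record layout (dictionaries with `tsn` and `type` keys) and the function name are hypothetical, chosen only to show the backward scan for a commit record; the real AIJ format is not described here.

```python
def resolve_after_crash(aij_records, tsn):
    """Decide the fate of a transaction whose root file I/O never happened.

    Scans the after-image journal backwards looking for a commit record
    belonging to transaction `tsn`.
    """
    for record in reversed(aij_records):
        if record["tsn"] == tsn and record["type"] == "commit":
            return "commit"      # commit record reached the log
    return "rollback"            # no commit record found

# Hypothetical journal contents at the time of a crash.
aij = [
    {"tsn": 41, "type": "update"},
    {"tsn": 41, "type": "commit"},
    {"tsn": 42, "type": "update"},
]
```

Here transaction 41 would be committed by recovery even though the root file was never marked, while transaction 42 would be rolled back.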
SYSTEM CHECKPOINT
- At a system checkpoint (possibly triggered on an
interval, when buffers are full, after N commits, at
operating system or TP monitor request, etc.), all
database and log buffers are flushed to their
appropriate destination.
o In the current release of Rdb/VMS in the field we
do not have this checkpointing mechanism.
QUESTIONS:
1. Which of the following does Rdb/VMS support?
* Described above.
2. What processing takes place at commit time?
* At commit time the order of writing is to the
RUJ file (before images of data changes), the
data area files (data files), the AIJ file (after
images of changes) and the root file (synch
point). The RUJ file write is synchronous, the
data write is a synchronous operation that uses
asynchronous I/O as described above, the AIJ
I/O is synchronous using the group write
capability described above and the root file I/O is
synchronous using the group commit capability
described above. The start of a new transaction
does not require an I/O, because of the pre-start
transaction capability as described above. What
this translates to is that the journal and root
file I/O is a "fractional I/O" during a user's
individual transaction...our measured rate is down
to .1 (10 users doing 1 TPS will have one I/O).
3. What processing takes place at recovery time?
* At recovery time there are two types of
transaction recovery. One is that a user types in
ROLLBACK (or in VMS he exits his image
'normally'...which will invoke image exit handlers
that call ROLLBACK for the user). In this
situation the user will re-apply before images of
updates (from the RUJ file) to database pages
that were updated, write a rollback record to
the AIJ file and terminate his transaction in
the root file. The other situation is an abnormal
termination (a node leaving the cluster or a user
that opens the database for the first time after
a complete system crash is just a special case of
this processing). The situation is detected by
the monitor process (really this is the only
major job this process plays in the system), a DBR
(database recovery) process is created, this
process inherits the context of the terminated user
from the root file and rolls back the transaction
as described above, if any database pages that
were written to have been flushed to disk. The
important item to note is that we do not need
any 'corresponding log records' for a transaction
to rollback which I know other systems do...this
results in a reduced number of log records and
faster system and media recovery scenarios.
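The rollback path (re-applying before images from the RUJ file) can be sketched in a few lines. Everything here is a simplification for illustration: pages are modelled as dictionaries of slots, the RUJ as a Python list, and the names `update` and `rollback` are hypothetical.

```python
def update(pages, ruj, page_no, slot, new_value):
    """Journal the before image first, then change the page."""
    ruj.append((page_no, slot, pages[page_no].get(slot)))
    pages[page_no][slot] = new_value

def rollback(pages, ruj):
    """Undo a transaction by re-applying before images, newest first."""
    for page_no, slot, before_image in reversed(ruj):
        if before_image is None:
            # The record did not exist before the transaction.
            pages[page_no].pop(slot, None)
        else:
            pages[page_no][slot] = before_image
    ruj.clear()

pages = {1: {0: "a"}}
ruj = []
update(pages, ruj, 1, 0, "b")
update(pages, ruj, 1, 0, "c")
update(pages, ruj, 1, 1, "x")
rollback(pages, ruj)
```

After the rollback the page is back to its original contents, and only the before images were needed to get there; no separate undo records in the after-image log are consulted.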
4. How can a DBA control the amount of time it takes to
recover a database after a system crash?
* This can not be currently controlled. We view
this as really an option that goes hand in hand
with what you refer to as system synch points.
And will be implemented when the need arises.
5. What processing takes place when a database is
rolled forward?
* First the database is restored from a backup.
The user then uses the RMU/RECOVER command which
opens the AIJ file and sequentially reads the
file. It will buffer log records for a
transaction; if the transaction is rolled back it will
discard the records buffered. If the transaction
is committed it would be applied to the database.
Rdb/VMS has a recover/rollforward capability on
a per area basis. Say, one area is corrupt. The
area is deleted, that one area is restored from
a backup file (you can backup only one area if
you want or the restore facility will selectively
get one area out of a complete backup file).
Then apply the AIJ file to that one area...this
first gathers context out of the root file to
determine the last committed transaction within
the database environment and then applies updates
to the area being restored until that commit
point is reached.
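The roll-forward pass described above (buffer per transaction, discard on rollback, apply on commit) can be sketched as follows. This does not show RMU/RECOVER itself; the record layout and the `roll_forward` name are assumptions made for the example, and the database is modelled as a plain dictionary.

```python
def roll_forward(database, aij):
    """Read the journal sequentially, buffering records per transaction."""
    pending = {}                      # tsn -> buffered after images
    for rec in aij:
        tsn, kind = rec["tsn"], rec["type"]
        if kind == "update":
            pending.setdefault(tsn, []).append((rec["key"], rec["value"]))
        elif kind == "commit":
            # Committed: apply everything buffered for this transaction.
            for key, value in pending.pop(tsn, []):
                database[key] = value
        elif kind == "rollback":
            # Rolled back: discard the buffered records.
            pending.pop(tsn, None)
    # Transactions with no commit record are never applied.
    return database

restored = {"a": 1}                   # database as restored from backup
journal = [
    {"tsn": 1, "type": "update", "key": "a", "value": 2},
    {"tsn": 2, "type": "update", "key": "b", "value": 9},
    {"tsn": 2, "type": "rollback"},
    {"tsn": 1, "type": "commit"},
    {"tsn": 3, "type": "update", "key": "c", "value": 5},
]
result = roll_forward(restored, journal)
```

Only transaction 1 reaches the database; transaction 2 was rolled back and transaction 3 never committed.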
6. Is the AIJ optional? If so, what are the penalties
for not using it (ie, what can't I do)?
* Yes it is totally optional. There are no penalties in the
current recovery schemes if disabled. An undo/redo
scheme, as mentioned above, would find this log
mandatory to use. Oh, there is one restriction, you
couldn't rollforward the database!
| |||||
| 895.6 | See QUORUM of February 1990 | TAV02::ROTENBERG | Haim ROTENBERG - Israel Soft. Support | Tue Mar 26 1991 13:34 | 8 |
It is always a pleasure to speak with you through the net. There is an
article called "High Availability Mechanisms of VAX DBMS Software"
which was published in the VAXcluster systems Quorum in February 1990
on page 76, section Group AIJ Flush. Amir also gave a detailed
presentation on the subject at the last VIA Forum meeting we held last
month. If you need more detail, please call me.
Haim
| |||||
| 895.7 | Sensitive material? | DENVER::DAVISGB | Thunder 'n Litnin.... | Mon Apr 01 1991 17:35 | 4 |
For those of us who have been away for awhile...why is .0 set hidden?
Gil
| |||||
| 895.8 | .0 now unhidden - here's the history | BROKE::ASHELL::WATSON | work hard and be nice | Thu Apr 04 1991 21:14 | 18 |
> For those of us who have been away for awhile...why is .0 set hidden?
.0 is no longer set hidden. As you can now see, it is a very
comprehensive response from Vickie Farrell (Database Systems Marketing
Manager) to Oracle's recent announcement.
It is appropriate that Vickie's response be posted in this notes
conference; the field needs this information. However, experience shows
that posting such information openly on the network gets it into the
hands of Oracle very quickly; it was set hidden to prevent Oracle being
able to concoct a response to the response while it was still hot off
the presses.
I suggested to Vickie that we've now reached the stage at which her
response has been used often enough by the field and the press that we
can, and should, unhide it so that all can make use of it. She agreed.
Andrew.
| |||||
| 895.9 | .0 Still Hidden | DPDMAI::HYDE | Rdb über alles OKO 487-2256 | Fri Apr 05 1991 01:34 | 4 |
.0 is still hidden from me.
Kurt
| |||||