Raghu On TechPro tips on managing data at scale
DB2 How To Measure Transaction Log Write Time ?
In this short blog post I am going to cover transaction logging, its importance and how to measure logging performance. Transaction logging is needed by RDBMS to ensure Durability, the last alphabet in (ACID) principles.
If you are a DBA, you must know about ACID (Atomicity, Consistency, Isolation and Durability) principles and ARIES (Algorithms for Recovery and Isolation Exploiting Semantics) algorithms.
IBM fellow Dr. C. Mohan is the primary inventor of ARIES algorithms and these algorithms are implemented by most of the RDBMS to make transactions durable.
If you need to understand how ARIES algorithms work, take a look at below overview from Dr. Dittrich.
What about DB2 ?
DB2 infact uses ARIES algorithms to implement durability of this robust RDBMS. Some of the concepts that you might be aware of with out knowing they are part of the ARIES algorithms are “Write-ahead Logging”, “Forward phase of rollforward recovery”, “Backward phase of rollforward recovery”.
For the sake of simplicity, lets only talk about “Write-ahead Logging” (WAL) in this blog post. WAL simply means before any committed transactions are externalized to the database disk (tables/tablespaces), they will be written to transaction log files ?
You may be asking, if RDBMS are going to write every transaction to the log file before committing the changes to the disk wouldn’t it slow down the transactions and throughput ? Short answer is yes, long answer is not as much as you would think.
Soon you will see, how to measure how much time DB2 is taking to perform a log write. In DB2 all the agents write their changes to LOG BUFFER. This log buffer is used by all the applications and could be a point of contention if its not tuned properly. Log buffer gets flushed to disk when ever it gets full or an application issues a COMMIT.
Why Transaction Log Writes Are Faster ?
Simple answer to this question is sequential writes. That means all the writes to the log files are in the form a of journal. These transaction records are appended to the log file in the order they arrive i.e. they are not grouped together at a table/index/tablespace level.
Where as writes to the database has a lot of processing over head, not all writes are sequential for e.g. indexes are always sorted and to insert a key into a B-Tree index, it requires a random write and it involves disk seek + write. Where as with sequential writes there are no intermittent seeks, one seek and keep writing…
In the olden days a single random IO operation of 10 ms is very common, with the latest advancements in storage and SAN the random IO operations are significantly faster, but still they are much slower than their sequential counter parts.
You could also put it this way, RDBMS are mostly designed to get the data out faster i.e. by implementing B-Tree indexes (Which are very slow to write to but fast to fetch from). Where as log structures operate at high write speeds, don’t even try to read meaningful data from them.
Newer NoSQL databases such as Cassandra and text search solutions such as SOLR all use log based LSM indexes. They are mostly designed for high write speeds, since enterprises such as Facebook and Netflix need to put data in at much higher speeds than relational databases could handle.
If you would like to learn more about LSM tree indexes which are core for NoSQL technologies take a look at below link.
How to measure the Log write time ?
After all the boring stuff, finally I am going to share with you a simple SQL to measure the log write time per log write IO. These days almost anything that you need is available to you via MON_GET table functions.
In a non-pureScale/non-DPF environments you will only have one member and your output looks like below.
select member, cast(cast((LOG_WRITE_TIME) as float)/cast((NUM_LOG_WRITE_IO) as float) as decimal(7,5)) as avg_log_disk_write_time from table(sysproc.mon_get_transaction_log(-2)) as log
In a pureScale environment your output will look something like this.
As you can see from the above output, each log write is taking less than 1 millisecond barring member 2 on my pureScale cluster. Good number to shoot for log write time is < 3 milliseconds. Any thing over 3 milliseconds is a cause of concern.
I recently implemented a B-Tree index using java and I am hoping to post a blog on internal workings of a B-Tree data structure as well as read and write performance of a B-Tree.