Before I dive into the microbenchmark, a few words about the ZFS Intent Log (ZIL). ZFS is a transactional file system, which means that it collects I/O into a transaction group (txg) and commits that txg to persistent storage. In later ZFS implementations, that txg commit occurs every 30 seconds. However, if an application needs to ensure that an I/O is written to persistent storage immediately, then waiting for up to 30 seconds is not an option. Such writes are often called synchronous writes, though that is arguably not the best descriptive term. This is where the ZIL enters the picture. In the synchronous write case, ZFS will write the record to the ZIL and later commit the record with the txg. This ensures the synchronous write agreement between the application and ZFS is honored -- a good thing. Neil Perrin offers a more detailed description in his famous lumberjack blog posting.
Synchronous writes are the bane of high performance. Really. We see this every day. They cause performance guys to gnash their teeth and cuss. When a microbenchmark wanders along and does a lot of synchronous writes, complaints about how "[file system] sucks" and "I can't believe those file system developers could be so insensitive" come pouring forth.
To determine the performance envelope of a benchmark, it is relatively easy to disable the ZIL. This is neither a safe nor recommended option for production systems or people who like their data. But for benchmarking, it allows a performance engineer to quickly determine the best possible performance for the given system configuration. The ZIL is then re-enabled and the work can concentrate on how to approach that performance goal. Tools like zilstat are designed to help with this endeavor, and can save you a lot of time when you suspect synchronous write performance might be an issue.
But disabling the ZIL can also hide important behavior. That is why this microbenchmark could be a poster child for benchmarking that doesn't do what you expect. Here it is:
while true; do
    echo "blah" > outputfile
done
When run on a Solaris NFS client with a Solaris NFS server using default NFS settings, each pass through the loop will cause the following to occur:
- outputfile is LOOKUPed
- outputfile is OPENed
- ACCESS to outputfile is checked
- The data is written to the file with WRITE
- The data is COMMITted
- outputfile is CLOSEd
Argv! This simple microbenchmark actually makes a lot of synchronous writes to the file system. zilstat will happily show that the ZIL is working hard when running this microbenchmark. If you run this, then you can experiment with various pool or separate (ZIL) log configurations to your heart's content.
However, if you disable the ZIL, then the number of I/O operations is reduced to just a handful, every 30 seconds. Why? Because ZFS is clever enough to recognize that the same file is being overlaid and is only concerned with physically committing the last one in the transaction group. In other words, the amount of I/O traffic to the pool is dramatically reduced. When this happens, you are no longer measuring only the effect of ZIL I/O; you are also measuring the main pool I/O. The results look something like this:
- ZIL enabled, no separate log = 100 iterations/second
- ZIL enabled, separate log on a fast SSD = 1,000 iterations/second
- ZIL disabled = 10,000 iterations/second
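If you want to collect iterations/second figures like these yourself, a bounded and timed variant of the loop is handy. What follows is only a sketch, not the script behind the numbers above. It assumes bash (for the built-in SECONDS timer), and the OUTPUTFILE and ITERATIONS variables are names I made up for illustration. Point OUTPUTFILE at a file on the NFS mount you are testing.

```sh
#!/bin/bash
# Sketch only: a bounded, timed variant of the overwrite loop.
# OUTPUTFILE and ITERATIONS are hypothetical knobs, not part of the original test.
outputfile=${OUTPUTFILE:-outputfile}   # point this at a file on the NFS mount under test
iterations=${ITERATIONS:-1000}         # number of overwrites to time

start=$SECONDS                         # bash's whole-second timer
i=0
while [ "$i" -lt "$iterations" ]; do
    echo "blah" > "$outputfile"        # same overwrite as the original loop
    i=$((i + 1))
done
elapsed=$((SECONDS - start))

if [ "$elapsed" -gt 0 ]; then
    echo "$i iterations in ${elapsed}s: $((i / elapsed)) iterations/second"
else
    echo "$i iterations in under 1s: raise ITERATIONS for a usable rate"
fi
```

Because SECONDS only has one-second resolution, pick an iteration count large enough that the run lasts at least several seconds in the fastest configuration you are comparing.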
The moral of this tale is: beware of microbenchmarks and how they can confuse your understanding of the real system behavior.
P.S. Don't disable the ZIL.
P.P.S. I really mean it, don't disable the ZIL. Seriously. I might cut you some slack for benchmark purposes, but other than that, don't disable the ZIL. Period. End of discussion.