 * Writes are handled in three different ways:
 * WR_INDIRECT:
 *    In this mode, if we need to commit the write later, then the block
 *    is immediately written into the file system (using dmu_sync),
 *    and a pointer to the block is put into the log record.
 *    When the txg commits the block is linked in.
 *    This saves additionally writing the data into the log record.
 *    There are a few requirements for this to occur:
 *      - write is greater than zfs_immediate_write_sz
 *      - not using slogs (as slogs are assumed to always be faster
 *        than writing into the main pool)
 *      - the write occupies only one block
 * WR_COPIED:
 *    If we know we'll immediately be committing the
 *    transaction (FSYNC or FDSYNC), then we allocate a larger
 *    log record here for the data and copy the data in.
 * WR_NEED_COPY:
 *    Otherwise we don't allocate a buffer, and *if* we need to
 *    flush the write later then a buffer is allocated and
 *    we retrieve the data using the dmu.
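The three cases in the comment can be read as a small decision procedure. The following is only a sketch of that reading in Python, not the actual ZFS code: the function name, its parameters, the order of the checks, and the 32 kByte default for `zfs_immediate_write_sz` are all assumptions made for illustration.

```python
from enum import Enum


class WriteMode(Enum):
    WR_INDIRECT = "WR_INDIRECT"
    WR_COPIED = "WR_COPIED"
    WR_NEED_COPY = "WR_NEED_COPY"


def choose_write_mode(offset, size, block_size, sync, has_slog,
                      immediate_write_sz=32 * 1024):
    """Hypothetical helper mirroring the three cases in the comment.

    immediate_write_sz stands in for zfs_immediate_write_sz; the
    32 kByte default is an assumption, not read from a live system.
    """
    # "the write occupies only one block": first and last byte of the
    # write fall in the same block.
    first_block = offset // block_size
    last_block = (offset + size - 1) // block_size
    one_block = size > 0 and first_block == last_block

    # WR_INDIRECT: large write, no separate log, single block.
    if size > immediate_write_sz and not has_slog and one_block:
        return WriteMode.WR_INDIRECT
    # WR_COPIED: we know the transaction is committed immediately
    # (FSYNC or FDSYNC), so the data is copied into the log record.
    if sync:
        return WriteMode.WR_COPIED
    # WR_NEED_COPY: defer; fetch the data later only if a flush is needed.
    return WriteMode.WR_NEED_COPY
```

With these assumptions, a synchronous 64 kByte write inside one 128 kByte block is classified as `WR_INDIRECT` when there is no slog, and as `WR_COPIED` once a slog is present.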
To get there from here, we could take a look at the size distribution of the ZIL writes. I've implemented this in an updated version of zilstat as follows:
Itty-bitty writes, those less than 4 kBytes. These might suggest a workload which is updating a lot of small files synchronously or perhaps a lot of metadata writes (though not all metadata writes may fit in small spaces).
Medium-sized writes, 4-32 kBytes. It might be more difficult to pin these down, so they get a bin.
Larger writes, > 32 kBytes. If you do not have a separate log, then these will be written to the pool and not the ZIL.
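As a rough illustration of the three bins, here is a minimal Python sketch that classifies write sizes and counts them. It is not how zilstat itself is implemented; the function names, the bin labels, and the handling of the exact 4 kByte and 32 kByte boundaries are assumptions.

```python
from collections import Counter

# Bin labels are illustrative, not zilstat's actual column names.
BIN_LABELS = ("<4kB", "4-32kB", ">32kB")


def size_bin(nbytes):
    """Return the bin label for one ZIL write of nbytes bytes."""
    if nbytes < 4 * 1024:
        return "<4kB"
    # Assumption: a write of exactly 32 kBytes counts as medium,
    # since the large bin is described as "> 32 kBytes".
    if nbytes <= 32 * 1024:
        return "4-32kB"
    return ">32kB"


def histogram(sizes):
    """Count writes per bin, always reporting all three bins."""
    counts = Counter(size_bin(n) for n in sizes)
    return {label: counts.get(label, 0) for label in BIN_LABELS}
```

For example, `histogram([512, 8192, 131072])` puts one write in each bin.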
Using these bins gives better observability into how much work would be moved by adding a separate log. Here is what a sample output looks like:
# ./zilstat.ksh -t 60