Backups for file systems with millions of files

Recently, there have been a number of discussions about how to back up an active file system with millions of files. This is a challenge because traditional backup tools do a file system walk -- traversing the file system from top to bottom, looking at the modification time of each file. This works well for file systems with a modest number of files. For example, one of my OpenSolaris systems has around 62,000 files in the root file system and backups run at media speed. But when you get into millions of files with a deep hierarchy, the time required for the file system walk begins to dominate the backup time. This can be a bad thing, especially if you back up to tape.
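To make the walk concrete, here is a minimal sketch of the mtime-based traversal that traditional incremental backup tools perform (the directory and marker file are illustrative stand-ins created just for this example; real tools such as tar, cpio, or ufsdump differ in detail):

```shell
# Stand-ins for the file system being backed up and the timestamp
# recorded by the previous backup run (illustrative, not real paths).
BACKUP_ROOT=$(mktemp -d)
MARKER=$(mktemp)

touch "$BACKUP_ROOT/old.txt"      # existed before the last backup
sleep 1
touch "$MARKER"                   # "previous backup" finished here
sleep 1
touch "$BACKUP_ROOT/new.txt"      # modified after the last backup

# The walk: stat() every file and keep those newer than the marker.
# With millions of files, this traversal itself dominates backup time,
# even when very few files have actually changed.
find "$BACKUP_ROOT" -type f -newer "$MARKER" -print
```

Only `new.txt` is selected, but note that the walk still had to examine every file to decide that.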

We didn't notice this problem with UFS because, quite simply, UFS can't handle millions of files in a single file system. ZFS has no such limitation, and now that it is mainstream, people are taking advantage of how easily it manages large datasets.

One successful approach to solving this problem uses ZFS snapshots to back up an active file system to a backup file system. For diversity, the backup file system can be located on another disk, array, or host. For cost efficiency, the backup file system can have different properties than the active file system -- compression is often a good idea. The trick is that the active file system can be optimized for high IOPS while the backup file system is optimized for low cost per byte.
This might look familiar to you. Many people have performed backups from a replica of the production data, but most of those implementations replicate at the block level. ZFS replicates at the file system level -- which allows the block-level policy or configuration, such as the RAID layout or data compression, to differ between the original and the replica.
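The approach above can be sketched with a few commands, assuming a production pool `fastpool` and a backup pool `cheappool` (the pool and dataset names are illustrative):

```shell
# Enable compression on the backup side only; the source stays
# optimized for IOPS, the replica for cost per byte.
zfs create -o compression=on cheappool/backup

# Take a consistent, point-in-time snapshot of the active file system.
# Snapshots are nearly instantaneous, so the file system stays active.
zfs snapshot fastpool/data@monday

# Replicate the snapshot to the backup file system; the received copy
# inherits cheappool/backup's properties, including compression.
zfs send fastpool/data@monday | zfs receive cheappool/backup/data
```

Because `zfs send` streams at the file system level, the stream is independent of the RAID configuration or compression settings on either side; piping through `ssh` extends the same commands to a remote host.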

With block-level replicators, such as the Sun StorageTek Availability Suite, the replica has no knowledge of the context of the data. The replica's view is block-for-block identical to the original. With ZFS replicas, the file systems can have different data retention policies. For example, the production site may have a snapshot retention policy of 24 hours while the replica has a retention policy of 31 days. As long as there is a latest common snapshot shared by the production site and the replica, any later snapshots can be replicated. Try doing that on a block-level replicator!
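Continuing with the same illustrative pools, an incremental update only requires that the latest common snapshot exist on both sides; each side can then destroy older snapshots on its own schedule:

```shell
# A later snapshot on the production side...
zfs snapshot fastpool/data@tuesday

# ...is replicated incrementally, sending only the blocks that changed
# since the common snapshot @monday.
zfs send -i fastpool/data@monday fastpool/data@tuesday |
    zfs receive cheappool/backup/data

# Production enforces its 24-hour retention policy...
zfs destroy fastpool/data@monday

# ...while the replica keeps @monday for its full 31 days. @tuesday is
# now the latest common snapshot for the next incremental send.
```

If the received file system has been modified since the last receive, `zfs receive -F` can roll it back to the common snapshot first.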

There is much more to this story. In particular, retention policies and performance optimization can get very complex. I'm working on a good example that goes into more detail and discusses performance concerns... more on that later...