If you’ve decided to use ZFS on your storage devices, congratulations! You’re using one of the most complex and feature-rich filesystems on the planet. And if you ever decide to store long-term data, such as family photos and videos, seriously consider ZFS. In a redundant setup, such as four mirrored hard drives, it offers very strong protection against losing data to bit rot, other forms of storage degradation over time, computer errors and so on. ZFS checksums every block and arranges those checksums in a Merkle tree, so it can detect corrupted data and, when a good redundant copy exists, heal itself automatically.
However, this tutorial doesn’t cover why ZFS is the best choice for archiving long-term data. Instead, it discusses what snapshots and clones can do for you.
What Are ZFS Snapshots and Clones?
A snapshot is simply an exact picture of the state of your data at a certain point in time. For example, let’s say you’re working on a complex website. You store all code, databases, and images on your ZFS dataset. You change the design of the website, modify some images, change some layout dimensions and modify some code to make all this fit. If you want to revert to the previous design, you would have to revert all those changes individually. With ZFS, you can simply take a snapshot of your current design, make all the changes you want to make, and if you’re unhappy with the new design, simply roll back to the previous snapshot. And yes, it’s true, there are Git, GitHub and even some code editors that include the ability to take a snapshot and roll back. But ZFS snapshots offer the following advantages as well:
- Snapshots are global. They create a snapshot of absolutely all data included in your project.
- Snapshots and rollbacks are almost instantaneous, no matter how big your project is (even if it has hundreds of gigabytes).
- There’s no limit to the number of snapshots. You can have “Design 1,” “Design 2,” and “Design 3” and switch freely between them, make changes, and create a new snapshot: “Design 2 – Improved.”
While snapshots are basically frozen data states that you can return to, clones are like branches that start from a common point. To understand it better, imagine this scenario: You create a video for an advertising campaign. Then, you take a snapshot of this video (actually of the ZFS dataset where you store your video). Now, you clone this snapshot three times. You give “Clone 1” to one employee, “Clone 2” to another employee and “Clone 3” to the third employee. Now they can each work in their own individual space and make their desired changes.
Why is this useful? Videos can occupy huge amounts of disk space. High-resolution raw film can require hundreds or thousands of GB of storage. If the main video needs 500GB of storage and three people need to clone and work on divergent changes, this would require over 1500GB of storage.
With ZFS, the snapshot and three (or more) clones will start out at roughly 501GB of storage. Blocks of data that don’t change, which all clones have in common, are only stored once. This way, only the differences that each editor adds are stored as additional data. In a real-world scenario, you may need something like 650GB of storage for all three clones. It’s an efficient use of storage and resources, and data is properly isolated so that each editor can work to their heart’s content.
Of course, it’s useful for many other scenarios where you need to branch the same content in multiple different directions, even if disk space requirements are not a concern.
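The storage arithmetic from the video example can be sketched in a few lines of shell. The 500GB base and the three editors come from the example above; the 50GB of changes per editor is an assumption, chosen only so that the total matches the 650GB estimate:

```shell
# Sizes in GB. base_gb and editors come from the example;
# changes_gb is an assumed per-editor figure.
base_gb=500
editors=3
changes_gb=50

# Three independent full copies of the footage.
full_copies_gb=$((base_gb * editors))

# One shared base plus the blocks each editor actually changes.
with_clones_gb=$((base_gb + editors * changes_gb))

echo "Full copies: ${full_copies_gb}GB"
echo "ZFS clones:  ${with_clones_gb}GB"
```

With these numbers, full copies need 1500GB while the clones need 650GB, and the gap widens as more editors share the same base.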
Commands Used to Work with ZFS Snapshots
While other Linux distributions can use this filesystem/volume manager, Ubuntu offers the best support, to date, for ZFS.
Since not all users have a whole disk to dedicate to ZFS, it may be useful to know that you can also create a pool on an empty partition with a command such as sudo zpool create pool_name /dev/sda3, where /dev/sda3 is the device name of the third partition on your first disk.
After you install the proper packages and create your first ZFS dataset, this is how you create a snapshot.
First, find out the name of your ZFS dataset that you want to snapshot.
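One way to look up dataset names is the zfs list subcommand. The sketch below wraps it in a hypothetical helper, list_datasets (not part of ZFS), and assumes your user is allowed to run zfs list directly; prefix the command with sudo if your system requires it:

```shell
# Print the name of every filesystem and volume, one per line.
# -H drops the header row, -o name limits the output to the name column.
list_datasets() {
    zfs list -H -o name -t filesystem,volume
}
```

Running plain zfs list shows the same datasets together with their space usage and mount points.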
In this example, “data” is the name of the dataset and “snap1” will be the name of the snapshot. Replace these values in the next command with what applies in your case. To create a snapshot, enter:
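A minimal sketch of that step, assuming the standard zfs snapshot subcommand and sudo access. The helper name take_snapshot is hypothetical and only there so the two names are easy to swap:

```shell
# Take a snapshot of a dataset.
# Example: take_snapshot data snap1
#   runs:  sudo zfs snapshot data@snap1
take_snapshot() {
    sudo zfs snapshot "${1}@${2}"
}
```

The full name of a snapshot is always the dataset name, an @ sign, and then the snapshot name.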
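If in your case the dataset is named “videos” and you want to call your snapshot “first” instead, the full snapshot name becomes “videos@first” rather than “data@snap1” (dataset name, then @, then snapshot name).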
To roll back changes and restore your dataset to the exact contents it had when you took the snapshot, use:
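A minimal sketch, assuming the standard zfs rollback subcommand and sudo access. The helper name roll_back is hypothetical:

```shell
# Roll a dataset back to one of its snapshots.
# Everything changed since that snapshot is discarded.
# Example: roll_back data snap1
#   runs:  sudo zfs rollback data@snap1
roll_back() {
    sudo zfs rollback "${1}@${2}"
}
```

By default, zfs rollback only returns to the most recent snapshot. Going back further requires the -r flag, which destroys every snapshot taken after the one you roll back to.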
When you no longer need a snapshot, delete it with:
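A minimal sketch, assuming sudo access. The helper name delete_snapshot is hypothetical; the underlying command is zfs destroy:

```shell
# Delete a snapshot that is no longer needed.
# Example: delete_snapshot data snap1
#   runs:  sudo zfs destroy data@snap1
delete_snapshot() {
    sudo zfs destroy "${1}@${2}"
}
```

ZFS refuses to destroy a snapshot while clones created from it still exist, so remove those clones first.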
Commands Used to Work with ZFS Clones
Assuming you have a snapshot called “data@snap1,” clone it with:
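A minimal sketch, assuming the standard zfs clone subcommand and sudo access. The helper name clone_snapshot and the clone name “data/clone1” are assumptions made for illustration:

```shell
# Create a writable clone of a snapshot.
# $1 = full snapshot name, $2 = name of the new clone dataset
# Example: clone_snapshot data@snap1 data/clone1
#   runs:  sudo zfs clone data@snap1 data/clone1
clone_snapshot() {
    sudo zfs clone "$1" "$2"
}
```

The clone has to live in the same pool as the snapshot it is created from. Calling the helper again with a different target name gives each editor a separate clone of the same snapshot.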
To delete a clone:
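A minimal sketch, assuming sudo access and a clone with the hypothetical name “data/clone1”. The helper name delete_clone is also hypothetical; a clone is removed with zfs destroy like any other dataset:

```shell
# Delete a clone together with all changes made in it.
# Example: delete_clone data/clone1
#   runs:  sudo zfs destroy data/clone1
delete_clone() {
    sudo zfs destroy "$1"
}
```

The snapshot the clone was created from is left untouched.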
You can also take snapshots of clones, just as you would of any other dataset.
In the future, when you want to remember all the snapshots and clones you have created, use:
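A minimal sketch, assuming your user is allowed to run zfs list directly (prefix it with sudo otherwise). The helper name list_snapshots_and_clones is hypothetical, and the choice of columns is only a suggestion:

```shell
# List filesystems, volumes and snapshots together.
# The origin column shows the snapshot a clone was created from
# and "-" for datasets that are not clones.
list_snapshots_and_clones() {
    zfs list -t all -o name,origin,used
}
```

Snapshots are easy to spot in the output because their names contain an @ sign.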
This covers all basic operations you can do with ZFS snapshots and clones. It may be useful to know that each dataset has a hidden directory within called “.zfs” that holds its snapshots. With a command like ls /data/.zfs/snapshot/snap1/, you are able to see the state of files in a snapshot. Since it acts like a regular (read-only) directory, you can also copy individual files from a snapshot in case you don’t need to roll back the entire dataset.
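Copying a single file out of a snapshot can be sketched as follows. The helper name restore_file and the file name in the example are hypothetical, and the sketch assumes the dataset is mounted at the path you pass in and that your user may write there:

```shell
# Copy one file from a snapshot back into the live dataset.
# $1 = dataset mount point, $2 = snapshot name,
# $3 = file path relative to the mount point
# Example: restore_file /data snap1 index.html
restore_file() {
    # -p keeps the original timestamps and permissions.
    cp -p "$1/.zfs/snapshot/$2/$3" "$1/$3"
}
```

This overwrites the current version of that one file and leaves the rest of the dataset as it is.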