Linux Journaling vs. Copy-On-Write Filesystems: A Comparison

Hope you have read

This comparison table is taken from the book “Architecture and Design of the Linux Storage Stack”, which I find useful for understanding the differences between the two.

| | Journaling | Copy-On-Write |
|---|---|---|
| Write handling | Changes are recorded in a journal before applying them to the actual file system | A separate copy of data is created to make modifications |
| Original data | Original data gets overwritten | Original data remains intact |
| Data consistency | Ensures consistency by recording metadata changes and replaying them if needed | Ensures consistency by never modifying the original data |
| Performance | Minimal overhead depending on the type of journaling mode | Some performance gains because of faster writes |
| Space utilisation | Journal size is typically in MB, so no additional space is required | More space is required due to separate copies of data |
| Recovery times | Fast recovery times as the journal can be replayed instantly | Slower recovery times as data needs to be reconstructed using recent copies |
| Features | No built-in support for features such as compression or deduplication | Built-in support for compression and deduplication |
Taken from “Architecture and Design of the Linux Storage Stack”

Copy-on-write Filesystem in Linux in a Nutshell

The Working of a Copy-on-Write System In Brief

A copy-on-write filesystem does not overwrite data in place. Suppose a file is about to be modified; the change is handled as follows.

  1. The old data is copied to a newly allocated location on the disk.
  2. The new data is written to that newly allocated location.
  3. Hence the name Copy-on-Write.
  4. The references are updated to point to the new data.
  5. However, the old data and its snapshots are still there.

As described in Architecture and Design of the Linux Storage Stack by Muhammad Umer, page 59.

Because the old data is preserved in the process, filesystem recovery is greatly simplified: the previous state of the data is still stored at another location on disk, so after an outage the system can easily revert to its former state. This makes maintaining a journal unnecessary. It also allows snapshots to be implemented at the filesystem level.
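To make the steps concrete, here is a minimal Python sketch of a toy copy-on-write store. It is only an illustration under simplifying assumptions, not how ZFS or Btrfs are implemented: the class `CowStore`, its methods, and the file name are all hypothetical, and each file is treated as a single block held in memory.

```python
class CowStore:
    """Toy copy-on-write store (hypothetical): blocks are never modified in place."""

    def __init__(self):
        self.blocks = {}      # block id -> bytes, immutable once written
        self.live = {}        # file name -> block id currently referenced
        self.snapshots = []   # each snapshot is a saved copy of the references
        self._next_id = 0

    def _allocate(self, data):
        # Store data in a freshly allocated block and return its id.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        return block_id

    def write(self, name, offset, data):
        old_id = self.live.get(name)
        # Copy the old data into a new buffer (empty for a new file).
        buf = bytearray(self.blocks[old_id]) if old_id is not None else bytearray()
        # Apply the new data to the copy, never to the original block.
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(b"\x00" * (end - len(buf)))
        buf[offset:end] = data
        new_id = self._allocate(bytes(buf))
        # Update the reference to point to the new block.
        self.live[name] = new_id

    def read(self, name):
        return self.blocks[self.live[name]]

    def snapshot(self):
        # A snapshot copies references only; no data is duplicated.
        self.snapshots.append(dict(self.live))
        return len(self.snapshots) - 1

    def rollback(self, snap_id):
        # Reverting is just switching the references back.
        self.live = dict(self.snapshots[snap_id])


store = CowStore()
store.write("report.txt", 0, b"version 1")
snap = store.snapshot()
store.write("report.txt", 8, b"2")      # modifies one byte via a new block
current = store.read("report.txt")
blocks_in_use = len(store.blocks)        # old and new block both still exist
store.rollback(snap)
restored = store.read("report.txt")
```

In this sketch the rollback never touches block contents, which is why recovery is simple, and the old block staying in `blocks` is the extra space cost of keeping earlier states around.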

Because the old data is still there, space utilisation may be higher than the user expects.

Filesystems that use the CoW-based approach include the Zettabyte File System (ZFS) and the B-Tree Filesystem (Btrfs).

Journaling File System: Advantages, Working, and Impact on Performance

Definition

According to a nice explanation by minitools.com, “What Is Journaling File System and Its Advantages/Disadvantages”:

The journaling file system (JFS) is a kind of file system developed by IBM in 1990. It keeps track of changes, which are not yet committed to the file system’s main part, by recording the goal of such changes in a data structure known as “journal”. Usually, the “journal” is a circular log.

In the event of a system crash or power failure, a journaling file system can be brought back online more quickly with a lower chance of being corrupted. Depending on the actual implementation, the JFS may only keep track of stored metadata, which results in improved performance at the expense of increased possibility for data corruption.

What Is Journaling File System and Its Advantages/Disadvantages

The Working of a Journal File System In Brief

Here is a diagram taken from Architecture and Design of the Linux Storage Stack by Muhammad Umer, page 57.

According to Chapter 3 of the book,

As the diagram shows, any change made to the filesystem is first written sequentially to a journal; each such change is called a transaction. Once a transaction has been written to the journal, it is written to its appropriate location on disk. When the transaction has been written to its on-disk location, it is removed from the journal. In the case of a system crash, the filesystem replays the journal to see whether any transaction is incomplete.
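The journal, checkpoint, and replay cycle can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the actual on-disk format of any Linux filesystem: the class `JournaledDisk`, its method names, and the `committed` flag used to tell complete transactions from incomplete ones are hypothetical, as are the sample keys.

```python
class JournaledDisk:
    """Toy journal (hypothetical): log first, then write to the final location."""

    def __init__(self):
        self.journal = []   # logged transactions, in the order they arrived
        self.disk = {}      # final on-disk locations: key -> value

    def begin(self, txn_id, updates):
        # The transaction is written to the journal before anything else.
        txn = {"id": txn_id, "updates": dict(updates), "committed": False}
        self.journal.append(txn)
        return txn

    def commit(self, txn):
        # Assumption: a flag marks the transaction as completely logged.
        txn["committed"] = True

    def checkpoint(self):
        # Write committed transactions to their on-disk location,
        # then remove them from the journal.
        for txn in list(self.journal):
            if txn["committed"]:
                self.disk.update(txn["updates"])
                self.journal.remove(txn)

    def recover(self):
        # After a crash: replay complete transactions, discard incomplete ones.
        for txn in list(self.journal):
            if txn["committed"]:
                self.disk.update(txn["updates"])
            self.journal.remove(txn)


fs = JournaledDisk()
t1 = fs.begin(1, {"inode:7.size": 4096})
fs.commit(t1)
t2 = fs.begin(2, {"inode:8.size": 100})   # simulated crash before commit
fs.recover()
```

After `recover()`, only the update from the first transaction has reached `fs.disk`, and the journal is empty again.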

It is interesting to note that either the metadata alone or the actual data as well is first written to the journal. Either way, once written to the filesystem, the transaction is removed from the journal. The size of the journal can be a few megabytes.
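The difference between logging metadata only and logging the actual data too can be illustrated with a rough size comparison. This is a hedged sketch: the mode names `"metadata"` and `"full"`, both helper functions, and the sample values are hypothetical and chosen only for illustration.

```python
def journal_record(mode, metadata, data):
    """Build the record that is logged before the on-disk write (toy model)."""
    if mode == "metadata":
        # Only the metadata change is logged; file contents bypass the journal.
        return {"metadata": metadata}
    if mode == "full":
        # Both the metadata change and the file contents are logged first.
        return {"metadata": metadata, "data": data}
    raise ValueError(f"unknown journaling mode: {mode}")


def record_size(record):
    # Rough size of a record in bytes, for comparing the two modes.
    return sum(len(value) for value in record.values())


meta = b"inode 42: size=8192"
payload = b"x" * 8192
small = record_size(journal_record("metadata", meta, payload))
large = record_size(journal_record("full", meta, payload))
```

Here `small` covers only the metadata bytes while `large` also includes the whole payload. That is the trade-off in miniature: logging less is cheaper, but the file contents are then not protected by the journal.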

Benefits of Journal File System and Impact on Performance

Journaling makes the filesystem more reliable and preserves its structure through system crashes and hardware failures. The burning question is whether it also impacts performance.

Generally, enabling journaling improves performance because it requires fewer seeks on the physical disk: data is written only when a journal transaction is committed or when the journal fills up. For example, in metadata-intensive operations such as recursive operations on a directory and its contents, journaling improves performance by reducing frequent trips to disk and performing multiple updates as a single unit of work.
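The batching effect can be sketched with a small counter. This is a simplified model under assumptions, not a benchmark: the class `BatchingJournal` and the `max_pending` threshold (standing in for "the journal fills up") are hypothetical, and real filesystems also commit on timers and explicit syncs.

```python
class BatchingJournal:
    """Toy model (hypothetical): updates accumulate and are committed together."""

    def __init__(self, max_pending):
        self.max_pending = max_pending
        self.pending = {}        # block -> latest value; repeated updates coalesce
        self.disk_commits = 0    # number of trips to disk

    def update(self, block, value):
        self.pending[block] = value
        # Commit when the journal is considered full.
        if len(self.pending) >= self.max_pending:
            self.commit()

    def commit(self):
        # All pending updates go to disk as a single unit of work.
        if self.pending:
            self.disk_commits += 1
            self.pending.clear()


journal = BatchingJournal(max_pending=64)
updates = 0
# Simulate a recursive operation: each file changes its own inode
# and also touches the parent directory's metadata.
for i in range(1000):
    journal.update(f"inode:{i}", "mode=0644")
    journal.update("dir:/data", f"mtime={i}")
    updates += 2
journal.commit()   # flush whatever is left at the end
```

In this model, `journal.disk_commits` ends up far smaller than `updates`, because many metadata changes share one commit and repeated updates to the same directory entry collapse into the latest value.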