Linux Journaling vs. Copy-On-Write Filesystems: A Comparison

Hope you have read

This comparison table is taken from the book “Architecture and Design of the Linux Storage Stack” which I find useful to help understand the differences between the two.

JournalingCopy-On-Write
Write handlingChanges are recorded in a journal before applying them to the actual file systemA separate copy of data is created to make modifications
Original dataOriginal data gets overwrittenOriginal data remains intact
Data ConsistencyEnsures consistency by recording metadata changes and replaying them if neededEnsures consistency by never modifying the original data
PerformanceMinimal overhead depending on the type of journaling modeSome performance gains because of faster writes
Space utilisationJournal size is typically in MB, so no additional space is requiredMore space is required due to separate copies of data
Recovery timesFast recovery times as the journal can be replaced instantlySlower recovery times as data needs to be reconstructed using recent copies
FeaturesNo built-in support for features such as compression or deduplicationBuilt-in support for compression and deduplication
Taken from “Architecture and Design of the Linux Storage Stack”

Copy-on-write Filesystem in Linux in a Nutshell

The Working of a Copy-on-Write System In Brief

Copy-On-Write Filesystem does not overwrite the data in place, here is how it is done. Supposedly there is file that will be modified.

  1. Copy the old data to an allocated location on the disk
  2. New data is written to the allocated location on the disk.
  3. Hence the name Copy-and-Write
  4. The references for the new data are updated
  5. However, the old data and its snapshots are still there

As described in the Architecture and Design of Linux Storage Stack by Muhammad Umer Page 59

As the old data is preserved in the process, filesystem recovery is very simplified. Since the previous state of the data is saved on another allocated location on disk. If there is an outrage, the system system can easily revert to its former state. This make the maintenance of any Journal obsolete. This also allows snapshots to be implemented at the filesystem level.

As the old data is still there, space utilisation may be more than what the user expects……

Some of the filesystem the use the CoW based approach includes Zttabyte Filesystem (ZFS) and B-Tree Filesystem (Btrfs)

Journaling File System: Advantages, Working, and Impact on Performance

Definition

According to a nice explanation by minitools.com “What Is Journaling File System and Its Advantages/Disadvantages

The journaling file system (JFS) is a kind of file system developed by IBM IN 1990. It keeps track of changes, which are not yet committed to the file system’s main part, by recording the goal of such changes in a data structure known as “journal”. Usually, the “journal” is a circular log.

In the event of a system crash or power failure, a journaling file system can be brought back online more quickly with a lower chance of being corrupted. Depending on the actual implementation, the JFS may only keep track of stored metadata, which results in improved performance at the expense of increased possibility for data corruption.

What Is Journaling File System and Its Advantages/Disadvantages

The Working of a Journal File System In Brief

Here is a diagram taken from Architecture and Design of Linux Storage Stack by Muhammad Umer Page 57

According to the Chapter 3 of the book,

From the diagram, any changes made to the filesystem are written sequentially to a journal, also called a transaction. Once a transaction is written to a journal, it is written to an appropriate location on a disk. In the case of a system crash, the filesystem replays the journal to see whether any transaction is incomplete. When the transaction has been written to its on-disk location, it is removed from the Journal.

It is interesting to note that either the metadata or actual data is first written to the data. Either way, once written to the filesystem, the transaction is removed from the journal. The size of the journal can be a few megabytes.

Benefits of Journal File System and Impact on Performance

Besides making the Filesystem more reliable and preserving its structure in system crashes and hardware failures, the burning question is whether it will impact performance?

Generally, journaling improves performance when it is enabled by having fewer seeks to the physical disks as data is only when a journal is committed or when the journal fills up. For example, in intense meta-data operations like recursive operations on the directory and its content, journaling improves performance by reducing frequent trips to disks and performing multiple updates as a single unit of work.

What is FileSystem in USErspace (FUSE)?

The Linux kernel supports many filesystems that are native to Linux, but there are other filesystem that Linux support via FileSystem in USErspace (FUSE). Probably you would guess one prominent example of a Filesystem outside Linux that Linux user face often – Windows NTFS

I have written an entire blog on how to get your portable drive on NTFS working. For more information, do take a look at Mounting NTFS on Rocky Linux 8

Another popular usage of FUSE driver is sshfs. It is using SSH to mount the remote file system and avoid the need to setup up NFS or SAMBA while enjoying the benefits of SSH encryption.

There is a good article found at RedHat.com on how to mount a remote file system over SSH

Mounting NTFS on Rocky Linux 8

If you are planning to mount like a portable drive using Windows NTFS File System on the Rocky Linux 8, what you will see immediately when you issue the command after you plug the portable drive in

# mount /dev/sdd1 /data1
mount: /data1: unknown filesystem type 'ntfs'.

Step 1: Enable EPEL Repo

# dnf install epel-release

Step 2: Install NTFS-3g

# dnf install ntfs-3g

In some blogs written elsewhere, these 2 packages are more than enough, but I was still having issues. In my situation, I need to put in 5 packages

Step 3: Install all NTFS-3g packages

# dnf install *ntfs*

This time it works for me.

Step 4: Simply mount (Hooray!)

 # mount /dev/sdd1 /data1

What is TMPFS?

TMPFS is a filesystem that exists only in memory. When you reboot the Server, the content of the TMPFS is gone. This is perfect for mounting the /tmp directory

In your /etc/fstab

tmpfs /tmp tmpfs size=30% 0 0

The size option set the maximum size of the filesystem. In the setting above, it can take up to 30% of the total RAM. If you umount the file system, all memory is returned.

Protecting Red Hat OpenShift Containerized Environment with IBM Spectrum Protect Plus

Introduction

This IBM® Redpaper publication describes support for Red Hat OpenShift Container Platform application data protection with IBM Spectrum® Protect Plus. It explains backup and restore operations for persistent volume data by using the Container Storage Interface (CSI) plug-in.

Table of Contents

Chapter 1. Introducing containers
Chapter 2. IBM Spectrum Protect Plus architecture
Chapter 3. Installing IBM Spectrum Protect Plus as a containerized application
Chapter 4. Container Backup Support
Chapter 5. Implementing Container Backup Support
Chapter 6. Using Container Backup Support
Chapter 7. Red Hat OpenShift cluster disaster recovery solution

More Information at IBM Spectrum Protect Plus: Protecting Red Hat OpenShift Containerized Environments

IBM Spectrum Scale Container Native Storage Access (CNSA)

IBM Spectrum Scale Container Native Storage Access (CNSA) allows the deployment of Spectrum Scale in a Red Hat OpenShift cluster. Using a remote mount attached file system, CNSA provides a persistent data store to be accessed by the applications via the IBM Spectrum Scale Container Storage Interface (CSI) driver using Persistent Volumes (PVs).