How to Compress Archives Using All CPU Cores with Tar

Tar All Cores Feature

If you’ve ever had to compress large volumes with tar, you’ll know how much of a pain it can be. It often goes very slowly, and you find yourself hitting Ctrl + C to end the task and just forget about it. However, there are some other tools that tar can use, and they’re a great way to make use of today’s heavily multi-threaded CPUs and speed up your tar archiving. This article shows you how to make tar use all cores when compressing archives in Linux.

Understanding and Installing the Tools

The three main tools in question here are pigz, pbzip2, and pxz. There are some subtle differences between the tools, but the differences lie between gzip, bzip2, and xz. In that respective order, the compression levels increase, meaning that an archive compressed with gzip will be larger than one compressed with xz, but gzip will naturally take less time than xz will. bzip2 is somewhere in the middle.

The “p” that starts the names of each of the tools means “parallel.” Parallelization is something that has become increasingly more relevant over the years – how well something spans all CPU cores. With CPUs like AMD’s Epyc and Threadripper lines that can reach 64 cores and 128 threads, it’s important to understand what applications can make use of that. These compression functions are prime candidates.

To install the tools, you can just turn to your repos.

Tar All Cores Dnf Install

This article focuses on pxz for the sake of consistency. You can check out this tutorial for pigz.

Compressing Archives with Tar

The syntax of tar is fairly simple. To just compress a directory, you can use a command like this:

The first will use gzip, the second will use bzip2, and the third will use xz. The filename and the directory will vary depending on what you’re doing, but I pulled the Linux Kernel from GitHub into my “/home” directory, and I’ll be using that. So, I’ll go ahead and start that command with the time command at the front to see how long it takes. You can also see that xz is listed as taking the highest percentage of my CPU on this system, but it’s only pinning one core at 100 percent.

Tar All Cores Tar Xz
Tar All Cores Htop 1

And, as you can see, it took a very long time for my aging i7-2600s to compress Linux 5.10-rc3 (around 28 minutes).

Tar All Cores Xz Time

This is where these parallel compression tools come in handy. If you’re compressing a large file and looking to get it done faster, I can’t recommend these tools enough.

Using Parallel Compression Tools with Tar

You can either tell tar to use a compression program with the --use-compression-program option, or you can use a little bit simpler command flag of -I. An example of the syntax for any of these tools would be like this:

Let’s test it out and see how long it takes my system to compress the Linux Kernel with access to all eight threads of my CPU. You can see my htop readout showing all threads pinned at 100 percent usage because of pxz.

Tar All Cores Tar Pxz
Tar All Cores Htop 2

You can see that it took substantially less time to compress that archive (about seven minutes!), and that was with multitasking. I have a virtual machine running in the background, and I’m doing some web browsing at the moment. The Linux Kernel hardware scheduler will give you what you need for your personal stuff, so if you left your pxz command to run without any other stuff running on your system, you may be able to get it done faster.

Tar All Cores Pxz Time

Adjusting Compression Levels with pigz, pbzip2, and pxz

You can also pass compression levels to pxz to make the file even smaller. This will require more RAM, CPU, and time, but it’s worth it if you really need to get a small file. Here’s a comparison of the two commands and their results side by side.

Tar All Cores Tar Pxz 9
Tar All Cores Pxz 9 Time
Tar All Cores Compression Comparison

The compression isn’t that much greater, and the time isn’t necessarily worth it, but if every megabyte counts, it’s still a great option.

I hope you enjoyed this guide to using all cores to compress archives using tar. Make sure to check out some of our other Linux content, like how to build a new PC for Linux, mastering Apt and becoming an Apt guru, and how to install Arch Linux on a Raspberry Pi.

Related:

John Perkins John Perkins

John is a young technical professional with a passion for educating users on the best ways to use their technology. He holds technical certifications covering topics ranging from computer hardware to cybersecurity to Linux system administration.

One comment

  1. This is very interesting. I would have thought that the HDD would be the bottleneck running tar, and depending on the CPU it might be the bottleneck when running gzip and bz2. Maybe now that everyone is running SSDs doing compression in parallel makes sense. Thanks for the tip!

Leave a Comment

Yeah! You've decided to leave a comment. That's fantastic! Check out our comment policy here. Let's have a personal and meaningful conversation.