How to Perform Load Monitoring in Linux Using atop

Load monitoring is one of the most critical system administration tasks, especially if you’re dealing with servers. It not only gives you an idea of how your system’s resources are being utilized but also helps you diagnose performance-related issues. In this article, we will discuss how to perform load monitoring in Linux using the atop tool.

Note: all the examples used in this article were tested on Ubuntu 14.04.

According to its man page, the command-line tool atop is an interactive monitor for viewing the load on a Linux system. It shows how heavily your system’s hardware resources (CPU, memory, disk, and network) are occupied, all from a performance point of view. It also shows which processes are responsible for the indicated load.

Note: disk load is shown if per-process “storage accounting” is active in the kernel or if the kernel patch “cnt” has been installed. Similarly, network load is only shown per process if the kernel patch “cnt” has been installed.

Users of Debian-based systems (like Mint and Ubuntu) can download and install the tool using the following command:

sudo apt-get install atop

Users of other Linux distributions can use their respective package management tools, for example yum on Red Hat. You can also download the tool from its official website.
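As a rough sketch of that choice, the snippet below prints (but does not run) an install command for whichever package manager it finds first. The function name atop_install_cmd is hypothetical, and the snippet assumes the package is simply called atop in your distribution's repositories; on some Red Hat-style systems it may only be available from an additional repository.

```shell
# Print (do not run) a plausible install command for atop.
# Assumption: the package is named "atop" in the configured repositories.
atop_install_cmd() {
    if command -v apt-get >/dev/null 2>&1; then
        echo "sudo apt-get install atop"
    elif command -v dnf >/dev/null 2>&1; then
        echo "sudo dnf install atop"
    elif command -v yum >/dev/null 2>&1; then
        echo "sudo yum install atop"
    else
        echo "no supported package manager found" >&2
        return 1
    fi
}

# Show the suggestion for this machine, if any.
atop_install_cmd || true
```

Review the printed command before running it yourself.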

Once installed, you can launch the tool by running the following command:

sudo atop

Here is the sample output:

atop generic output screen.

As you can see, that’s a lot of information, broadly divided into two parts: system level and process level. The system-level part consists of the following output lines:

PRC: This line contains the total CPU time consumed in system mode (‘sys’) and in user mode (‘user’), the total number of processes present at this moment (‘#proc’), the total number of threads present at this moment in state ‘running’ (‘#trun’), ‘sleeping interruptible’ (‘#tslpi’) and ‘sleeping un-interruptible’ (‘#tslpu’), the number of zombie processes (‘#zombie’), the number of clone system calls (‘clones’), and the number of processes that ended during the interval (‘#exit’, which shows ‘?’ if process accounting is not used).

CPU: This line contains the percentage of CPU time spent in kernel mode by all active processes (‘sys’), in user mode (‘user’) for all active processes (including processes running with a nice value larger than zero), and for interrupt handling (‘irq’) including softirq, as well as the percentage of unused CPU time while no processes were waiting for disk I/O (‘idle’) and while at least one process was waiting for disk I/O (‘wait’). On a multi-processor system, an additional line is shown for every individual processor (with ‘cpu’ in lower case), sorted by activity.

CPL: This line contains CPU load information: the load average figures, which reflect the number of threads that are available to run on a CPU (i.e. part of the runqueue) or that are waiting for disk I/O, averaged over 1 (‘avg1’), 5 (‘avg5’) and 15 (‘avg15’) minutes, followed by the number of context switches (‘csw’), the number of serviced interrupts (‘intr’) and the number of available CPUs.

MEM: This line contains information related to memory consumption: the total amount of physical memory (‘tot’), the amount of memory which is currently free (‘free’), the amount of memory in use as page cache (‘cache’), the amount of memory within the page cache that has to be flushed to disk (‘dirty’), the amount of memory used for filesystem metadata (‘buff’) and the amount of memory being used for kernel mallocs (‘slab’).

SWP: This line contains the total amount of swap space on disk (‘tot’) and the amount of free swap space (‘free’), the committed virtual memory space (‘vmcom’), and the maximum limit of the committed space (‘vmlim’).

DSK: This line contains information related to disk utilization — the portion of time that the unit was busy handling requests (‘busy’), the number of read requests issued (‘read’), the number of write requests issued (‘write’), the number of KiBytes per read (‘KiB/r’), the number of KiBytes per write (‘KiB/w’), the number of MiBytes per second throughput for reads (‘MBr/s’), the number of MiBytes per second throughput for writes (‘MBw/s’), the average queue depth (‘avq’) and the average number of milliseconds needed by a request (‘avio’) for seek, latency and data transfer.

NET: This section contains information related to network utilization (TCP/IP): one line is shown for activity of the transport layer (TCP and UDP), one line for the IP layer, and one line per active interface.
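Some of the system-level figures described above are also exposed by the kernel under /proc, so you can cross-check them without atop. The following is a minimal sketch, assuming a Linux /proc filesystem. The function names load_summary and mem_summary are hypothetical, and the mapping of /proc/meminfo fields to the columns of the MEM line is an approximation, not necessarily atop's exact calculation.

```shell
# load_summary: read a /proc/loadavg-style line on stdin and label the
# 1-, 5- and 15-minute load averages.
load_summary() {
    awk '{ printf "avg1=%s avg5=%s avg15=%s\n", $1, $2, $3 }'
}

# mem_summary: read /proc/meminfo-style text on stdin and print the
# fields that roughly correspond to atop's MEM line, in MiB.
mem_summary() {
    awk '
        $1 == "MemTotal:" { tot   = $2 }
        $1 == "MemFree:"  { free  = $2 }
        $1 == "Cached:"   { cache = $2 }
        $1 == "Dirty:"    { dirty = $2 }
        $1 == "Buffers:"  { buff  = $2 }
        $1 == "Slab:"     { slab  = $2 }
        END {
            # /proc/meminfo reports kB; convert to whole MiB.
            printf "tot=%d free=%d cache=%d dirty=%d buff=%d slab=%d\n",
                   tot/1024, free/1024, cache/1024, dirty/1024, buff/1024, slab/1024
        }'
}

# Print live values when the /proc files are readable (Linux only).
if [ -r /proc/loadavg ] && [ -r /proc/meminfo ]; then
    load_summary < /proc/loadavg
    mem_summary < /proc/meminfo
fi
```

Both functions read from standard input, so they can also be tried on a saved copy of the /proc files from another machine.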

The system-level information is followed by process-level information, which, as the name suggests, gives details about the processes whose resource utilization has changed during the last interval (the default interval is 10 seconds).
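The interval can be changed when starting atop: the man page documents optional positional arguments of the form "interval [samples]". The sketch below only prints the resulting command lines instead of launching the interactive tool; the helper name atop_args is hypothetical.

```shell
# atop_args [INTERVAL [SAMPLES]]: print an atop command line that uses
# the positional "interval [samples]" arguments.
atop_args() {
    interval=${1:-10}      # 10 seconds is atop's default interval
    samples=${2:-}         # optional number of samples before atop exits
    case "$interval$samples" in
        *[!0-9]*)
            echo "interval and samples must be whole numbers" >&2
            return 1
            ;;
    esac
    echo "atop $interval${samples:+ $samples}"
}

atop_args 5        # refresh every 5 seconds until you quit
atop_args 2 30     # take 30 samples at 2-second intervals, then exit
```

Run the printed command (with sudo, as earlier in this article) to start atop with that interval.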

An important point worth mentioning is that atop uses colours (red, cyan, and more) to indicate how critical the resource consumption is at system level. For example, when a resource exceeds its critical occupation percentage, the entire screen line is coloured red.

Note: go through the command’s man page for more details on the output of the command.

You can control the output of the atop command from your keyboard. For example, press m to show memory-related output, d for disk-related output, n for network-related output, v for various process characteristics, and c for the command line of each process.
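According to the man page, these keys can also be given as flags on the command line so that atop starts directly in the corresponding view. The following sketch maps the views mentioned above to their letters; the function name atop_view_flag and the view names are hypothetical, and the flags are assumed to behave as in the version tested for this article.

```shell
# atop_view_flag NAME: map a view name to the flag letter that matches
# the interactive key described in the text.
atop_view_flag() {
    case "$1" in
        memory)  echo "-m" ;;
        disk)    echo "-d" ;;
        network) echo "-n" ;;
        various) echo "-v" ;;
        command) echo "-c" ;;
        *)
            echo "unknown view: $1" >&2
            return 1
            ;;
    esac
}

# Print the command that would start atop in the memory view.
echo "atop $(atop_view_flag memory)"
```

As before, the snippet prints the command rather than launching the interactive tool.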

Here is a screenshot of the process-level information produced by the atop command when c was pressed:

Process level information.

As you can see, the command line of each process is displayed in the output.

atop is a very useful load monitoring command in Linux that not only provides a wealth of information about system resources but also offers various ways to customize and control its output. It is worth going through the command’s man page to learn more about it.