Sysdig: A Linux Diagnostics Tool
Original post date: May 13, 2014
We’ve mentioned sysdig several times before, but we haven’t published anything in our English-language blog about the program itself. Today, we’ll be pulling an article out of our archives and looking at our original review of sysdig.
Linux systems use a myriad of utilities to collect and analyze data. Each component requires its own specialized tools for diagnosing errors. The most widespread of these diagnostic utilities and their applications are shown in the following graphic:
We recently learned about Sysdig, a utility developed by Draios. It collects information on absolutely everything:
- incoming network connections and related processes
- I/O-intensive files
- process-related traffic
- files and directories accessed by users
- system calls, files, and network connections that return errors
Sysdig positions itself as a tool for facilitating the work of system administrators. After reading an article about it on the developer’s site, we decided to test it out.
DTrace, Systemtap, and Sysdig
Sysdig is far from the first attempt to create a tool with extensive data collecting capabilities on a Linux system.
Among tools with similar functionality, the first to mention is DTrace, a dynamic tracing framework developed by Sun Microsystems. It monitors the memory, processor time, and network resources used by a system’s active processes.
DTrace runs scripts written in D (a language similar to C, but which includes specialized functions and modifications for tracers). Scripts include a list of probes, which are responsible for specific actions. Probes activate when a given condition is met (for example, when opening a file or starting a process), and then execute a corresponding action. Information can be transferred from one probe to another.
DTrace is a powerful but complicated tool that requires extensive technical knowledge from the user. Writing and debugging D scripts is also tedious and time-consuming, especially for those who aren’t skilled programmers.
A tool very similar to DTrace in terms of its underlying principles and available functions is SystemTap. SystemTap provides a command-line interface and a scripting language. It monitors system events and runs a handler when an event occurs.
An event may be, for example, the beginning or end of a SystemTap session, or a timer going off. An event handler is a sequence of script statements that execute when the event occurs. Handlers usually extract information from the event’s context or print it to the console.
A major downside to SystemTap is the terribly complicated syntax of its scripting language. Writing and debugging scripts demands a hefty amount of the user’s time and energy.
In contrast to the aforementioned tools, sysdig has a fairly different structure. In terms of architecture, it more closely resembles programs like libpcap, tcpdump, and Wireshark. A special driver, sysdig-probe, captures events at the kernel level using the kernel’s tracepoints facility. When a tracepoint fires, it invokes a handler for the event. Handlers store information on the event in a shared buffer. This information can then be displayed on the screen or saved to a file.
Because of this architecture, sysdig has little effect on system performance. Detailed information on system events can be retrieved using simple commands. Several operations can even be executed with the use of pre-made scripts written in Lua (we’ll discuss this in more detail below).
Installation
Sysdig is not currently included in official repositories. To launch Sysdig’s automatic installation, we run the following command:
curl -s https://s3.amazonaws.com/download.draios.com/stable/install-sysdig | sudo bash
For information on manually installing sysdig on different Linux distributions, see the official documentation.
Update: Sysdig is now included in the official Debian and Ubuntu repositories, but because the program is constantly being updated, we recommend following the standard installation instructions to ensure you have the latest version.
Initial Use
Once sysdig has been installed, we run the following command:
# sysdig
All of the active system events will be displayed in the standard output:
63889 15:25:12.908695644 3 notify-osd (7209) > poll fds=3:u5 timeout=4294967295
63890 15:25:12.908698249 3 notify-osd (7209) writev fd=3(<u>) size=4
63893 15:25:12.908704065 2 gnome-terminal (18260) > lseek fd=24(/tmp/vteIVHGFX (deleted)) offset=0 whence=2(SEEK_END)
63894 15:25:12.908704595 2 gnome-terminal (18260) lseek fd=24(/tmp/vteIVHGFX (deleted)) offset=0 whence=2(SEEK_END)
63896 15:25:12.908709655 2 gnome-terminal (18260) write fd=24(/tmp/vteIVHGFX (deleted)) size=80
63899 15:25:12.908710722 3 notify-osd (7209) > writev res=4 data=+...
63900 15:25:12.908713828 3 notify-osd (7209) < poll fds=3:u1 timeout=4294967295
63901 15:25:12.908714531 2 gnome-terminal (18260) < write res=80 data=
1275 15:25:12.596942000 1 rs:main (941) < open fd=-2(ENOENT) name=/dev/xconsole
Each line contains information on one event. This is reflected in the following format:
%evt.num %evt.time %evt.cpu %proc.name (%thread.tid) %evt.dir %evt.type %evt.args
The printout consists of the following fields:
- evt.num: event number
- evt.time: time of event
- evt.cpu: number of the CPU where the event was captured
- proc.name: process name
- thread.tid: thread ID (for single-threaded processes, this matches the process ID)
- evt.dir: direction of event (> for enter events, < for exit events)
- evt.type: event type
- evt.args: event arguments
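Because the default format is whitespace-separated, the fields can be pulled apart with standard tools. The following is only a minimal sketch: parse_events is a hypothetical helper name of our own, and it assumes the default output format and process names that contain no spaces.

```shell
# Hypothetical helper: split sysdig's default output lines into named fields.
# Assumes the default format and process names without spaces.
parse_events() {
    awk '{
        tid = $5
        gsub(/[()]/, "", tid)                  # "(7209)" -> "7209"
        dir = ($6 == ">") ? "enter" : "exit"   # ">" marks an enter event
        printf "num=%s cpu=%s proc=%s tid=%s dir=%s type=%s\n", $1, $3, $4, tid, dir, $7
    }'
}

# Two sample lines taken from the printout above.
parse_events <<'EOF'
63889 15:25:12.908695644 3 notify-osd (7209) > poll fds=3:u5 timeout=4294967295
63900 15:25:12.908713828 3 notify-osd (7209) < poll fds=3:u1 timeout=4294967295
EOF
```

For the two sample lines, this prints one line per event with dir=enter for the first and dir=exit for the second.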
Saving Information to Files
Event information that sysdig gathers can be saved to separate files. This is done using a command like:
# sysdig -w myfile.scap
If you don’t need to write information about all system events to a file, but only a limited number of them (let’s say 100 events), use the -n option:
# sysdig -n 100 -w myfile.scap
You can print out information that was previously saved to a file using the -r option:
# sysdig -r myfile.scap
Sysdig saves a complete snapshot of the operating system’s state (running processes, open files, active users, etc.) to each file.
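The capture and playback steps can be wrapped into a small script. This is a sketch under our own assumptions: run and capture_and_read are hypothetical helper names, and DRY_RUN is a variable we introduce so that the commands can be previewed without sysdig or root privileges.

```shell
# Hypothetical helper: print a command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: capture a fixed number of events, then read them back.
capture_and_read() {
    count=$1
    file=$2
    run sysdig -n "$count" -w "$file" && run sysdig -r "$file"
}

# Preview the commands without running sysdig.
DRY_RUN=1
capture_and_read 100 myfile.scap
```

With DRY_RUN=1 the script only prints the two sysdig commands; unsetting the variable would run them for real, which requires root.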
Filters
As we saw in the previous examples, sysdig writes all event information to standard output. We can make the console display only the information we need. This is done using filters.
Filters are given at the end of the command line (as in tcpdump). They can be applied both to a live capture and when reading from a trace file. We’ll try to trace the work of the cat command:
# sysdig proc.name=cat
21368 13:10:15.384878134 1 cat (8298) < execve res=0 exe=cat args=index.html. tid=8298(cat) pid=8298(cat) ptid=1978(bash) cwd=/root fdlimit=1024
21371 13:10:15.384948635 1 cat (8298) > brk size=0
21372 13:10:15.384949909 1 cat (8298) < brk res=10665984
21373 13:10:15.384976208 1 cat (8298) > mmap
21374 13:10:15.384979452 1 cat (8298) < mmap
21375 13:10:15.384990980 1 cat (8298) > access
21376 13:10:15.384999211 1 cat (8298) < access
21377 13:10:15.385008602 1 cat (8298) > open
21378 13:10:15.385014374 1 cat (8298) < open fd=3(/etc/ld.so.cache) name=/etc/ld.so.cache flags=0(O_NONE) mode=0
21379 13:10:15.385015508 1 cat (8298) > fstat fd=3(/etc/ld.so.cache)
21380 13:10:15.385016588 1 cat (8298) < fstat res=0
21381 13:10:15.385017033 1 cat (8298) > mmap
21382 13:10:15.385019763 1 cat (8298) < mmap
21383 13:10:15.385020047 1 cat (8298) > close fd=3(/etc/ld.so.cache)
21384 13:10:15.385020556 1 cat (8298) < close res=0
Filters can be combined. They are built using standard comparison operators (=, !=, <, <=, >, >=, contains), boolean operators (and, or, not), and parentheses.
We’ll enter the following command:
# sysdig proc.name=cat or proc.name=vi
This will trace all of the activities of cat and vi:
56239 12:14:01.449463618 0 BrowserBlocking (2587) > open
56240 12:14:01.449467018 0 BrowserBlocking (2587) < open fd=142(/proc/16213/statm) name=/proc/16213/statm flags=1(O_RDONLY) mode=0
63158 12:14:01.493237287 3 gnome-terminal (3910) > open
63177 12:14:01.493281181 3 gnome-terminal (3910) < open fd=18(/tmp/vteHGSYFX) name=/tmp/vteHGSYFX flags=39(O_EXCL|O_CREAT|O_RDWR) mode=0
63200 12:14:01.493309748 3 gnome-terminal (3910) > open
63205 12:14:01.493319526 3 gnome-terminal (3910) < open fd=18(/tmp/vteHESYFX) name=/tmp/vteHESYFX flags=39(O_EXCL|O_CREAT|O_RDWR) mode=0
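When several processes need to be traced at once, the filter expression can be generated rather than typed by hand. The sketch below uses a hypothetical helper of our own, procs_filter, which joins process names with the or operator; it builds a string and does not call sysdig itself.

```shell
# Hypothetical helper: build "proc.name=a or proc.name=b ..." from its arguments.
procs_filter() {
    filter=""
    for name in "$@"; do
        if [ -z "$filter" ]; then
            filter="proc.name=$name"
        else
            filter="$filter or proc.name=$name"
        fi
    done
    printf '%s\n' "$filter"
}

procs_filter cat vi
# prints: proc.name=cat or proc.name=vi
```

The result can be passed to sysdig as a single quoted argument, and wrapped in parentheses when it is combined with other conditions, for example: sysdig "($(procs_filter cat vi)) and evt.type=open".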
The command
# sysdig proc.name!=cat and evt.type=open
will print information about the open events for all processes except cat:
2111 12:15:47.656367409 1 rs:main (914) > open
2112 12:15:47.656368926 1 rs:main (914) open
2114 12:15:47.656371170 1 rs:main (914) open
2116 12:15:47.656374373 1 rs:main (914) open
2118 12:15:47.656376563 1 rs:main (914) open
2120 12:15:47.656378615 1 rs:main (914) open
The full list of filters can be viewed using the command
# sysdig -l
(further explanation and commentary can be found in the official documentation).
Using filters, we can easily retrieve critical information. For example, we can view information on incoming network connections received by all processes except apache using a simple command:
# sysdig evt.type=accept and proc.name!=apache
As noted above, each line of the sysdig printout includes the event’s arguments. Individual arguments can also be referenced through the evt.arg and evt.rawarg fields, which deserve a separate look. Every event registered by sysdig has a specific type (such as open, read, etc.) and a specific set of parameters (fd, name, etc.), which are encoded in a particular way. We won’t break all of this down here (anyone interested can look at the official documentation); instead, we’ll focus on how these arguments can be used when creating filters.
Let’s look at the following command:
# sysdig evt.type=execve and evt.arg.ptid=bash
This displays a list of processes launched by interactive users in the console. This filter catches ‘execve’ system calls (which are used for running programs) only if the parent process is bash.
The difference between evt.arg and evt.rawarg is that the latter doesn’t decode process IDs, error codes, etc., leaving all arguments in raw numerical form.
For example, we can view a list of all the system calls that returned errors with the following command:
# sysdig "evt.rawarg.res<0 or evt.rawarg.fd<0"
257727 15:57:35.398754060 3 chrome (17326) < futex res=-110(ETIMEDOUT)
257737 15:57:35.399218996 0 chrome (2493) < recvfrom res=-11(EAGAIN) data= tuple=NULL
257749 15:57:35.399362914 1 Xorg (1153) < read res=-11(EAGAIN) data=
257834 15:57:35.401067094 0 chrome (2493) < recvfrom res=-11(EAGAIN) data= tuple=NULL
257836 15:57:35.401106092 0 chrome (2493) < recvfrom res=-11(EAGAIN) data= tuple=NULL
257849 15:57:35.402594284 2 chrome (4446) < futex res=-110(ETIMEDOUT)
257882 15:57:35.407348870 0 chrome (2493) < recvfrom res=-11(EAGAIN) data= tuple=NULL
257884 15:57:35.407358705 0 chrome (2493) < recvfrom res=-11(EAGAIN) data= tuple=NULL
257888 15:57:35.407373908 0 chrome (2493) < recvfrom res=-11(EAGAIN) data= tuple=NULL
257922 15:57:35.407757377 1 Xorg (1153) < read res=-11(EAGAIN) data=
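A printout like this quickly becomes long, so it can help to summarize it by error name. The following sketch uses only standard text tools; count_errors is a hypothetical helper name, and it relies on the res=-N(NAME) pattern visible in the printout above.

```shell
# Hypothetical helper: count failed calls per error name in sysdig output on stdin.
# Relies on the res=-N(NAME) pattern shown in the printout above.
count_errors() {
    sed -n 's/.* res=-[0-9]*(\([A-Z0-9_]*\)).*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $2, $1 }'
}

# Four sample lines taken from the printout above.
count_errors <<'EOF'
257727 15:57:35.398754060 3 chrome (17326) < futex res=-110(ETIMEDOUT)
257737 15:57:35.399218996 0 chrome (2493) < recvfrom res=-11(EAGAIN) data= tuple=NULL
257749 15:57:35.399362914 1 Xorg (1153) < read res=-11(EAGAIN) data=
257834 15:57:35.401067094 0 chrome (2493) < recvfrom res=-11(EAGAIN) data= tuple=NULL
EOF
# prints:
# EAGAIN 3
# ETIMEDOUT 1
```

The same function could be placed at the end of a pipeline that reads from sysdig, although it only reports once its input ends, so it is best suited to a saved trace or a capture limited with -n.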
The full list of events and parameters supported by filters can be viewed using the following command:
# sysdig -L
Formatting Printouts
We can customize the format of a sysdig printout using the -p option and indicating the desired output fields:
# sysdig -p"user:%user.name dir:%evt.arg.path" evt.type=chdir
user:ubuntu dir:/root
user:ubuntu dir:/root/tmp
user:ubuntu dir:/root/Download
Entering the command above collects information on ‘chdir’ system calls (these occur each time the cd command is executed) and displays the name of the user executing the cd command and the directory they switch to in the console.
The -p option uses the following syntax:
- the percent sign (%) is placed before the name of each field
- any text can be added to the line (similar to the printf function in C)
- by default, a line is only printed to the console when all of the fields given after the -p option are present in the event. An asterisk (*) at the beginning of the format string means incomplete lines are printed as well; missing fields will be shown as N/A.
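The rules above are simple enough that a format string can be assembled from a list of field names. This is a minimal sketch with a hypothetical helper of our own, make_format; the --partial switch is our invention for illustration and simply prefixes the string with an asterisk.

```shell
# Hypothetical helper: turn field names into a -p format string.
# Pass "--partial" first to prefix the string with "*".
make_format() {
    prefix=""
    if [ "$1" = "--partial" ]; then
        prefix="*"
        shift
    fi
    fmt=""
    for field in "$@"; do
        # Add a separating space before every field except the first.
        fmt="$fmt${fmt:+ }%$field"
    done
    printf '%s\n' "$prefix$fmt"
}

make_format evt.type evt.dir evt.arg.name
# prints: %evt.type %evt.dir %evt.arg.name
make_format --partial evt.type evt.dir evt.arg.name
# prints: *%evt.type %evt.dir %evt.arg.name
```

The output would then be passed to sysdig in quotes, for example: sysdig -p "$(make_format evt.type evt.dir evt.arg.name)" evt.type=open.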
By entering the command
# sysdig -p"%evt.type %evt.dir %evt.arg.name" evt.type=open
we get a printout of information on the exit events (<) of open calls.
open < /proc/23533/task/23533/stat
open < /proc/23533/task/23535/stat
open < /proc/23533/task/23536/stat
open < /proc/23533/task/23539/stat
open < /proc/23533/task/23540/stat
open < /proc/23533/task/23541/stat
open < /proc/23533/task/23542/stat
open < /proc/23533/task/23543/stat
open < /proc/23533/task/23544/stat
Enter events (>) don’t carry a name argument, which is why there is no information on them in the output.
If we enter the command
# sysdig -p "*%evt.type %evt.dir %evt.arg.name" evt.type=open
then the printout will include the enter events (>) as well:
open < /proc/22832/task/22838/stat
open >
open < /proc/22832/task/22839/stat
open >
open < /proc/22832/task/22840/stat
open >
open < /proc/22832/task/22841/stat
open >
open < /proc/22832/task/22842/stat
open >
open < /proc/22832/task/22843/stat
open >
open < /dev/urandom
Chisels
Sysdig uses small scripts, written in Lua, for analyzing event lists. Developers refer to these as chisels.
A list of available chisels can be displayed in the console using the command:
# sysdig -cl
To view a specific chisel’s description and a list of arguments it can use, we use the -i option:
# sysdig -i fileslower
Category: Performance
---------------------
fileslower Trace slow file I/O
Use the -i flag to get detailed information about a specific chisel
Trace file I/O slower than a threshold, or all file I/O
Args:
[int] min_ms - minimum millisecond threshold for showing file I/O
Chisels are launched using the -c option. We’ll launch the chisel topfiles_bytes (this displays the files with the most bytes read and written on the local machine):
# sysdig -c topfiles_bytes
Bytes     Filename
------------------------------
3.21KB    /dev/input/event4
2.93KB    /tmp/vte7IZWFX (deleted)
864B      /dev/urandom
800B      /tmp/vteL7ZWFX (deleted)
498B      /dev/ptmx
224B      /dev/dri/card0
219B      /proc/16213/task/16221/stat
217B      /proc/16213/task/16229/stat
217B      /proc/16213/task/16219/stat
215B      /proc/16213/task/16225/sta
Filters can be used with chisels. If we aren’t interested in information on how frequently files are accessed in the /dev directory, we apply the following filter:
# sysdig -c topfiles_bytes "not fd.name contains /dev"
Bytes     Filename
------------------------------
1.90KB    /tmp/vte7IZWFX (deleted)
438B      /proc/16139/task/16145/stat
438B      /proc/16139/task/16141/stat
434B      /proc/16139/task/16150/stat
430B      /proc/16139/task/16146/stat
430B      /proc/16139/task/16147/stat
430B      /proc/16139/task/16149/stat
430B      /proc/16139/task/16148/stat
428B      /proc/16139/task/16139/stat
420B      /proc/16139/task/16142/stat
With filters, we can also view information on files accessed in a specific directory:
# sysdig -c topfiles_bytes "fd.name contains /var/log/"
Bytes     Filename
------------------------------
596B      /var/log/kern.log
596B      /var/log/syslog
596B      /var/log/messages
Another filter lets us see which files a particular process accessed:
# sysdig -c topfiles_bytes "proc.name=vi"
We can also see which files a user accessed:
# sysdig -c topfiles_bytes "user.name=username"
Bytes     Filename
------------------------------
1.90KB    /tmp/vte7IZWFX (deleted)
576B      /dev/urandom
384B      /tmp/vteL7ZWFX (deleted)
355B      /dev/ptmx
We can launch multiple chisels simultaneously:
# sysdig -c stdin -c stdout proc.name=cat
As we’ve already noted, chisels are written in Lua, so existing chisels can be edited and new ones can be written fairly easily.
A manual on writing chisels can be found in the official sysdig documentation.
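Chisels can also be run against a previously saved trace by combining the -r and -c options, which makes it easy to produce several summaries from one capture. The sketch below is illustrative only: summarize_trace is a hypothetical helper name, DRY_RUN is a variable we introduce to preview the commands, and the selection of chisels is arbitrary.

```shell
# Hypothetical helper: run (or, with DRY_RUN=1, just print) several chisels
# against a saved trace file.
summarize_trace() {
    file=$1
    for chisel in topprocs_cpu topprocs_net topfiles_bytes; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "sysdig -r $file -c $chisel"
        else
            echo "== $chisel =="
            sysdig -r "$file" -c "$chisel"
        fi
    done
}

# Preview the commands without running sysdig.
DRY_RUN=1
summarize_trace myfile.scap
```

Because the trace is read from a file, the summaries all describe the same set of events, which is harder to guarantee when each chisel is launched separately on a live system.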
Practical Examples
Let’s look at a few examples of the standard diagnostic procedures we can perform with sysdig.
Network
To view a list of connections not served by Apache:
# sysdig -p "%proc.name %fd.name" "evt.type=accept and proc.name!=httpd"
To see what data has been exchanged with server 192.168.0.1:
in binary:
# sysdig -s2000 -X -c echo_fds fd.cip=192.168.0.1
in ASCII:
# sysdig -s2000 -A -c echo_fds fd.cip=192.168.0.1
To retrieve information on the processes consuming the most bandwidth:
# sysdig -c topprocs_net
Bytes     Process
------------------------------
885B      avahi daemon
6.44KB    Chrome
To view statistics on server ports:
on the number of established connections:
# sysdig -c fdcount_by fd.sport "evt.type=accept"
on the amount of information sent in bytes:
# sysdig -c fdbytes_by fd.sport
To view information on client IPs:
on the number of established connections:
# sysdig -c fdcount_by fd.cip "evt.type=accept"
on the amount of information sent in bytes:
# sysdig -c fdbytes_by fd.cip
Bytes     fd.cip
------------------------------
375B      192.168.40.99
250B      192.168.40.255
226B      192.168.40.101
133B      192.168.30.88
125B      255.255.255.255
To view information on requests sent by Apache to external MySQL servers:
# sysdig -A -c echo_fds fd.sip=192.168.30.5 and proc.name=apache2 and evt.buffer contains SELECT
Disk Subsystem
To view the processes that use the most disk bandwidth:
# sysdig -c topprocs_file
Bytes     Process
------------------------------
12.61KB   BrowserBlocking
3.89KB    Xorg
3.79KB    Chrome_IOThread
3.09KB    gnome-terminal
To view information on file-heavy processes:
# sysdig -c fdcount_by proc.name "fd.type=file"
BrowserBlocking    365
Chrome_IOThread    44
irqbalance         12
upowerd            7
dropbox            5
Xorg               3
alsa-sink          2
rs:main            2
compiz             1
rsyslogd           1
gnome-terminal     1
To view the files with the most bytes read and written:
# sysdig -c topfiles_bytes
Bytes     Filename
------------------------------
5.41KB    /dev/input/event4
1.90KB    /tmp/vteHGSYFX (deleted)
576B      /dev/urandom
554B      /dev/ptmx
384B      /tmp/vteHESYFX (deleted)
219B      /proc/16139/task/16145/stat
219B      /proc/15857/task/15865/stat
219B      /proc/16139/task/16141/sta
To view a list of files that Apache runs the most read/write operations for:
# sysdig -c topfiles_bytes proc.name=httpd
To trace file opens in real time:
# sysdig -p "%12user.name %6proc.pid %12proc.name %3fd.num %fd.typechar %fd.name" evt.type=open
root 1143 irqbalance 3 f /proc/interrupts
root 1143 irqbalance 3 f /proc/stat
root 1143 irqbalance 3 f /proc/irq/42/smp_affinity
root 1143 irqbalance 3 f /proc/irq/41/smp_affinity
root 1143 irqbalance 3 f /proc/irq/16/smp_affinity
root 1143 irqbalance 3 f /proc/irq/43/smp_affinity
root 1143 irqbalance 3 f /proc/irq/17/smp_affinity
root 1143 irqbalance 3 f /proc/irq/23/smp_affinity
root 1143 irqbalance 3 f /proc/irq/40/smp_affinity
root 1143 irqbalance 3 f /proc/irq/10/smp_affinity
root 1143 irqbalance 3 f /proc/irq/18/smp_affinity
Processor Usage
To view statistics on processor usage:
# sysdig -c topprocs_cpu
CPU%      Process
------------------------------
0.31%     sysdig
0.09%     sshd
0.03%     mysqld
0.01%     nginx
0.01%     php5-fpm
To view CPU0 statistics:
# sysdig -c topprocs_cpu evt.cpu=0
To view the standard output for a process:
# sysdig -s4096 -A -c stdout proc.name=cat
Performance and Errors
To view information on httpd file open errors:
# sysdig "proc.name=httpd and evt.type=open and evt.failed=true"
To view statistics on the most time-consuming files:
# sysdig -c topfiles_time
Time      Filename
------------------------------
403us     /dev/urandom
267us     /dev/input/event4
84us      /dev/dri/card0
63us      /tmp/vte7IZWFX (deleted)
34us      /tmp/vteL7ZWFX (deleted)
20us      /proc/3467/task/3467/stat
13us      /dev/ptmx
11us      /proc/16010/task/16010/st
To view information on the files Apache spends the most time on:
# sysdig -c topfiles_time proc.name=httpd
To view information on processes in terms of I/O errors:
# sysdig -c topprocs_errors
------------------------------
2363 notify-osd
1327 Xorg
688 compiz
349 chrome
82 pulseaudio
76 gtk-window-deco
62 gnome-terminal
50 alsa-sink
30 Chrome_ChildIOT
20 gnome-screensav
20 nautilus
14 Chrome_IOThread
10 syndaemon
10 gnome-settings-
7 soffice.bin
6 nm-applet
6 dbus-daemon
4 AudioThread
3 pidgin
2 NetworkManager
2 mission-control
1 gdbus
To view information on files in terms of I/O errors:
# sysdig -c topfiles_errors
#Errors   Filename
------------------------------
43        /dev/input/event4
2         /dev/ptmx
To view information on system calls that return errors:
# sysdig -c topscalls "evt.failed=true"
# Calls   System Call
------------------------------
384       recvfrom
273       futex
169       read
133       sendto
41        select
3         recvmsg
To trace file open errors as they occur:
# sysdig -p "%12user.name %6proc.pid %12proc.name %3fd.num %fd.typechar %fd.name" evt.type=open and evt.failed=true
root 1607 upowerd -1 f /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:0e/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/energy_now
root 1607 upowerd -1 f /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:0e/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/energy_avg
root 1607 upowerd -1 f /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:0e/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/voltage_max_design
root 1607 upowerd -1 f /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:0e/PNP0C09:00/PNP0C0A:00/power_supply/BAT0/power_now
To display a list of I/O operations with a latency above 1 millisecond:
# sysdig -c fileslower 1
TIME                     PROCESS   TYPE  LAT(ms)  FILE
2014-05-13 12:46:57.190  rsyslogd  read  3524     /proc/kmsg
2014-05-13 12:46:57.197  rsyslogd  read  7        /proc/kmsg
2014-05-13 12:46:57.205  rsyslogd  read  7        /proc/kmsg
2014-05-13 12:46:57.209  rsyslogd  read  4        /proc/kmsg
2014-05-13 12:46:57.221  rsyslogd  read  11       /proc/kmsg
2014-05-13 12:46:57.225  rsyslogd  read  3        /proc/kmsg
2014-05-13 12:46:57.233  rsyslogd  read  7        /proc/kmsg
2014-05-13 12:46:57.241  rsyslogd  read  7        /proc/kmsg
2014-05-13 12:46:58.362  upowerd   read  220      /sys/devices/LNXSYSTM:00/LN
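The fileslower printout lists individual operations; to see which process accumulates the most latency, the LAT(ms) column can be summed per process. This is a rough sketch: latency_by_process is a hypothetical helper name, and it assumes the column layout shown above, with the date and time counting as two whitespace-separated fields.

```shell
# Hypothetical helper: sum the LAT(ms) column per process from fileslower output.
# Assumes the layout shown above: date, time, process, type, latency, file.
latency_by_process() {
    awk 'NR > 1 { total[$3] += $5 }
         END { for (p in total) print p, total[p] }' | sort
}

# Sample rows taken from the printout above (last path is truncated there too).
latency_by_process <<'EOF'
TIME PROCESS TYPE LAT(ms) FILE
2014-05-13 12:46:57.190 rsyslogd read 3524 /proc/kmsg
2014-05-13 12:46:57.197 rsyslogd read 7 /proc/kmsg
2014-05-13 12:46:57.205 rsyslogd read 7 /proc/kmsg
2014-05-13 12:46:58.362 upowerd read 220 /sys/devices/LNXSYSTM:00/LN
EOF
# prints:
# rsyslogd 3538
# upowerd 220
```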
Security
To view information on directories visited by the root user:
# sysdig -p "%evt.arg.path" "evt.type=chdir and user.name=root"
To trace SSH activity:
# sysdig -A -c echo_fds fd.name=/dev/ptmx and proc.name=sshd
To display all file open events from the /etc directory:
# sysdig evt.type=open and fd.name contains /etc
97367 12:50:02.164137993 0 unity-panel-ser (2193) < open fd=13(/etc/timezone) name=/etc/timezone flags=1(O_RDONLY) mode=0
97385 12:50:02.164419642 0 unity-panel-ser (2193) < open fd=13(/etc/localtime) name=/etc/localtime flags=1(O_RDONLY) mode=0
97405 12:50:02.164642935 0 unity-panel-ser (2193) < open fd=13(/etc/localtime) name=/etc/localtime flags=1(O_RDONLY) mode=0
Conclusions
Sysdig is still a fairly young project. Among its undeniable advantages is its simple command syntax. In many cases, the information sysdig returns on system events is more detailed than what DTrace or SystemTap provide, and it is presented in a more user-friendly format. Another big plus is that the analysis of system processes can be performed after data has been collected, and not only when errors occur or in emergency situations.
Sysdig, without a doubt, has a lot of potential. We hope the project gets the fine-tuning and the merit it deserves among the other Linux system diagnostic tools.