Thursday, November 9, 2017

A Short Story on “How I saved my weekend”

After a network re-design, I started realizing that the we need to change the “vlan” of all the virtual servers we created in two F5 BigIP LTM Devices…. Sadly, we have 311 Virtual Servers in each F5 and we have two devices in scope. So, the count bumped to 622… 3 clicks per virtual server and am going click 1866 times???

No... Never…

Lazy minds always find shortcuts and I started thinking about the leadership direction, automate as much you can…

Finally, after couple sighs, I started dusting my Python skills and wrote couple of lines of code to finish my playbook, named it as ‘knock-it-down.yml’ …

Yes, it took only 45-50 mins for me to save 1866 clicks and my plans during the weekend…

It was painful to accept we lost the game, but there was a smile in my face and I was trying to hide it…

Why "fw ctl zdebug..." should not be used"

Kernel debug Best Practices or "Why "fw ctl zdebug..." should not be used"

Over last several days I have seen rapidly growing amount of posts at CPUG and CP Community where "fw ctl zdebug..." command was mentioned, used and advised.

Although some of you already know my position for the matter, I have decided to write a post about the growing custom to use zdebug instead of employing full fw ctl debug mechanism.

Kernel debug in general


Check Point FW is essentially a Linux-based system with a kernel module inserted between drivers and OS IP stack. If you do not know what I am talking about, you may want to look into this post with an explanatory video for the matter.

Extracting information about kernel based security decisions is rather tricky, so Check Point developed an elaborate tool to read some info about various FW kernel modules actions.

In a nutshell, each kernel module has multiple debug flags that force code to start printing out some information. I have numerous posts in this blog explaining different flags, tips and tricks with kernel debug and also providing links to CP kernel debug documents.

Debug buffer


It is important to understand FW kernel is always printing out some debug messages. For most of the kernel modules, error and warning flags are active, and the output goes to /var/log/messages by default. This is not practical for debug, so before starting kernel debug, an engineer needs to set a buffer which would receive debug output instead of /var/log/messages file.

To do so, the following command is used: fw ctl debug -buf XXXXX, where XXXXX is the buffer size in KB. The maximum possible buffer today is 32 MB, but I advise my students to use 99999 to make sure they get maximum buffer possible anyway.

Kernel can be very chatty, so having a bigger buffer would ensure less kernel messages being lost.

Debug modules and flags


FW kernel is a complex structure. It is built with multiple modules. Each of the modules has its own flags. One can run a single debug session with multiple flags raised for several modules. To raise debug flags, one use one or several commands of this type:

fw ctl debug -m (module name) (+|-) (list of flags)

It is essential that + and - options allow you to raise and remove flags on the fly, even during an already running debug session. List of modules and flags can be found by the first link in this post.

Printing info out of buffer


Raising flags is not enough, as to get information, you need to start reading buffer out with this command:

fw ctl kdebug -f (with some options)

There will be A LOT of information, so never do this on the console. Use SSH session or redirect to a file.

Stopping debug


Once you collected the relevant info, you need to reset kernel debug to the default settings, otherwise you FW will continue printing out tons of unnecessary info. To do so, run

fw ctl debug 0

What is fw ctl zdebug then?

fw ctl zdebug is an internal R&D macros to cut corners when developing and testing new features in the sterile environment. It is equivalent to the following sequence of commands:

fw ctl debug -buf 1024
fw ctl debug (your options)
fw ctl kdebug -f
-------(waiting for Ctrl-C)
fw ctl debug 0

Why is this a problem?


If you are still reading this post and get to this line, you probably think zdebug is a god sent miracle. It simplifies so many things, it is the only way to run debug in production environment! Right? 

Wrong. To make it plain, here is the list of problematic point with this way of doing things:

1. The buffer is way too small. Lots and lots of messages might be just lost because buffers does not have enough room to hold them before read.
2. It is not flexible enough. Running debug in production requires lots of consideration and certain amount of caution. After all, you are asking FW kernel to do extra things, lots of them. The best practice is to start with a single flag or two and expand area of research in the fly trying to catch an issue. This is impossible to do with fw ctl zdebug macros.
3. It is too simple to use. You could say, what a funny argument. Yet, let's think about it. To master kernel debug as described above, one has to understand kernel structure, dependencies, flags and modules. You don't have to do any of that to run fw ctl zdebug drop, and many people do jsut that. 

And guess what, this is also the simplest way to bring your busy production FW cluster down. So no, do not try this at home or at your place of work, if job security is important for you.