System monitoring means many different things to many different people. A manager may look at monitoring as a way to measure uptime. As a sysadmin, I look at monitoring as the following:
Xymon is a client/server software package that can be used to monitor systems, services, and events. It can monitor passively or actively. It can produce alerts. It can be easily extended.
Xymon is a collection of C programs and shell scripts. The server is primarily C code, written as a number of different daemons running concurrently. Information passes among them much as an event progresses through an OOP application. The message bus is implemented as shared memory (making the code blindingly fast and low overhead -- really!). Client code is mostly shell scripts, with the exception of the C program that sends data back to the server.
Xymon uses static and dynamic web pages as the primary mechanism to view state.
The server does certain basic tests locally such as ping, ssh, http, ... If these are all you care about then you do not need to install any client software on the system to be monitored. If you install the Xymon client code on a machine, it provides a number of default monitors (cpu, disk, files, inodes, memory, msgs, ...).
The server can be configured to react when preset levels are reached. For instance, it can be set to send an e-mail when a file system hits 80% full or to send an SMS message when a node has not answered a ping for more than 5 minutes.
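The two examples above (a disk threshold and a ping outage that must last a while) can be sketched in a few lines. This is only an illustration of the idea, not Xymon's own configuration syntax or code: the `AlertRule` class, the `disk_color` and `due_alerts` functions, the 95% panic level, and the channel labels are all assumptions made for the example. The `conn` name is used here for the ping test column.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AlertRule:
    test: str         # test (column) name the rule applies to
    min_seconds: int  # how long the problem must persist before alerting
    channel: str      # hypothetical delivery label, e.g. "email" or "sms"


def disk_color(percent_used: float, warn: float = 80.0, panic: float = 95.0) -> str:
    """Map file system usage to a status color (both thresholds are assumptions)."""
    if percent_used >= panic:
        return "red"
    if percent_used >= warn:
        return "yellow"
    return "green"


def due_alerts(test: str, color: str, seconds_in_state: int,
               rules: List[AlertRule]) -> List[str]:
    """Return the channels that should fire for a test in a non-green state."""
    if color == "green":
        return []
    return [r.channel for r in rules
            if r.test == test and seconds_in_state >= r.min_seconds]


# Hypothetical rules: mail at once for disk, SMS after 5 minutes without ping.
RULES = [AlertRule("disk", 0, "email"), AlertRule("conn", 300, "sms")]
```

With these rules, a file system at 80% turns yellow and triggers the e-mail immediately, while a failed ping stays quiet until it has lasted at least 300 seconds.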
Most of the above is not unique among monitoring packages (other than the lightweight property). Two things make Xymon special: the ease of adding new tests and the simplicity of the test concept. Before we go any further, let's look at how Xymon came to be.
Many years ago (1996), a man named Sean MacGuire was making a living as a consulting system admin for a company that paid him adequately. The company had grown enough that it had more than a couple of servers. It dawned on them that they needed a network monitor to make sure all the important servers and services were up. They had another consulting company come in and propose a solution. The price was very high (in Sean's opinion). So he decided to write a simple monitoring solution for the company. This was the birth of Big Brother.
Although Sean provided it to the company, he retained the rights to it. As Sean developed it, other folks asked if they could use it. Sean developed an odd license for Big Brother. BB was free to use to help you, but if you were making money because of BB, then he wanted a cut. Remember, he wrote it, so he can set the license.
The original BB was all shell script except for the listener (on the server) and the transmitter (on the client), which were written in C. Sean took great pains to write portable shell scripts. Because of this, BB gained quite a following.
BB used the web as its primary method of showing current state. In 1996, dynamic HTML was still very CPU intensive. Sean chose to regenerate static pages every 5 minutes rather than use dynamic HTML. This allowed BB to be relatively lightweight while still maintaining a reasonably up-to-date view.
BB provided an easy-to-understand basic concept that used colors to communicate quickly. The state of a service can be one of the following:
These simple colors, displayed as a matrix (machines for rows and test names for columns), proved to be a concise way to convey information. BB also included a special page named "nonGreen". This page was the entire monitored matrix with every row and column that has no red or yellow on it collapsed out -- a single easy-to-read web page of everything that is not right.
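The collapsing that produces the "nonGreen" view is easy to picture as a filter over the matrix. The sketch below is not taken from BB or Xymon; the `non_green` function and the host and test names are made up for illustration, and it assumes the matrix is held as a dictionary of hosts, each mapping test names to colors.

```python
from typing import Dict

Matrix = Dict[str, Dict[str, str]]  # host -> test name -> color


def non_green(matrix: Matrix) -> Matrix:
    """Keep only the rows and columns that contain a red or yellow cell."""
    bad = {"red", "yellow"}
    # Rows (hosts) with at least one bad cell.
    rows = {host: tests for host, tests in matrix.items()
            if bad & set(tests.values())}
    # Columns (tests) that are bad on at least one of the kept rows.
    cols = sorted({name for tests in rows.values()
                   for name, color in tests.items() if color in bad})
    return {host: {c: tests[c] for c in cols if c in tests}
            for host, tests in rows.items()}


# Hypothetical sample data.
SAMPLE = {
    "web1": {"conn": "green", "disk": "red", "http": "green"},
    "db1": {"conn": "green", "disk": "green"},
    "mail1": {"conn": "yellow", "disk": "green"},
}
```

Here `non_green(SAMPLE)` drops `db1` entirely (all green) and drops the `http` column (green everywhere it appears), leaving only `conn` and `disk` for `web1` and `mail1`.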
BB also included the idea of extensibility. Tests (columns) do not have to be created on the server side. As soon as a test result is received, the server dynamically adds the test. Even better, adding a test is as easy as the following:
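As a rough illustration of what reporting a test result involves, the sketch below builds and sends a single status line. It assumes the text-based `status HOSTNAME.TESTNAME COLOR text` message form described in the xymon(1) manual page, with dots in the host name written as commas, and the conventional TCP port 1984. The function names, the `backup` test, and the example host are hypothetical, and only the three colors named in this article are accepted here.

```python
import socket

# Only the colors mentioned in the text; a real server may accept more.
COLORS = {"green", "yellow", "red"}


def build_status(host: str, test: str, color: str, text: str) -> str:
    """Build one status line (message format is an assumption, see lead-in)."""
    if color not in COLORS:
        raise ValueError(f"unknown color: {color}")
    # Dots separate host from test, so dots inside the host become commas.
    return f"status {host.replace('.', ',')}.{test} {color} {text}"


def send_status(message: str, server: str, port: int = 1984) -> None:
    """Send the message over TCP and close the connection."""
    with socket.create_connection((server, port), timeout=10) as conn:
        conn.sendall(message.encode("utf-8"))


if __name__ == "__main__":
    # Prints the message instead of sending it; no server is contacted.
    print(build_status("www.example.com", "backup", "green", "backup finished"))
```

Because the server adds columns on first sight, a new test name such as `backup` needs no server-side setup in this picture; the first message that arrives creates the column.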
What could go wrong? It turns out success went wrong. As Big Brother usage grew, a number of issues developed. Many tests were developed for BB, and many of them were shared. Some folks are better at writing portable code than others. Because the additions were not as solid as the base code, some thought BB was flawed. Questions of license abuse came up. On bigger installations, the code to regenerate the web pages was too slow.
A separate repository to house plugins helped some. A mailing list also helped clean up the plugins some. The license issues continued. The performance issue was resolved when Henrik Storner (a heavy user of BB) wrote a tool called bbgen. bbgen was a drop-in replacement, written in C, for the original scripts that generated the web pages. It was fast.
BB drew enough attention that Quest Software approached Sean MacGuire. After negotiations, Quest bought BB. Part of Sean's negotiation was that BB would remain freely available. All was well for a while. Over time, the free version languished. Quest made it harder to get and limited what could be moved down to it.
During this time, Henrik became frustrated. bbgen was a significant part of BB, and he understood the inner workings of BB. He decided to write a BB clone around bbgen. The initial name for his clone was Hobbit. As Hobbit evolved, Henrik shifted the server to C while retaining the script nature of the client. In fact, Hobbit was designed to listen to BB clients without modification. As Hobbit matured, its server design was heavily modernized while remaining written in C for portability and performance.
Finally, Hobbit reached production status and was announced. Just as Hobbit gained traction, a letter from the Tolkien estate insisted that the name be changed. The project was renamed Xymon. Over the next few years, almost all traces of the word hobbit were removed.
A year or two back, Sean MacGuire parted ways with Quest Software. He was never happy with the removal of BB from public access. He has posted a few times on the Xymon mailing lists and enthusiastically endorses Xymon.
Xymon has a lot of different parts. HTML is a big part of the common user interface. RRD data and graphs provide history and quick visual trending. Alerting via e-mail, SMS pages, and scripting provides incredible versatility. Replaceable icon sets allow the look to be updated.
Xymon has three main sections: server, client, history. The server code is in the server tree. The client code is in the client tree. The history tree is used to store historical data for the server. All configuration and data are kept in text files -- no databases used here.
Shared-memory communication provides a fast, reliable interconnect. Shell scripting on the client side eases portability and adaptability. The plugin architecture makes expansion easy.
By default Xymon needs the following:
I have described Xymon as lightweight. My instance of Xymon claims 511 hosts, 26 web pages, and 2045 tests. My server is an IBM xSeries 336 (dual 3.6 GHz Pentium processors, 2 GB RAM). Its load average stays below 0.5.
Xymon uses a directory structure modelled after the normal Unix/Linux tree structure. Here is a breakdown:
Xymon clients only need a small amount of software to be installed to work. The client tree as listed above is pretty straightforward. The only interesting places are etc and ext.
Even though etc has several files in it, the only file that is currently used is clientlaunch.cfg. This file tells the client what to start, how to start it, and how often to rerun it. Usually, the last line says to include anything in clientlaunch.d (where you put config to start external tests).
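To make the "what to start and how often" idea concrete, here is a small sketch that renders one task stanza of the kind that might be dropped into clientlaunch.d. It assumes the bracketed-section layout with `CMD`, `LOGFILE`, and `INTERVAL` keywords documented for the client launcher; check the clientlaunch.cfg manual page for your version before relying on it. The `render_task` helper, the `backup` task name, and the script path are hypothetical.

```python
from typing import Optional


def render_task(name: str, cmd: str, interval: str = "5m",
                logfile: Optional[str] = None) -> str:
    """Render one launcher stanza as text (keyword set is an assumption)."""
    lines = [f"[{name}]", f"\tCMD {cmd}"]
    if logfile:
        lines.append(f"\tLOGFILE {logfile}")
    lines.append(f"\tINTERVAL {interval}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical external test script living in the client's ext directory.
    print(render_task("backup", "$XYMONCLIENTHOME/ext/backup.sh"), end="")
```

Keeping each external test in its own small file under clientlaunch.d means a test can be added or removed without touching clientlaunch.cfg itself.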
ext is a directory where you put the scripts that you wish to have the client execute. The scripts here are run as the xymon user and inherit its environment.
The server has the same basic configuration as the client, and most things mean the same. The server comes with a few extras: web and www. When Xymon produces web pages (www), it does so by processing the configuration files and using web snippets (web). www becomes the place that Apache is configured to use as its web root.
Most of the time spent configuring Xymon will be in the server/etc directory. hosts.cfg will need to be updated every time hosts appear/disappear or services to monitor change. Most of the web layout is defined here also.
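Since hosts.cfg drives both the host list and the tests run against each host, it helps to see its shape. The sketch below reads a sample in the commonly documented `IP-address hostname # tags` form and skips layout directives such as `page` and `group`. The sample hosts and addresses are invented, `parse_hosts` is a hypothetical helper, and real files support more directives than this handles.

```python
from typing import Dict

# Invented sample; addresses come from the documentation range 192.0.2.0/24.
SAMPLE_HOSTS = """\
# Production layout
page servers Production servers
group Web
192.0.2.10 www.example.com # http://www.example.com/ ssh
group Databases
192.0.2.20 db1.example.com # ssh
"""


def parse_hosts(text: str) -> Dict[str, dict]:
    """Return {hostname: {"ip": ..., "tags": [...]}} for host lines only."""
    hosts: Dict[str, dict] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields, _, tags = line.partition("#")
        parts = fields.split()
        # Host lines start with an IP address; anything else is layout.
        if len(parts) < 2 or not parts[0][0].isdigit():
            continue
        hosts[parts[1]] = {"ip": parts[0], "tags": tags.split()}
    return hosts
```

In this sample the tags after the `#` name the network tests for each host, so adding `ssh` to a line is all it takes to start checking that service.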
Here are some useful links: