Black box monitoring of Windows servers


This is part 2 in a series about how healthchecks.io can be used together with Qlik Sense Enterprise on Windows (QSEoW).

Previous articles:
Oh data where art thou?

Windows Server just works… right?

Usually, yes.
Still, things do happen, and it would certainly be nice to get a push alert when a server hasn’t checked in on schedule.

The most common kind of monitoring, whether for Windows or Linux servers, databases or Qlik Sense, has the tool keep an eye on some measurement and send an alert when that measurement goes beyond a threshold.
This is fine, and it is a very important monitoring use case. But when a server simply hangs, the last measurement received may have looked fine, so no alert is ever sent.

Black box monitoring kind of reverses the roles:

The monitored system has to prove that it’s doing fine. Failing to do so within some predefined schedule will trigger an alarm, with an optional alert being sent.

The previous article showed how this concept can be used to ensure that a Qlik Sense app has reloaded as intended before a specific time each day. A concrete, common use case: yesterday’s data should be processed and loaded into Sense before 7 am the next day, and an alert is sent if it is not.

Now, let’s use the same tool and concept to also monitor the Windows servers that Qlik Sense Enterprise on Windows (QSEoW) runs on.

Once again, healthchecks.io is used as the monitoring tool.

This time the monitored systems will be three Windows servers, on top of which a Qlik Sense Enterprise cluster runs.

The video above best shows what happens, with the text below providing some details.

1 – The basics

One of the monitored servers, “pro2-win3”, goes offline. As this is the main access server in the Sense cluster, users working in Qlik apps will certainly notice that something has happened.

There are of course many ways a sysadmin or Sense platform admin could be notified that something happened. The beauty of the concept shown here is its simplicity: it is easy to understand, set up and operate, yet powerful enough for most scenarios.

The goal here is to have the monitoring tool detect that the server has fallen over and send alerts to the sysadmin.

2 – Windows Task Scheduler

The standard Windows Task Scheduler is used to ping the monitoring tool every 15 minutes. The “ping” consists of the server somehow visiting a specific URL that has been defined in the monitoring tool. Pinging that URL tells the monitoring tool that pro2-win3 is online. Applications (such as Sense) on that server may still be malfunctioning, but at least the server is up.
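As a rough sketch of how such a scheduled task could be registered from a script rather than through the Task Scheduler GUI, the Python helper below builds the argument list for the Windows `schtasks /Create` command. The task name, the curl flags and the ping URL (`your-uuid-here`) are placeholders assumed for illustration, not values from the setup described here.

```python
def build_schtasks_args(task_name, command, every_minutes=15):
    """Build the argument list for `schtasks /Create` on a minute schedule.

    Nothing is executed here; the caller decides whether to run the result.
    """
    # schtasks accepts a minute interval between 1 and 1439.
    if not 1 <= every_minutes <= 1439:
        raise ValueError("every_minutes must be between 1 and 1439")
    return [
        "schtasks", "/Create",
        "/TN", task_name,           # task name (hypothetical)
        "/TR", command,             # the command the task runs
        "/SC", "MINUTE",            # schedule type: every N minutes
        "/MO", str(every_minutes),  # N
        "/F",                       # overwrite the task if it already exists
    ]


# Hypothetical ping command; replace the URL with the one shown for your check.
PING_COMMAND = "curl.exe -fsS -m 10 --retry 3 -o NUL https://hc-ping.com/your-uuid-here"

args = build_schtasks_args("healthchecks-ping", PING_COMMAND)
```

On the server itself, the list could then be passed to `subprocess.run(args, check=True)` from an elevated prompt. Keeping the builder separate from the execution makes it easy to inspect the command before anything is changed on the machine.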

3 – curl or PowerShell – you choose

Somehow the Windows Task Scheduler must call a program that makes a simple visit to the specified URL.

This can certainly be done with PowerShell. In this example, however, I used curl, which is THE tool for making HTTP requests from the command line, no matter what platform you work on. A standalone Windows version is available.
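For servers where Python happens to be installed, a third option would be a small script that does the same thing as the curl call. This is only a minimal sketch, not the setup used in this article: the ping URL is a placeholder, and the retry count, timeout and pause are assumed values.

```python
import time
import urllib.request

# Placeholder: use the ping URL that healthchecks.io shows for your check.
PING_URL = "https://hc-ping.com/your-uuid-here"


def ping(url, attempts=3, timeout=10,
         opener=urllib.request.urlopen, sleep=time.sleep):
    """Visit `url`, retrying a few times. Return True if any attempt succeeds."""
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=timeout) as response:
                if 200 <= response.status < 300:
                    return True
        except OSError:
            # Covers URLError, HTTPError and socket timeouts.
            pass
        if attempt < attempts:
            sleep(2)  # short pause before the next attempt
    return False
```

Calling `ping(PING_URL)` at the bottom of the script turns it into something Task Scheduler can run. The `opener` and `sleep` parameters exist only so the function can be tried out without network access.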

4 – Summary

That’s it, really. healthchecks.io tracks the time since the last ping from pro2-win3. If more than 15 minutes (the period configured in healthchecks.io) plus a few minutes of grace time pass without a new ping, an alarm (and an optional alert) is triggered.
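The period-plus-grace rule can be illustrated with a few lines of Python. This is a simplified model of the idea, not healthchecks.io's actual implementation; the 5 minute grace time and the state names "up", "late" and "down" are assumptions made for the example.

```python
from datetime import datetime, timedelta


def check_status(last_ping, now,
                 period=timedelta(minutes=15),
                 grace=timedelta(minutes=5)):
    """Classify a check based on the time elapsed since its last ping."""
    elapsed = now - last_ping
    if elapsed <= period:
        return "up"    # pinged within the expected period
    if elapsed <= period + grace:
        return "late"  # overdue, but still within the grace time
    return "down"      # period + grace exceeded: raise the alarm


last = datetime(2024, 1, 1, 12, 0)
print(check_status(last, datetime(2024, 1, 1, 12, 10)))  # up
print(check_status(last, datetime(2024, 1, 1, 12, 17)))  # late
print(check_status(last, datetime(2024, 1, 1, 12, 21)))  # down
```

The grace time is what keeps a ping that arrives a minute or two late, for example because the server was busy, from triggering a false alarm.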

Alerts can be sent to lots of different tools: