Butler SOS 10.2 released

Butler SOS 10.2 focuses on making it easier to monitor large Qlik Sense environments. Major new features include (very!) flexible categorisation and visualisation of Qlik Sense log events, as well as visualisation of Butler SOS' config file.

Which warnings and errors are important and which ones are just noise? Butler SOS can categorise these events, making it much easier to find the important ones.

Butler SOS is a powerful monitoring and alerting tool designed specifically for the Qlik Sense Enterprise ecosystem.
It adds unique features that are not included with Qlik Sense itself - features that are however very much needed to establish enterprise grade monitoring and alerting for Qlik Sense.

Butler SOS has been around for quite a few years and is used in both small and large Sense environments. The nature of large Sense environments can be challenging though, with dozens of servers in various configurations.

It simply becomes tricky to keep track of what configuration parameters are used where and also to stay on top of the massive volume of metrics and log warnings/errors that indicate something is not quite right.

The latest version of Butler SOS focuses on making life easier for the teams responsible for large Sense environments.
By the way, these features are also valuable for smaller environments, of course.

Butler SOS has its own site at https://butler-sos.ptarmiganlabs.com.

New features

The 10.2 release includes a set of bug fixes and minor feature updates, but the main news is the following:

Categorisation of Qlik Sense log events

When first deploying Butler SOS in a new Sense environment it is quite common that the first insight is that there are more warnings and errors in the Sense logs than expected.

Actual warning/error log messages in a single hour in a medium-sized Sense environment.
How do you know which messages are important and which are just noise?

This is a critical question, because if you can answer it you suddenly have an actual chance to fix those serious issues, and just disregard the others.

So how can we do this?

  • Butler SOS 10.2 lets you define rules that are applied to all Sense log events/messages that Butler SOS receives.
  • These rules can result in either of two actions: The message being associated with one or more categories, or the message being dropped.
  • Each rule contains one or more filters that are matched against incoming log events. If any of a rule's filters matches a log event, that rule's categories are associated with the event.
    • There are currently 3 filter types that all act on strings. Conveniently, these are the same as those used by the Qlik Sense Repository API:
      • sw: starts with
      • ew: ends with
      • so: substring of
  • The categories can be forwarded with the event to storage providers supported by Butler SOS, for example InfluxDB. Support for others (New Relic, Prometheus etc) may be added in the future.
    • In the case of InfluxDB, the categories will be added to the log event datapoint as regular InfluxDB tags.
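The matching logic described in the list above can be sketched in a few lines of Python. This is not Butler SOS' own code (Butler SOS is a Node.js application), only a minimal illustration. The names Rule, filter_matches and apply_rules are hypothetical. The sketch also makes assumptions the text does not confirm: that filters are matched against the log message text, that categories from several matching rules are all kept, that a matching drop rule discards the event right away, and that default categories apply only when no rule matched.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One categorisation rule (hypothetical structure, for illustration only)."""
    description: str
    log_levels: list   # e.g. ["WARN", "ERROR"], compared case-insensitively
    action: str        # "categorise" or "drop"
    filters: list      # (type, value) pairs; type is "sw", "ew" or "so"
    categories: list = field(default_factory=list)  # (name, value) pairs


def filter_matches(filter_type, value, message):
    """Apply one case-sensitive string filter to a log message."""
    if filter_type == "sw":   # starts with
        return message.startswith(value)
    if filter_type == "ew":   # ends with
        return message.endswith(value)
    if filter_type == "so":   # substring of
        return value in message
    raise ValueError(f"Unknown filter type: {filter_type}")


def apply_rules(level, message, rules, default_categories=None):
    """Return a list of (name, value) categories, or None if the event is dropped."""
    categories = []
    matched = False
    for rule in rules:
        # Log levels are compared case-insensitively
        if level.upper() not in {lvl.upper() for lvl in rule.log_levels}:
            continue
        # A rule matches as soon as any one of its filters matches
        if not any(filter_matches(t, v, message) for t, v in rule.filters):
            continue
        if rule.action == "drop":
            return None  # Assumption: a matching drop rule discards the event at once
        matched = True
        categories.extend(rule.categories)
    if not matched and default_categories:
        # Assumption: defaults apply only when no rule matched
        categories.extend(default_categories)
    return categories


rules = [
    Rule(
        description="Find access denied errors",
        log_levels=["WARN", "ERROR"],
        action="categorise",
        filters=[("sw", "Access was denied for User:")],
        categories=[("qs_log_category", "access-denied")],
    ),
]
print(apply_rules("error", "Access was denied for User: ...", rules))
# → [('qs_log_category', 'access-denied')]
```

Returning None for dropped events keeps "dropped" distinct from "kept but without any category", which is an empty list here.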

The rules are defined in the Butler SOS config file. They can look like this:

Butler-SOS:
  ...
  logEvents:
    ...
    categorise:                        # Take actions on log events based on their content
      enable: true
      rules:                           # Rules are used to match log events to filters
        - description: Find access denied errors
          logLevel:                    # Log events of this Log level will be matched. WARN, ERROR, FATAL. Case insensitive.
            - WARN
            - ERROR
          action: categorise           # Action to take on matched log events. Possible values are categorise, drop
          category:                    # Category to assign to matched log events. Name/value pairs. 
                                       # Will be added to InfluxDB datapoints as tags.
            - name: qs_log_category
              value: access-denied
          filter:                      # Filter used to match log events. Case sensitive.
            - type: sw                 # Type of filter. sw = starts with, ew = ends with, so = substring of
              value: "Access was denied for User:"
            - type: so
              value: was denied for User
        - description: Find AD issues
          logLevel:                    # Log events of this Log level will be matched. WARN, ERROR, FATAL. Case insensitive.
            - ERROR
            - WARN
          action: categorise           # Action to take on matched log events. Possible values are categorise, drop
          category:                    # Category to assign to matched log events. Name/value pairs. 
                                       # Will be added to InfluxDB datapoints as tags.
            - name: qs_log_category
              value: user-directory
          filter:                      # Filter used to match log events. Case sensitive.
            - type: sw                 # Type of filter. sw = starts with, ew = ends with, so = substring of
              value: Duplicate entity with userId
        - description: Reload task failed
          logLevel:                    # Log events of this Log level will be matched. WARN, ERROR, FATAL. Case insensitive.
            - WARN
            - ERROR
          action: categorise           # Action to take on matched log events. Possible values are categorise, drop
          category:                    # Category to assign to matched log events. Name/value pairs. 
                                       # Will be added to InfluxDB datapoints as tags.
            - name: qs_log_category
              value: reload-failed
          filter:                      # Filter used to match log events. Case sensitive.
            - type: sw                 # Type of filter. sw = starts with, ew = ends with, so = substring of
              value: Task finished with state FinishedFail
            - type: sw                 # Type of filter. sw = starts with, ew = ends with, so = substring of
              value: Task finished with state Error
            - type: ew                 # Type of filter. sw = starts with, ew = ends with, so = substring of
              value: Reload failed in Engine. Check engine or script logs.
            - type: sw                 # Type of filter. sw = starts with, ew = ends with, so = substring of
              value: Reload sequence was not successful (Result=False, Finished=True, Aborted=False) for engine connection with handle
      ruleDefault:                     # Default rule to use if no other rules match the log event
        enable: true
        category:
          - name: qs_log_category
            value: unknown

Now, the data has been categorised and stored in InfluxDB (v1 and v2 both supported), with the categories added as tags.
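To make "categories added as tags" concrete, here is a small Python sketch that formats a log event as an InfluxDB line protocol point. The measurement name log_event, the field name message, the level tag and the helper names are assumptions made for illustration. The actual names Butler SOS writes may differ.

```python
def escape_tag(text):
    """Escape the characters that are special in line protocol tag keys and values."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line_protocol(measurement, tags, message, timestamp_ns):
    """Build one line protocol point with a single string field called 'message'."""
    # Tags are sorted by key, as the InfluxDB documentation recommends
    tag_part = "".join(
        f",{escape_tag(key)}={escape_tag(value)}" for key, value in sorted(tags.items())
    )
    # String field values are double-quoted; backslashes and quotes are escaped
    field_value = message.replace("\\", "\\\\").replace('"', '\\"')
    return f'{measurement}{tag_part} message="{field_value}" {timestamp_ns}'


line = to_line_protocol(
    "log_event",  # Hypothetical measurement name
    {"qs_log_category": "reload-failed", "level": "ERROR"},
    "Task finished with state FinishedFail",
    1700000000000000000,
)
print(line)
# → log_event,level=ERROR,qs_log_category=reload-failed message="Task finished with state FinishedFail" 1700000000000000000
```

Because the category is an ordinary tag, a dashboard query can filter or group on qs_log_category like on any other tag.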

The Grafana dashboard can now be modified to show categorised and uncategorised events separately.

It's then possible (trivial!) to examine both the important categorised events and the uncategorised, unknown ones.

It can look like this (with a smaller number of events here):

Categorised events to the left, uncategorised to the right.

Note how a detailed view of uncategorised log entries is shown in the bottom right table, including warnings (orange) vs errors (red) and the number of times each event has occurred.

Bottom line: Your life as a Qlik Sense admin just became a lot easier.

Visualisation of config file in use

The other new feature may at first glance look like mere eye candy, but it's actually pretty useful, especially for large Sense environments, where pulling up a nice web page is a bit easier than scrolling through endless YAML config files.

Butler SOS now includes a built-in web server that serves a web page in which the current configuration is shown:

Butler SOS config file visualised by... Butler SOS

This feature can be enabled/disabled as needed in the config file.
The config file also includes an option to obfuscate/scramble sensitive items in the config file before they are shown on the web page. Examples include account names, passwords, host names etc.
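The idea behind obfuscation can be illustrated with a short Python sketch that masks selected values in a nested config structure before it is displayed. This is only an illustration of the concept, not how Butler SOS does it. The key names in SENSITIVE_KEYS, the mask string and the sample config are all hypothetical.

```python
SENSITIVE_KEYS = {"password", "username", "host"}  # Hypothetical key names
MASK = "********"


def obfuscate(config):
    """Return a copy of a nested config with sensitive scalar values masked."""
    if isinstance(config, dict):
        masked = {}
        for key, value in config.items():
            is_scalar = not isinstance(value, (dict, list))
            if key in SENSITIVE_KEYS and is_scalar:
                masked[key] = MASK
            else:
                masked[key] = obfuscate(value)
        return masked
    if isinstance(config, list):
        return [obfuscate(item) for item in config]
    return config  # Scalars outside a sensitive key are returned unchanged


# Hypothetical sample config, not taken from a real Butler SOS config file
sample = {
    "influxdb": {
        "host": "influx.example.com",
        "port": 8086,
        "auth": {"username": "admin", "password": "secret"},
    }
}
print(obfuscate(sample))
```

The function builds a new structure rather than changing the input, so the running configuration itself is never altered by displaying it.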

A word of warning

A final heads-up is also relevant.

The entire Butler SOS source code was updated in 10.2.0, as part of switching to a more modern way of creating Node.js applications.

This change is 100% behind the scenes and should not be visible in any way to users of Butler SOS - but...

...when doing major updates there is always a risk of bugs being introduced.

So please keep an eye open for strange things and report any issues over at GitHub. Thanks!