vRealize Log Insight and Orchestrator

By | September 15, 2017

I’m a big fan of the vRealize suite as it really adds value to a vSphere deployment. In relation to this post, LogInsight is a great tool for providing log analysis and Orchestrator manages workflows to automate and orchestrate most/all of the infrastructure.

A series of posts on vmware.com – “Self Healing Datacenter” – have examined how to connect Operations Manager to Orchestrator using a shim – the first part of the series is here. The shim is also capable of taking input from LogInsight, though this isn’t covered by the articles.

It is great that LogInsight offers webhooks, but it is a shame that there is no official tool to send events/data from one product in the vRealize suite to another. I’m interested in how and why things work, and the shim presented in the series above doesn’t appear to work quite right, so instead of using it I’m going to write my own in node.js so we can look at how this all works.

Environment

In this environment we’ll be using vRealize LogInsight 4.3 and Orchestrator 7.2, both of which have a straightforward deployment – both are single servers (no clusters!) and both are configured against a vSphere 6.5 deployment.

In vSphere I’ve created a nested host to test against and put it in its own cluster. I’ve done this because some of the actions I want to take involve shutting the host down (not always gracefully) and I don’t have the spare physical hosts to do this (the auto-esxi VMs below are the hosts in the Nested cluster).

VMs acting as hosts in the same vCenter

LogInsight has been set up with vSphere integration – all hosts (including the nested host) have been configured to send syslogs to the LogInsight address. A look at the “Hosts” page in the LogInsight Administration section shows that logs are being successfully received:

Log Insight Hosts view

Integration

It isn’t actually possible for LogInsight to start Orchestrator workflows directly. The closest we can come is the “webhook” functionality when setting up an alert – this will make an HTTP request to a URL with some information when something happens (below is an example of our LogInsight query for a host going into maintenance mode).

LogInsight maintenance mode event

It is because the vRealize Orchestrator API requires some additional parameters that we can’t start Workflows, and when we look at the schema on the swagger API reference page (https://<address>:8281/vco/api/docs/index.html – where <address> is the FQDN/IP of your Orchestrator server), we can see what is required:

WebHook Shims

The shim referenced in the “Self Healing Datacenter” series can be found as the “log insight webhook-shims” on github (this also works with vROPs). To test this I deployed a Photon OS instance and installed it via docker:

  • systemctl enable docker
  • systemctl start docker
  • docker pull vmware/webhook-shims
  • docker run -it -p 5001:5001 --name lishims vmware/webhook-shims bash

The public page for this docker container shows how to set it up – https://hub.docker.com/r/vmware/webhook-shims/ – and after this we need to configure the vRealize Orchestrator connection in the container. At the command line, enter the following:

  • vi loginsightwebhookdemo/vrealizeorchestrator.py:

When this file is saved we can start the server within the container:

  • ./runserver.py

As per the INFO message above, if we browse to http://<address>:5001 (where <address> is the FQDN/IP of the server running the shim) we get a guidance screen on how to use the software – scroll down to view the advice on how to integrate with vRealize Orchestrator.

Now we can return to our LogInsight query and set up an alert based upon it. Select the Alert button on the UI and the following window will appear:

Select the Webhook notify option and put in the formatted address of the webhook server:

  • http://<address>:<port>/endpoint/vro/<workflow-id>

Aha, we don’t have a workflow set up so we don’t know what workflow-id to type in here – let’s do this now. Open up the vRealize Orchestrator client and create a new workflow – we’re going to keep this one simple and just write something in the log, the schema for which is below:

When we look at the details of the workflow we see the ID to use in the alert:

So now we have a workflow ID we can create our LogInsight alert. Go back to the LogInsight alert configuration page, update the webhook URL and save.

Workflow problems

Time to put a host (that is configured to send syslog to our LogInsight server) into Maintenance Mode – which I did at 07:53 in the screenshot below:

We see the following workflow executions:

Woah – we only put the host into maintenance mode once and the workflow was started 10 times – what could have happened ? The sequence of what happens is:

  • LogInsight matches the query and notifies shim server via webhook.
  • Shim server receives request and formats a vRealize Orchestrator API call. Sends to vRealize Orchestrator
  • vRealize Orchestrator starts the workflow.

If you were watching the console of the Photon OS container then you would have seen that for every workflow invocation a new request was sent by the webhook shim code. That rules out vRealize Orchestrator as the cause, but is the shim code receiving a new notification for every workflow ?

I did toy with the idea of setting up port mirroring via a distributed switch and using Wireshark to check only one request was sent, but this seems unnecessarily complex. There is another way 🙂

Node.js

We can spin up a very simple node.js server that logs out every request that is made to it. Our server code looks as follows:
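A minimal sketch of such a request logger, using only the built-in http module, might look like the block below. The function names and port 5001 are assumptions chosen for illustration (the port simply matches the one used for the shim earlier).

```javascript
// Minimal request logger: prints every incoming request, then replies 200.
const http = require('http');

// Format one request as a readable log entry (pure function, easy to check).
function formatRequest(method, url, headers, body) {
  return [
    `${new Date().toISOString()} ${method} ${url}`,
    `headers: ${JSON.stringify(headers)}`,
    `body: ${body}`,
  ].join('\n');
}

// Build the server without starting it, so the port stays the caller's choice.
function createLoggingServer(log = console.log) {
  return http.createServer((req, res) => {
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      const body = Buffer.concat(chunks).toString('utf8');
      log(formatRequest(req.method, req.url, req.headers, body));
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('ok\n');
    });
  });
}

// Port 5001 is an assumption. Uncomment to start listening:
// createLoggingServer().listen(5001, () => console.log('listening on 5001'));
```

Pointing the LogInsight alert webhook at this server and counting the log entries is enough to tell whether one notification or several are being sent.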

When we put a host into maintenance mode now we can see that we only receive one email and that our notification monitor only receives one request from Log Insight. So the problem is somewhere in the shim server. But in the spirit of creating stuff and learning something new, let’s write our own shim server !

The main body of our server function is:

This is really straightforward and will receive the full request and call handle_request (which I’m not going to detail here), which will then invoke the call_vro function with appropriate parameters.
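For readers who want a starting point, here is one hypothetical shape for handle_request. It is a sketch, not the code from the repository: the URL pattern, the return value and the payload field names (AlertName, HitCount, messages) are assumptions that should be checked against a request logged from your own LogInsight server.

```javascript
// Hypothetical handle_request: turn one Log Insight webhook call into the
// arguments for call_vro. Field names are assumptions about the payload.
function handle_request(url, body, call_vro) {
  // Assumed URL shape: /endpoint/vro/<workflow-id>
  const match = /^\/endpoint\/vro\/([^/?]+)/.exec(url);
  if (!match) {
    return { status: 404, reason: 'unknown endpoint' };
  }

  let alert;
  try {
    alert = JSON.parse(body);
  } catch (err) {
    return { status: 400, reason: 'body is not valid JSON' };
  }

  // Base64-encode the messages so they travel as a single string parameter.
  const messages = Buffer.from(JSON.stringify(alert.messages || []), 'utf8')
    .toString('base64');

  call_vro(match[1], {
    messages,
    alertName: String(alert.AlertName || ''),
    hitCount: Number(alert.HitCount || 0),
  });
  return { status: 200, reason: 'forwarded' };
}
```

Passing call_vro in as an argument keeps the parsing logic separate from the network call, which makes it easy to try out with a stub.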

In our call_vro function we will need to set up the parameters for the Orchestrator call:

The code above sets up the basic authentication string and defines the parameter array for the request. We are going to provide three parameters for the workflow:

  • messages – the messages passed by LogInsight. These will be Base64 encoded so Orchestrator will need to decode them.
  • alertName – the name of the alert that was configured in LogInsight.
  • hitCount – the number of times this alert was triggered.

As our alert should be triggered on every invocation we won’t worry about hitCount for now – but we should make a note to test this later.
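The set-up described above can be sketched as follows. The helper names are hypothetical, and the name/type/value nesting of each parameter is an assumption based on the execution schema shown on the Orchestrator API reference page, so compare it with the swagger page on your own server before relying on it.

```javascript
// Build the Basic auth header value from a username and password.
function basicAuth(username, password) {
  const token = Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
  return `Basic ${token}`;
}

// Build the execution body: one entry per workflow input parameter.
// The nesting below is assumed from the vRO REST API schema.
function buildExecutionBody({ messages, alertName, hitCount }) {
  return {
    parameters: [
      { name: 'messages', type: 'string', value: { string: { value: messages } } },
      { name: 'alertName', type: 'string', value: { string: { value: alertName } } },
      { name: 'hitCount', type: 'number', value: { number: { value: hitCount } } },
    ],
  };
}
```

The parameter names must match the input parameter names defined on the workflow, otherwise Orchestrator has nothing to bind the values to.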

After the parameters have been set up, they are converted to a JSON string for the request. We can see below in the request header set up that we are declaring our content is JSON.

This code sets up the connection, defines the functions for what happens when the request errors or is complete and then makes the connection (https.request at the bottom). If we receive a status code of 202 then all is good !
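A sketch of that request, under a few stated assumptions: the path /vco/api/workflows/<id>/executions is taken from the Orchestrator API reference, port 8281 is the one used for the API docs earlier, and rejectUnauthorized is switched off only because a lab appliance typically has a self-signed certificate. The function signatures are illustrative rather than those of the repository code.

```javascript
const https = require('https');

// Build the options for POST /vco/api/workflows/<id>/executions.
function buildRequestOptions(host, workflowId, authHeader, payload) {
  return {
    hostname: host,
    port: 8281,
    path: `/vco/api/workflows/${encodeURIComponent(workflowId)}/executions`,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(payload),
      Authorization: authHeader,
    },
    // Assumption, lab only: accept the appliance's self-signed certificate.
    rejectUnauthorized: false,
  };
}

// Send the request; done(err, statusCode) is called when it finishes or fails.
function call_vro(host, workflowId, authHeader, body, done) {
  const payload = JSON.stringify(body);
  const options = buildRequestOptions(host, workflowId, authHeader, payload);
  const req = https.request(options, (res) => {
    res.resume(); // discard the response body
    res.on('end', () => {
      // 202 Accepted means Orchestrator has queued the workflow run.
      const ok = res.statusCode === 202;
      done(ok ? null : new Error(`unexpected status ${res.statusCode}`), res.statusCode);
    });
  });
  req.on('error', (err) => done(err));
  req.write(payload);
  req.end();
}
```

Content-Length is computed with Buffer.byteLength rather than the string length so that non-ASCII characters in the payload are counted correctly.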

Mental note: current status codes for starting a workflow:

Trying it out

When we put a host into maintenance mode now we can see that we only get one Orchestrator workflow generated for each alert. But if we look at Log Insight, the alert is examined every 5 minutes:

So what happens if we put two hosts into maintenance mode before the alert is examined ?

This is the console of the server:

Looking at the request that has been passed we can see that there are messages for both hosts, but we only have one workflow started:

We need to look at how we pull out the information contained in the ‘messages’ input variable. First, we need to download and install a plugin that will allow us to decode Base64 strings. This can be obtained from https://github.com/vmware/o11n-plugin-crypto and installation is straightforward from the Control Center. After installation, the vCO service (not the server) needs to be restarted – the message stating the service needs to be restarted doesn’t disappear immediately, but it does go 🙂

Then we need to modify our workflow to accept these parameters:

I’ve also added an attribute to the workflow with the name “hosts” and type “Array/string” – but you can omit this and comment out the lines that reference this attribute. We can now add a scriptable task with the following:

This is what we see now in the Logs section for the workflow when we put two hosts into maintenance mode for the same alert interval – we can now handle more than one host, and identify which host has gone into maintenance mode.
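The decoding logic can be tried outside Orchestrator first. The sketch below runs under plain Node.js and uses Buffer for the Base64 step, where the scriptable task would use the crypto plugin instead. The message shape (a fields array of name/content pairs with a hostname entry) and the host names are assumptions for illustration, so compare them with the request printed on the shim console.

```javascript
// Decode the Base64 'messages' parameter and list the host behind each message.
// Buffer stands in for the Orchestrator crypto plugin in this sketch.
function hostsFromMessages(messagesBase64) {
  const decoded = Buffer.from(messagesBase64, 'base64').toString('utf8');
  const messages = JSON.parse(decoded);
  const hosts = [];
  for (const message of messages) {
    // Assumed shape: each message has a 'fields' array of { name, content }.
    const field = (message.fields || []).find((f) => f.name === 'hostname');
    if (field && !hosts.includes(field.content)) {
      hosts.push(field.content);
    }
  }
  return hosts;
}

// Example input: two hypothetical hosts entering maintenance mode in one interval.
const sample = Buffer.from(JSON.stringify([
  { text: 'entered maintenance mode', fields: [{ name: 'hostname', content: 'auto-esxi-1' }] },
  { text: 'entered maintenance mode', fields: [{ name: 'hostname', content: 'auto-esxi-2' }] },
]), 'utf8').toString('base64');

console.log(hostsFromMessages(sample)); // [ 'auto-esxi-1', 'auto-esxi-2' ]
```

Collecting the hosts into an array mirrors the “hosts” attribute on the workflow, so later workflow elements can loop over every host reported in a single alert.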

Summary

So we can now decode the messages we receive from LogInsight and work with multiple messages – I’ve left the node.js console printing out the contents of the request (line 34) in order to help identify useful fields of information. This needs some tidying up to be useful (and a solution for storing usernames/passwords outside the node.js file should be found) but hopefully is a good proof of concept of what is possible.

As usual, the code is available on github – https://github.com/nelons/loginsight_webhook_shim. Feel free to use/abuse in any way you like, let me know if I’ve made a mistake or if you found this useful.