Enabling SSH on ESXi with vRO

By | August 19, 2019

I think it is a good rule of thumb that the security maturity of an organisation can be measured by whether SSH is left enabled (with warnings muted) or disabled on its ESXi hosts. As there is a warning on the host if the service is enabled, and the hardening guides recommend that it is disabled, there is little doubt that the disabled state is VMware’s preference and a security best practice.

So how do we enable SSH on ESXi? I am assuming that the only accessible interface is the Web Client, as access to the console is likely restricted too – perhaps the hosts are in lockdown mode? We have a few options.

Web Client

This is a little bit convoluted. We need to follow these steps:

  • Select the host on which to enable/disable SSH.
  • Go to the Configure tab.
  • Select the Security Profile menu option.
  • *Optional: Minimise one or both Firewall sections (Incoming Connections and Outgoing Connections).
  • *Optional: Scroll down to the Services section.
  • Press the Edit button in the Services section.
  • Select the SSH entry in the list.
  • Press Start or Stop.

The items marked with * apply to vCenter 6.5 or earlier. In vCenter 6.7 the Services panel has moved to the top of the Security Profile section.

The menu to manipulate the service looks as below:

In summary, this is quite a bit of work to enable the service on only one host. Next we’ll look at how to do the same on the HTML5 Client.

HTML5 Client

The steps required to enable/disable SSH on a host using the HTML5 client are as follows:

  • Select the host on which to enable/disable SSH.
  • Go to the Configure tab.
  • Select the Services menu option.
  • Select the SSH entry in the list.
  • Press the Start button.

There are fewer steps in the list compared to the Flex client, however this is still onerous when having to work with multiple hosts. The section for changing service state in this client looks as follows:

vRealize Orchestrator

Ah, so we’ve got to the real reason I’m writing this post! In the past I’ve needed to use SSH to:

  • troubleshoot on a single host.
  • perform maintenance or a change on every host in the cluster.
  • disable SSH on a host because someone else left it enabled and the timeout value hasn’t been set properly.

This identifies that we need workflows to enable and disable the service for a host, and for every host in a cluster.

However, thinking about this, if a host typically has SSH disabled then the shell/SSH enabled warning will not be suppressed, so you will know whether SSH is enabled or not. So we will also have a workflow that toggles the service between enabled and disabled on a host, purely as a way of presenting fewer options to the end user so the vRealize Orchestrator submenu is kept as trim as possible 🙂

Our approach to the cluster can re-use some of this. In the best case the service is in the opposite of the desired state on every host in the cluster, i.e. if we want to enable SSH, it is currently disabled on every host. However, we must also account for the possibility that the service is enabled on some hosts and disabled on others. For example, if SSH has been left enabled on a few hosts in a big cluster and we want to disable it, a single cluster workflow is easier than running a workflow per host. A toggle workflow will not work here: on a mixed cluster it would simply reverse the state of each host. Instead we want to explicitly set the state of the service for all hosts in the cluster, whether enabled or disabled. From the user’s point of view, they know what end state they want, so let’s get them there!

This leaves us with one last architecture decision between:

  • A workflow for enable and a workflow for disable.
  • A single workflow with a boolean parameter that defines enable/disable.

Honestly, I don’t think there is much between the two; both are good and reasonable choices. I decided to go with the first option for these reasons:

  • The workflow can be invoked remotely with fewer parameters.
  • Flex WebClient integration will allow the workflow to be run immediately with fewer clicks.
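To illustrate the first point, here is a rough sketch of the request bodies a remote caller might send to the vRO REST API when starting a workflow run. The parameter names "host" and "enable" are hypothetical, and the host id is whatever identifier your vCenter plug-in inventory reports, so treat this as an illustration of the difference rather than a tested client.

```javascript
// Sketch only: builds the JSON body for starting a workflow run through the
// vRO REST API (POST .../vco/api/workflows/{workflowId}/executions).
// The parameter names "host" and "enable" are hypothetical.
function hostParameter(hostId) {
    return {
        name: "host",
        type: "VC:HostSystem",
        scope: "local",
        value: { "sdk-object": { type: "VC:HostSystem", id: hostId } }
    };
}

// Dedicated "enable" workflow: the host is the only input.
function enableRequestBody(hostId) {
    return { parameters: [hostParameter(hostId)] };
}

// Single workflow with a boolean: the caller must also supply the state.
function setStateRequestBody(hostId, enable) {
    return {
        parameters: [
            hostParameter(hostId),
            {
                name: "enable",
                type: "boolean",
                scope: "local",
                value: { "boolean": { value: enable } }
            }
        ]
    };
}
```

With separate workflows the caller only has to know which workflow to start and which host to pass; the desired state is implied by the workflow itself.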

Building the workflow

The following code block will enumerate the services available on the host, and if the SSH service is not running, start it.
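A minimal sketch of that logic, assuming the workflow input is a VC:HostSystem. Wrapping it in a function and the `logMessage` helper are my own additions so the snippet can be tried outside vRO; "TSM-SSH" is the key ESXi uses for the SSH service.

```javascript
// Sketch for a scriptable task. In vRO, "host" would be a VC:HostSystem
// workflow input; the function wrapper is an assumption made for illustration.
var SSH_SERVICE_KEY = "TSM-SSH"; // key of the SSH service on ESXi

function logMessage(message) {
    // System.log exists inside vRO; fall back to console.log elsewhere.
    if (typeof System !== "undefined") {
        System.log(message);
    } else {
        console.log(message);
    }
}

// Starts SSH if it is not already running. Returns true if a change was made.
function enableSsh(host) {
    var serviceSystem = host.configManager.serviceSystem;
    var services = serviceSystem.serviceInfo.service;
    for (var i = 0; i < services.length; i++) {
        var service = services[i];
        if (service.key == SSH_SERVICE_KEY) {
            if (!service.running) {
                logMessage("Starting SSH on " + host.name);
                serviceSystem.startService(service.key);
                return true;
            }
            logMessage("SSH is already running on " + host.name);
            return false;
        }
    }
    throw "SSH service not found on " + host.name;
}
```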

Conversely, to stop the service, most of the above is identical, except for the innermost conditional block:
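As a sketch under the same assumptions (a VC:HostSystem input, wrapped in a function of my own naming), the stop variant only swaps the running check and the call to the service system:

```javascript
// Sketch only: the function wrapper and its name are illustrative.
var SSH_SERVICE_KEY = "TSM-SSH"; // key of the SSH service on ESXi

// Stops SSH if it is running. Returns true if a change was made.
function disableSsh(host) {
    var serviceSystem = host.configManager.serviceSystem;
    var services = serviceSystem.serviceInfo.service;
    for (var i = 0; i < services.length; i++) {
        var service = services[i];
        if (service.key == SSH_SERVICE_KEY) {
            // The only difference from the enable case is this block.
            if (service.running) {
                serviceSystem.stopService(service.key);
                return true;
            }
            return false;
        }
    }
    throw "SSH service not found on " + host.name;
}
```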

Our toggle code can be written as:
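A hedged sketch of the toggle, again assuming a VC:HostSystem input and an illustrative function wrapper. It returns the new state so a later workflow element could log or display it.

```javascript
// Sketch only: the function wrapper and its name are illustrative.
var SSH_SERVICE_KEY = "TSM-SSH"; // key of the SSH service on ESXi

// Flips the SSH service state. Returns true if SSH is running afterwards.
function toggleSsh(host) {
    var serviceSystem = host.configManager.serviceSystem;
    var services = serviceSystem.serviceInfo.service;
    for (var i = 0; i < services.length; i++) {
        var service = services[i];
        if (service.key == SSH_SERVICE_KEY) {
            if (service.running) {
                serviceSystem.stopService(service.key);
                return false;
            }
            serviceSystem.startService(service.key);
            return true;
        }
    }
    throw "SSH service not found on " + host.name;
}
```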

The workflows for the cluster follow a typical pattern:

  • Get all the hosts in the cluster
  • Call the appropriate workflow for each host

The workflow schema looks as follows:

As can be seen, we use a built-in action to get all the hosts in a cluster, and the built-in workflow loop object to run a workflow for every host in the array.
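For comparison, the same pattern can be sketched in a single script. This is not how the workflows in the package are built: it swaps the built-in action for the `host` property of the cluster object and the loop element for a plain `for` loop, and the function names are my own. It sets an explicit end state, so hosts that are already in that state are left alone.

```javascript
// Sketch only: a scripted alternative to the action + workflow loop.
var SSH_SERVICE_KEY = "TSM-SSH"; // key of the SSH service on ESXi

// Puts SSH into the desired state on one host. Returns true if it changed.
function setSshState(host, desiredRunning) {
    var serviceSystem = host.configManager.serviceSystem;
    var services = serviceSystem.serviceInfo.service;
    for (var i = 0; i < services.length; i++) {
        if (services[i].key == SSH_SERVICE_KEY) {
            if (services[i].running == desiredRunning) {
                return false; // already in the desired state
            }
            if (desiredRunning) {
                serviceSystem.startService(SSH_SERVICE_KEY);
            } else {
                serviceSystem.stopService(SSH_SERVICE_KEY);
            }
            return true;
        }
    }
    throw "SSH service not found on " + host.name;
}

// Applies the desired state to every host; returns the names that changed.
function setSshStateOnCluster(cluster, desiredRunning) {
    var hosts = cluster.host; // hosts that belong to the cluster
    var changed = [];
    for (var i = 0; i < hosts.length; i++) {
        if (setSshState(hosts[i], desiredRunning)) {
            changed.push(hosts[i].name);
        }
    }
    return changed;
}
```

On a mixed cluster, calling `setSshStateOnCluster(cluster, false)` only touches the hosts where SSH is still running, which is exactly what a toggle cannot guarantee.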

In the workflows I also check the connection state of the host, as there is no point trying to change the service state on a host that is disconnected!
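A small sketch of such a guard, assuming the state is read from `host.runtime.connectionState`. Accepting either an enumeration object with a `value` property or a plain string is my own assumption, made so the helper can be exercised outside vRO.

```javascript
// Sketch only: the helper name is illustrative.
// Returns true only when the host reports the "connected" state.
function isHostConnected(host) {
    var state = host.runtime.connectionState;
    // Accept an enumeration object exposing "value", or a plain string.
    var name = (state && state.value !== undefined) ? state.value : state;
    return String(name) == "connected";
}
```

A workflow could call this first and skip, or fail with a clear message, when it returns false.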

Download

If your browser renames the file, you are free to change the file extension to “.package” (the import may still work if you do not). The file is an exported vRO package (from version 7.5) and is ready for import onto your server.