Managing Snapshots with vRO

By | February 25, 2019

One of the main issues that departments face on their virtualisation journey is how they manage snapshots:

  • Who can take them
  • Who can delete them
  • How long do they stick around for
  • Name/Description formatting policy

I consider them ‘low hanging fruit’ on the management tree – how an organisation approaches this is defined by policy and there shouldn’t be anything too contentious involved. There are also options to have the snapshots deleted automatically, with most management software providing the function. If you don’t have this, then a vSphere alarm can be created to warn on snapshot sizes.

At a previous role, there was no option to remove snapshots automatically in accordance with the agreed policy, so it was time to automate something :). One of the requirements revolved around communication with the snapshot owner – it is very important to inform them that the snapshot will be automatically removed, and again once it has been. Another was that some snapshots that would otherwise be cleared up automatically might need to be held for longer – perhaps as part of a Root Cause Analysis investigation.

Requirements

From the above, the requirements are:

  • Inform the user who took the snapshot that it will be deleted in X days, daily until deleted. (X should be configurable).
  • Inform the user that the snapshot has been deleted.
  • Report to a team/email address the details of all snapshots present and an estimate of the space used.
  • Allow specific Virtual Machines to be excluded from having their snapshots automatically deleted.

These will mean that:

  • The workflow will always produce a report. Therefore our input values should include an email address.
  • We need a means of identifying a Virtual Machine as being excluded. We could supply a list, a naming format to match, a tag or a custom attribute.
  • Although we can retrieve the username of the person who created the snapshot (it is contained within the event), we still need a means of identifying the email address of the user.

My preference for the second item above is to use a tag, as it makes the VMs easier to retrieve from a management perspective.

The identity source for the vCenter is assumed to be Active Directory, with the further assumption that local accounts are not allowed to take snapshots (or only in exceptional circumstances). We can therefore use the Active Directory plugin to query for the user and retrieve the email address. However, we should also handle the edge case where the user does not have an email address set, or it is not valid – in that case we will send warning emails to an appropriate team, saying that the end user was not/cannot be warned of the snapshot expiry.
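
The fallback above can be sketched as a small helper. This is a minimal sketch, not the code from the package: `looksLikeEmail`, `resolveRecipient` and the loose regular expression are names and choices I have made up for illustration, and the `mail` value is assumed to have already been read from Active Directory.

```javascript
// Hypothetical helper: loose sanity check on an address read from AD.
// It only checks for something@something.tld with no spaces.
function looksLikeEmail(value) {
    return typeof value === "string" && /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value);
}

// Hypothetical helper: decide who receives a snapshot email.
// mail          - value of the user's AD "mail" attribute (may be null or empty)
// errorsAddress - team mailbox used when the user cannot be reached
function resolveRecipient(mail, errorsAddress) {
    if (looksLikeEmail(mail)) {
        return { to: mail, userReachable: true };
    }
    // No usable address: send to the errors mailbox and flag it, so the
    // email text can say that the end user was not warned.
    return { to: errorsAddress, userReachable: false };
}
```

The `userReachable` flag is there so the calling script can pick a different email text when the message goes to the team rather than the snapshot owner.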

In summary, our input values are:

  • Number of days after which to start warning end users of deletion.
  • Number of days after which the snapshot should be deleted.
  • Email address in case of errors.
  • Email address for the report.
  • Tag name for VMs to be excluded from snapshot management.
  • A ‘WhatIf’ to control whether action is taken.
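
To make the list concrete, here is a rough sketch of how those inputs could be checked before anything runs. The property names (`warnDays`, `deleteDays` and so on) are hypothetical and are not the names used in the package, and the rule that the deletion age must be greater than the warning age is my assumption.

```javascript
// Hypothetical input check; returns a list of problems (empty when all is well).
function validateInputs(inputs) {
    var errors = [];
    if (typeof inputs.warnDays !== "number" || inputs.warnDays < 0) {
        errors.push("warnDays must be a number of zero or more");
    }
    // Assumption: warnings only make sense if they start before deletion.
    if (typeof inputs.deleteDays !== "number" || inputs.deleteDays <= inputs.warnDays) {
        errors.push("deleteDays must be a number greater than warnDays");
    }
    if (!inputs.errorsAddress) { errors.push("errorsAddress is required"); }
    if (!inputs.reportAddress) { errors.push("reportAddress is required"); }
    if (!inputs.excludeTag) { errors.push("excludeTag is required"); }
    if (typeof inputs.whatIf !== "boolean") { errors.push("whatIf must be true or false"); }
    return errors;
}

// Example values only.
var exampleInputs = {
    warnDays: 3,
    deleteDays: 7,
    errorsAddress: "ops@example.com",
    reportAddress: "reports@example.com",
    excludeTag: "SnapshotExcluded",
    whatIf: true
};
```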

Our vRO server will also need to have the Active Directory plugin active and configured.

We also need to deal with another assumption – there can be more than one snapshot on a VM, and multiple snapshots may need to be deleted on each day.
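
Snapshots on a VM form a tree, so handling more than one of them usually starts with flattening that tree into a list. The sketch below assumes each node has the shape of a vSphere `VirtualMachineSnapshotTree` (a `name`, a `createTime` and an optional `childSnapshotList`); `flattenSnapshots` is my own helper name and is not taken from the package.

```javascript
// Walk a snapshot tree depth-first and return every node in one flat array.
// nodes  - array of tree nodes, e.g. the VM's root snapshot list (may be null)
// result - accumulator used by the recursion; callers can leave it out
function flattenSnapshots(nodes, result) {
    result = result || [];
    for (var i = 0; nodes && i < nodes.length; i++) {
        result.push(nodes[i]);
        // Children are visited straight after their parent.
        flattenSnapshots(nodes[i].childSnapshotList, result);
    }
    return result;
}
```

Each entry in the returned list can then be checked against the warning and deletion ages on its own.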

[Hindsight]
After creating the workflows I added another couple of inputs:

  • Minimum age of VM snapshot to appear on the report.
  • Subject of the report.

Workflows

I write vRO workflows in a decentralised manner – for example if I am to do something with a Virtual Machine I have a workflow for:

  • The Virtual Machine
  • The Cluster
  • (Sometimes) The Datacenter
  • vCenter Server

This can add up to a lot of workflows, but it allows for testing and iterating changes easily. I will often put a “WhatIf” input on the Virtual Machine workflow so that I can test the workflows that function at a higher level – this ties in with the inputs identified earlier. So in this case, the workflow design mirrors the above (but without a datacenter workflow).

Additionally, when communicating to people the details of a VM with a snapshot I include Custom Attributes and Tags. So the workflows below will contain some tagging actions that I really need to release 😉

Virtual Machine

This workflow will need to do the following:

  • Get the snapshot information for a VM.
  • Calculate snapshot size (if possible).
  • Get the tags for a VM.
  • Add information to the report.
  • Check the tag to see if the VM will have its snapshots automatically managed.

If the VM is to be automatically managed, it will need to do:

  • If the snapshot is older than the warning age, send a warning to the snapshot creator, or the errors address.
  • If the snapshot is older than the deletion age, delete it and send a notification to the snapshot creator, or the errors address.

I’ve separated the above into two lists as I think the logic flows naturally in this way – the first list will apply to every VM that we examine, whereas the second will only apply to those VMs with snapshots that we will manage. Thus we can break the logic up into two workflows.
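
The age check in the second list boils down to a small decision. A minimal sketch, assuming ages are counted in whole days and that "older than" includes the day the threshold is reached; the function names are mine and not those used in the workflows.

```javascript
var MS_PER_DAY = 24 * 60 * 60 * 1000;

// Age of a snapshot in whole days, given its creation time and the current time.
function snapshotAgeDays(createTime, now) {
    return Math.floor((now.getTime() - createTime.getTime()) / MS_PER_DAY);
}

// Returns "delete", "warn" or "none" for one snapshot.
// Deletion is checked first so a snapshot past both ages is deleted, not warned about.
function decideAction(ageDays, warnDays, deleteDays) {
    if (ageDays >= deleteDays) { return "delete"; }
    if (ageDays >= warnDays) { return "warn"; }
    return "none";
}
```

For example, with a warning age of 3 days and a deletion age of 7, a snapshot that is 5 days old gets a warning and one that is 7 days old is deleted.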

Disclaimer: I’m assuming some familiarity with Orchestrator here, and there are plenty of other awesome blogs introducing the tool, so I’m going to dive in to some completed workflows.

The schema for the first flow is below:

Some notable steps:

  • Excluded from snaps – This is where the tag is checked.
  • Automatic Management – This is where the WhatIf is handled.
  • Manage Snapshots – We call the workflow to manage them. This sits inside a loop that will examine all the snapshots on a VM.

Some scripts are included to help with generating the report 🙂

The schema for Managing Snapshots is here:

The notable steps for this workflow are:

  • getEmailAddressForUser – this will query Active Directory for the mail attribute for the user identified earlier.
  • Remove Snapshot – deletes the snapshot and sets a task object so the workflows/entire process operates synchronously.
  • Customise Email – these write the warning and notification emails to be sent to the end user.
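
As an illustration of the kind of thing a Customise Email step does, here is a sketch that builds the warning text for one snapshot. The wording and the function name are invented for this example and are not the text used in the package.

```javascript
// Build the subject and body of a warning email for one snapshot.
// ageDays and deleteDays are whole numbers of days.
function buildWarningEmail(vmName, snapshotName, ageDays, deleteDays) {
    // Never report a negative number of days remaining.
    var daysLeft = Math.max(deleteDays - ageDays, 0);
    var when = (daysLeft === 1) ? "1 day" : daysLeft + " days";
    return {
        subject: "Snapshot on " + vmName + " will be deleted in " + when,
        body: "The snapshot '" + snapshotName + "' on " + vmName +
              " is " + ageDays + " days old and will be removed automatically in " +
              when + "."
    };
}
```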

Cluster Workflow

This workflow retrieves all the VMs in the cluster and runs the main VM workflow against each of them. Nothing special here 🙂

Clusters / vCenter Servers Workflow

This is a fairly standard workflow that goes through all the clusters on a vCenter Server and runs the Cluster workflow. It follows a standard pattern of retrieving a set of items and iterating through them. The end of the schema is devoted to sending the report email.
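
Putting the report together at the end is mostly a matter of filtering and adding up. A minimal sketch, assuming each VM workflow has contributed rows shaped like `{ vm, snapshot, ageDays, sizeGB }` (my own shape, not the one used in the package), with `sizeGB` left as `null` when the size could not be calculated:

```javascript
// Build a plain-text report from the collected rows.
// minAgeDays - snapshots younger than this are left off the report
function buildReport(rows, minAgeDays) {
    var lines = [];
    var totalGB = 0;
    for (var i = 0; i < rows.length; i++) {
        var r = rows[i];
        if (r.ageDays < minAgeDays) { continue; }
        var size = (typeof r.sizeGB === "number") ? r.sizeGB : null;
        // Unknown sizes are listed but cannot count towards the estimate.
        if (size !== null) { totalGB += size; }
        lines.push(r.vm + " | " + r.snapshot + " | " + r.ageDays + " days | " +
                   (size === null ? "unknown" : size.toFixed(1) + " GB"));
    }
    lines.push("Estimated total: " + totalGB.toFixed(1) + " GB");
    return lines.join("\n");
}
```

The total is only an estimate, since any snapshot whose size is unknown is not included in it.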

Configuration Element

As I wrote this a while ago, I put all the configuration inside a Configuration Element so that configuration variables could be updated once a workflow was scheduled. Please excuse my naming format: it does the job, and the names made sense at the time, I promise!

These names match the inputs that the workflows take:

Emails

Here is an example of the report email running on a live environment:


An example of the email notification a user receives about their snapshot (some field values have been removed):

Download

The workflows, actions and configuration element are available to download here. The package includes the following workflows

Once the package is imported, run one of the workflows and let me know what you think! You’ll be able to run the cluster and VM workflows without having to configure a Configuration Element.

Improvements

The most obvious improvement to this workflow would be to make the notifications optional and controlled via an input. Maybe something for a rainy day 😉

Let me know if this was useful or interesting to you!