Tag - vSphere
With Velero now installed on our cluster, we can create a backup job and protect our data. We can also test restores of our data into a new namespace. I mentioned in the previous post that we’re going to use File-System Backups (FSB) for the applications in our TKG cluster, and you’ll see how that’s handled here. Velero works using the idea of “jobs” - either a backup job or restore job.
Disclaimer: Most of this post is based on the internal wiki notes I took when I deployed Velero in my lab. It may be out of date by the time you read it. What is Velero? Velero is VMware’s open-source backup and restore tool for workloads running in Kubernetes. It’s available to anyone for use on Kubernetes, and supports several combinations of integrations with underlying cloud platforms (AWS, Azure, vSphere, etc.).
This one had me stumped for a few minutes. A while ago I was working with a customer that was trying to boot into the EFI shell of a VM. Restarting into the EFI boot menu, they found the EFI shell option was missing. Turns out, you need to power off the machine and disable Secure Boot in the VM’s Boot Options. After booting the machine again, you’ll find the EFI Shell option in the boot menu.
Recently I had a customer testing their vCenter Server 7.0U3e file-based restore process using the VCSA Restore Wizard. During Stage 2 (data copy) the wizard hung at 80% and did not progress for hours. No UI errors or hints as to what was happening. I had a look at the restore wizard logs and found the following:

2022-07-12T01:16:19.828Z - debug: pollRpmInstallProgress:getGuestFileErr:ServerFaultCode: The object 'vim.VirtualMachine:103' has already been deleted or has not been completely created
2022-07-12T01:16:30.
Over the years I’ve had many people ask me how I manage networking on my single host lab environment. “How do you do VLANs and manage routing/firewalling without a physical device providing that upstream?” they would ask. Well, I use a pfSense VM. Which would usually be followed up by “Yeah, but how?” VLAN trunking on my dvSwitch and sub-interfaces within pfSense. Easy stuff! Let’s review requirements first: Be able to define multiple VLANs and subnets.
I have a customer about to embark on a (relatively) large Cisco ACI upgrade, followed by a dvSwitch LACP update to Enhanced LACP. This includes updating ACI EPG policies to use the newly created LAG as the primary uplink port. Normally in the vSphere UI, you can view this per port group. But in an environment with 12 vCenter Servers and 22 distributed switches (and a couple thousand port groups), that manual process can get out of hand pretty quickly.
A few weeks ago, a customer of mine was attempting an embedded vCenter upgrade from 6.7U3 to 7.0. Stage 1’s deployment of a new vCenter appliance was successful, but Stage 2 (on the new appliance) was failing while attempting to perform a pre-check. We checked the requirements-upgrade-runner.log file and found an error, but it was quite vague: lookup.fault.EntryNotFoundFault. We worked together and checked the following: SSO admin password contained only supported characters.
TL;DR - the vSAN HCL Database update from vCenter requires port 80 outbound to http://partnerweb.vmware.com/service/vsan/all.json. A little while ago, a customer of mine, in a restricted environment, was having issues getting the vSAN HCL Database updated in vCenter. Following https://kb.vmware.com/s/article/2109870 didn’t address the issue, but allowed them to update the DB manually. Outbound HTTPS traffic to partnerweb.vmware.com was whitelisted on the proxy, and 443 allowed on the firewall. Testing DNS and ports using curl showed that everything should be working, but we still couldn’t update the HCL DB.
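As a rough illustration of why allowing 443 was not enough here, the sketch below works out which host and TCP port a given URL actually requires. The helper names `endpoint_for` and `can_connect` are my own for this example and are not part of any VMware tooling. Note also that a direct TCP test like `can_connect` does not account for a proxy sitting in the path, so treat it as a first check only.

```python
import socket
from urllib.parse import urlsplit

# Default ports implied by the URL scheme when none is given explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def endpoint_for(url):
    """Return the (host, port) a client must reach to fetch `url`."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


def can_connect(host, port, timeout=5.0):
    """Attempt a direct TCP connection; True if the handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


if __name__ == "__main__":
    url = "http://partnerweb.vmware.com/service/vsan/all.json"
    host, port = endpoint_for(url)
    print(f"{url} needs outbound TCP {port} to {host}")
```

Running `can_connect(*endpoint_for(url))` from a machine on the same network segment as vCenter gives a quick yes/no on whether that port is open outbound. The URL in the TL;DR is plain `http://`, so it resolves to port 80 rather than 443.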
I recently had the need to ‘prep’ a VM after converting it to vSphere. By ‘prep’ I mean (after you’ve installed VMware Tools) doing the usual grind of updating the virtual hardware to the latest version supported by ESXi, updating the vNIC to VMXNET3, and changing the SCSI controllers to ParaVirtual. I thought about the times when I was in customer land and we would have to convert VMs from some other platform or, in some cases, correct a VM that had been built incorrectly.
Note: A bit more testing on my end has found this script is only valuable if your VMDKs are on separate datastores. I am working to find a better metric to pull the data per VMDK. Background Have you ever heard of “Uncommitted Space” in vSphere? It’s one of those things we all seem to ‘know’ without really knowing. It’s a pretty standard metric most commonly found against vSphere Datastores. It’s effectively calculated based on the provisioned and used storage of a datastore and its contents.
- Restore of a Wordpress deployment is stuck redirecting the port
- Backups and Restores using Velero in TKGm 1.6.1
- Unable to upgrade the database: org.postgresql.util.PSQLException: ERROR: could not open shared memory segment: No such file or directory
- Upgrading Cloud Director 10.4.1 to 10.5
- Installing and Configuring Velero in TKGm 1.6.1 on vSphere