Unable to push CA certificates and CRLs to host

Note: This post addresses and (hopefully) fixes the cause of the issue found here: vVols Endpoint - Failed to establish connection on ESXi host

Recently, one of my customers was trying to refresh the CA store on newly built ESXi 6.7 U3 hosts under a freshly upgraded vCenter Server 6.7 U3 instance. When the admin tried to refresh the CA store, they got this error message in the vSphere Client:

A general system error occurred: Unable to push CA certificates and CRLs to host <<esxi-hostname>>

We found this odd, and were a little concerned that this recently upgraded vCenter Server was giving us grief. A quick look over the vCenter certificate store (VECS) confirmed nothing had expired or appeared to be outright faulty. Why is this important? The VECS is the certificate store that vCenter references not only for the CAs and certificates it trusts, but also for the CAs the ESXi hosts are told to trust.

First things first, let's check the ESXi logs (you do check the logs first, right?). Enable the SSH service on the host and SSH to it. The log we want to look at is /var/log/hostd.log. Run:

cat /var/log/hostd.log | grep -i "Certificate is not a valid CA certificate" -A 20

It took us a while to dig out, but this command shows the certificate as the log prints it, in PEM format. It's this certificate that ESXi isn't accepting.

If you want to identify the certificate in question, try running the command below. Forgive my terrible awk skills, but this did the trick for us:

cat /var/log/hostd.log | grep -i "Certificate is not a valid CA certificate" -A 30 | awk '/-BEGIN CERTIFICATE-----/{a=1} /-----END CERTIFICATE-----/{print;a=0}a' | awk '{sub(/--> /,""); print}' | sed '/-BEGIN CERTIFICATE-----/s/^/----/' | openssl x509 -text -noout

That will grab the certificate in the log, remove the '-->' prefix you see, clean it up a bit and send it to openssl to print out for you. Note: It will grab and process only the first match in the log. Take note of the thumbprint; we'll need it next. (If the output doesn't include one, appending -fingerprint -sha1 to the openssl x509 options prints it.)
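If the one-liner misbehaves on your host, the same extraction can be sketched as a small shell function. This is only a sketch: extract_first_pem is a name I made up, it assumes the continuation lines in hostd.log start with "--> " as they did for us, and it rewrites the BEGIN/END lines to their canonical form rather than trusting whatever prefix the log put in front of them.

```shell
# Hypothetical helper (not from the original post): print the first PEM
# certificate found on stdin, dropping the "--> " continuation prefix.
extract_first_pem() {
  awk '
    /BEGIN CERTIFICATE/ { print "-----BEGIN CERTIFICATE-----"; inside = 1; next }
    /END CERTIFICATE/   { if (inside) { print "-----END CERTIFICATE-----"; exit } }
    inside              { sub(/^--> /, ""); print }
  '
}

# Usage on the ESXi host (the /tmp path is just an example):
#   grep -i -A 30 "Certificate is not a valid CA certificate" /var/log/hostd.log \
#     | extract_first_pem > /tmp/rejected-ca.pem
#   openssl x509 -in /tmp/rejected-ca.pem -noout -subject -issuer -fingerprint -sha1
```

The -fingerprint -sha1 options assume the thumbprint you will compare against is a SHA-1 one; swap in -sha256 if that is what your client shows.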

Cool, we've got the troublesome certificate's details. Let's find it in VECS on the vCenter Server. You can do this quite easily in the vSphere Client if you go to Administration > Certificate Management. Find the trusted certificates heading and look through the certificates listed there. Match the thumbprint from the output above to the thumbprint of the certificates you're seeing in the VECS. Once you find it, record the alias of the certificate.
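If you'd rather do the matching from an SSH session on the vCenter appliance, something like the following could list every alias in TRUSTED_ROOTS next to its SHA-1 fingerprint. Treat it as a sketch I haven't run against a live appliance: list_trusted_root_thumbprints is a made-up name, and the parsing assumes that vecs-cli entry list prints lines of the form "Alias : <alias>".

```shell
# Path from this post; override VECS_CLI if your appliance differs.
VECS_CLI="${VECS_CLI:-/usr/lib/vmware-vmafd/bin/vecs-cli}"

# Hypothetical helper: print "<alias> <SHA-1 fingerprint>" for each
# entry in the TRUSTED_ROOTS store.
list_trusted_root_thumbprints() {
  "$VECS_CLI" entry list --store TRUSTED_ROOTS |
    # Assumption: alias lines look like "Alias : <alias>".
    awk -F'[ \t]*:[ \t]*' '/^Alias/ { print $2 }' |
    while read -r entry_alias; do
      # </dev/null keeps the CLI from eating the alias list on stdin.
      fp=$("$VECS_CLI" entry getcert --store TRUSTED_ROOTS \
             --alias "$entry_alias" </dev/null |
             openssl x509 -noout -fingerprint -sha1)
      # openssl prints "... Fingerprint=AA:BB:..."; keep the part after "=".
      echo "$entry_alias ${fp#*=}"
    done
}

# Usage: list_trusted_root_thumbprints | grep -i "<thumbprint from the host>"
```

The fingerprint comes out colon-separated, so you may need to adjust the format of the thumbprint you noted before grepping for it.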

What certificate did you find? For my customer, it was the ssoserver certificate, a remnant from an older version of vCenter Server. As far as I know you don't need it anymore on vCenter Server 6.7, though I haven't confirmed this. If it's not the ssoserver certificate, I can't recommend deleting the problem certificate unless, of course, you know what you're doing. If I were you, I'd log a VMware SR and get it verified first. Otherwise, feel free to continue.

Now that you have the alias, you can SSH to the vCenter appliance to remove the offending certificate from the VECS. I think you can remove certificates from the vSphere UI too, but this is all from memory and my lab is down at the time of writing. There's a straightforward command to remove a certificate from VECS:

/usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store TRUSTED_ROOTS --alias <problem certificate alias here>

That should do the trick! As always, take a backup and a snapshot of your vCenter before making changes like this. You can seriously break stuff.
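On top of the snapshot, it's cheap to keep a copy of the certificate itself before deleting it. Here is a rough sketch of how that could look. backup_and_delete_trusted_root and the default backup path are my own inventions, and I'm assuming vecs-cli entry getcert accepts --output as it does for the other stores.

```shell
# Path from this post; override VECS_CLI if your appliance differs.
VECS_CLI="${VECS_CLI:-/usr/lib/vmware-vmafd/bin/vecs-cli}"

# Hypothetical helper: export the entry to a file, then delete it.
# $1 = alias, $2 = backup file (optional, hypothetical default below).
backup_and_delete_trusted_root() {
  entry_alias="$1"
  backup_file="${2:-/root/${entry_alias}.pem.bak}"

  # Export first; stop if the CLI reports an error.
  "$VECS_CLI" entry getcert --store TRUSTED_ROOTS --alias "$entry_alias" \
      --output "$backup_file" || return 1

  # Refuse to delete unless the export produced a non-empty file.
  [ -s "$backup_file" ] || { echo "backup failed: $backup_file" >&2; return 1; }

  "$VECS_CLI" entry delete --store TRUSTED_ROOTS --alias "$entry_alias"
}

# Usage: backup_and_delete_trusted_root "<problem certificate alias here>"
```

The delete only runs once the export has written something to disk, so a typo in the alias fails before anything is removed.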