troubleshooting

After a routine update of a 6-node Nutanix cluster, a Nutanix Cluster Check (NCC) warning popped up indicating a problem with the SAS cabling. Running the check on the CLI offered some more details. Running : health_checks hardware_checks disk_checks hpe_hba_cabling_check [==================================================] 100% /health_checks/hardware_checks/disk_checks/hpe_hba_cabling_check [ WARN ] -----------------------------------------------------------------------------------------------------------------------------------------------------------+ Detailed information for hpe_hba_cabling_check: Node 10.99.1.205: WARN: Disk cabling for disk(s) S6GLNG0T610113 are detected at incorrect location(s) 3:251:8 respectively where each value in the location corresponds to box:bay Node 10.

Usually I tend to use the iPhone WiFi hotspot feature. But lately, I had to switch to USB tethering, because I had to work a whole workday using the hotspot feature. USB tethering saves battery and the connection was more reliable for me. Please note, that you need to install iTunes to use USB tethering, because the necessary Ethernet driver is only available with iTunes. Without this driver, Windows won’t recorgnize the iPhone as an Ethernet connection.

While chilling on my couch, I stumbled over this pretty interesting Reddit thread: Story Time - How I blew up my company’s AD for 24 hours and fixed it : sysadmin (reddit.com) Long story short: A poor guy applied some STIG hardening and his Active Directory blew up. Root cause was disabling RC4, which caused Kerberos failures, primarily documented by errors like “The encryption type requested is not supported by the KDC.

This error gets me from time to time, regardless which server vendor, mostly on hosts that were upgraded a couple of times. In this case it was a ESXi host currently running a pretty old build of ESXi 6.7 U3 and my job was the upgrade to 7.0 Update 3c. If you add a upgrade baseline to the cluster or host, and you try to remediate the host, the task fails with a dependency error.

Today I faced an interesting problem. A customer told me that their Exchange 2010, which is currently part of a Exchange cross-forest migration project, has an issue with Outlook Web Access and the Exchange Control Panel. Both web sites fail with a white screen and a single message: 440 Login Timeout I checked some basics, like certificate, configuration of the virtual directories and I found nothing suspicious. Most hints on the internet pointed towards problems with the IUSR_servername user, which is not used with IIS 7 and later.

EDIT: It seems that his was fixed in vCenter 7.0 U3. While debugging a vCener Lifecycle Manager, which was unable to download updates, I’ve stumbled over a weird behaviour, which is (IMHO) by design. Some of you might use a proxy server. And some of you might use a proxy server which requires credentials. In my case, my customer uses a Sophos SG appliance as a web proxy server with authentication.

As part of an ongoing Exchange 2010 to 2016 migration, I had to replace the self-signed certificate with a certificate from the customers PKI. Everything went fine, the customer had a suitable template, we’ve added the necessary hostnames and bound IIS and SMTP to the certificate. The mess started with an iisreset /noforce… The iisreset took longer than expected. After that, I tried to login into the ECP, entered username and password and got an error.

During an vSphere 6.5 > 6.7 update a was host failing continously at the remediation with an “unknown error”. The host was updated from ESXI 6.5 to 6.7 using an upgrade baseline. Other hosts were updated to 6.7 and with the latest patches without any issues. Something strange was going on… The esxupdate.log and the vua.log on the host itself showed nothing special. So I checked the vmware-vum-server-log4cpp.log which was much more informative!

After a reboot, a VMware ESXi 6.7 U3 told me that he has no compatible NICs. Fun fact: Right before the reboot everything was fine. The ILO also showed no NICs. Unfortunately, I wasn’t onsite to pull the blade server and put it back in. But there is a way to do this “virtually”. You have to connect to the IP address of the Onboard Administrator via SSH. Then issue the reset server command with the bay of the server you want to reset and an argument.

Actually, yesterday should be the day at which I migrate one of the last physical Windows vCenter servers installed in my customer base. Actually… the migration failed twice. And each time I had to rollback, power-on the old physical server, reset the computer account etc. The update was from VMware vCenter Server 6.0 Update 3d (7462484) on a Windows 2012 R2 server to vCenter Server 6.7 Update 3 (Appliance). The migration failed at 62% with the following message:

hpe_hba_cabling_check falsely issues a warning

Failed to connect to IKEv2 VPN using iPhone USB tethering

Why you should change your KRBTGT password prior disabling RC4

Upgrade to ESXi 7.0: Missing dependencies VIBs Error

Outlook Web Access fails with "440 Login Timeout"

Escaping special characters in proxy auth passwords in vCenter

Exchange Control Panel /ecp broken after certificate replacement

Update Manager fails with unknown error during host remediation

Virtually reseated: Reset blade in a HPE C7000 enclosure

vCenter Migration from 6.0 to 6.7 fails due to missing user role