Wrong iovDisableIR setting on ProLiant Gen8 might cause a PSOD

TL;DR: There’s a script at the bottom of the page that fixes the issue.

Some days ago, this HPE customer advisory caught my attention:

Advisory: (Revision) VMware – HPE ProLiant Gen8 Servers running VMware ESXi 5.5 Patch 10, VMware ESXi 6.0 Patch 4, Or VMware ESXi 6.5 May Experience Purple Screen Of Death (PSOD): LINT1 Motherboard Interrupt

There is also a corresponding VMware KB article:

ESXi host fails with intermittent NMI PSOD on HP ProLiant Gen8 servers

It isn’t clear WHY this setting was changed, but in VMware ESXi 5.5 Patch 10, 6.0 Patch 4, 6.0 U3, and 6.5, the Intel IOMMU’s interrupt remapper functionality is disabled by default (iovDisableIR is set to TRUE). So if you are running one of these ESXi versions on an HPE ProLiant Gen8, you might want to check whether you are affected.

To be clear: only HPE ProLiant Gen8 models are affected, not newer (Gen9) or older (G6, G7) models.

Currently there is no resolution, only a workaround: set iovDisableIR to FALSE and reboot the host. If it’s set to TRUE, the Intel IOMMU’s interrupt remapper functionality is disabled.

To check this setting, SSH to each host and query the current value with esxcli system settings kernel list -o iovDisableIR.
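If you collect that output from several hosts, it is easier to parse the table than to read it by eye. The following is a minimal Python sketch, not part of the original post. It assumes esxcli prints its usual fixed-width table (a header line, a separator line made of dash runs, then the data rows) with columns named Name, Configured and Runtime; parse_esxcli_table and iov_disable_ir_status are hypothetical helper names.

```python
import re


def parse_esxcli_table(text: str) -> list[dict[str, str]]:
    """Parse esxcli fixed-width table output into one dict per data row.

    Assumes the usual esxcli layout: a header line, a separator line made of
    dash runs (one run per column), then the data rows.
    """
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    for index, line in enumerate(lines):
        # The separator line contains nothing but dashes and spaces.
        if "-" in line and set(line) <= {"-", " "}:
            break
    else:
        raise ValueError("no esxcli table separator found")
    if index == 0:
        raise ValueError("table separator has no header line above it")

    # Each run of dashes marks the start of one column.
    starts = [match.start() for match in re.finditer(r"-+", lines[index])]

    def cells(row: str) -> list[str]:
        # A cell spans from its column start to the next column start;
        # the last cell extends to the end of the line.
        stops = starts[1:] + [len(row)]
        return [row[start:stop].strip() for start, stop in zip(starts, stops)]

    header = cells(lines[index - 1])
    return [dict(zip(header, cells(row))) for row in lines[index + 1:]]


def iov_disable_ir_status(esxcli_output: str) -> tuple[str, str]:
    """Return the (Configured, Runtime) values of iovDisableIR, e.g. ("TRUE", "TRUE")."""
    for row in parse_esxcli_table(esxcli_output):
        if row.get("Name") == "iovDisableIR":
            return row["Configured"].upper(), row["Runtime"].upper()
    raise KeyError("iovDisableIR not found in esxcli output")
```

Both values matter: Runtime is what the running kernel uses, while a changed Configured value only takes effect after the next reboot.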

I have written a small PowerCLI script that uses the Get-EsxCli cmdlet to check all hosts in a cluster. The script only checks the iovDisableIR setting; it doesn’t change it.
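The PowerCLI script itself is not reproduced here. As an illustration of the check logic only, here is a Python sketch that assumes each host's model string and its iovDisableIR Configured and Runtime values have already been collected (for example via esxcli). HostSetting, classify and check_cluster are hypothetical names, and matching "ProLiant" plus "Gen8" in the model string is an assumption about how the hardware model is reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSetting:
    """iovDisableIR values collected from one host (hypothetical structure)."""
    name: str
    model: str         # hardware model string, e.g. "ProLiant DL380p Gen8"
    configured: bool   # value the host will use after its next reboot
    runtime: bool      # value the running kernel currently uses


def classify(host: HostSetting) -> str:
    """Return a read-only verdict for one host; nothing on the host is changed."""
    model = host.model.lower()
    if "proliant" not in model or "gen8" not in model:
        return "not affected"  # the advisory only covers ProLiant Gen8
    if host.runtime and not host.configured:
        return "reboot pending"  # workaround configured but not active yet
    if host.configured or host.runtime:
        return "affected"
    return "ok"


def check_cluster(hosts: list[HostSetting]) -> dict[str, str]:
    """Map each host name to its verdict, like a check-only report."""
    return {host.name: classify(host) for host in hosts}
```

Reporting "reboot pending" separately avoids flagging a host as still affected when the setting was already changed but the host has not been restarted yet.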

Here’s another script that analyzes and fixes the issue.
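The fix step can be sketched as a dry run that only lists what would be changed. This Python sketch is not the author's PowerCLI script: plan_fix and FIX_COMMAND are hypothetical names, and the command string reflects the commonly documented esxcli syntax for setting a kernel option, which you should verify against the VMware KB before running it anywhere.

```python
# Commonly documented esxcli syntax for the workaround (verify before use).
FIX_COMMAND = "esxcli system settings kernel set --setting=iovDisableIR -v FALSE"


def plan_fix(hosts: dict[str, tuple[str, bool]]) -> list[tuple[str, str]]:
    """Return (host, command) pairs for ProLiant Gen8 hosts with iovDisableIR=TRUE.

    `hosts` maps a host name to (model string, configured iovDisableIR value).
    This is a dry run: it only builds the list and executes nothing.
    """
    plan = []
    for name, (model, configured) in sorted(hosts.items()):
        lowered = model.lower()
        if "proliant" in lowered and "gen8" in lowered and configured:
            plan.append((name, FIX_COMMAND))
    return plan
```

Every host in the resulting plan still needs a reboot (ideally in maintenance mode) before the Runtime value changes to FALSE.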

Patrick Terlisten

vcloudnine.de is the personal blog of Patrick Terlisten. Patrick has nearly two decades of experience in IT, especially in the areas of infrastructure, cloud, automation, and industrialization. Patrick was selected as VMware vExpert (2014 - 2017), as well as PernixData PernixPro.


5 thoughts on “Wrong iovDisableIR setting on ProLiant Gen8 might cause a PSOD”

  1. Patrick Long

    I worked extensively with HPE and VMware in late November and early December 2016 to identify and troubleshoot this issue. My testing showed that, for us, this issue only appears on Gen8 servers with Intel Ivy Bridge procs (v2), and only sporadically when they were under significant load; our Gen8 servers with Intel Sandy Bridge (v0) procs were not affected. But as always YMMV, and I would (and did) follow the KB recommendation to revert the setting to FALSE for ALL Gen8 servers. I would like to clarify that the change in the default setting to TRUE for iovDisableIR was in fact made PRIOR to 6.0 U3 as you indicated; it was actually changed in ESXi 6.0 Patch 4 build-4600944, released 2016-11-22. The HPE Advisory is correct in this regard, but the VMware KB also incorrectly states that the change occurred for ESXi 6.0 in Update 3, and I have submitted a request to have it corrected. VMware advised me that the reason for this reversal of the default setting of iovDisableIR was that the prior default of FALSE “was causing issues with other vendors systems”, although I could not get anything more specific out of them than that: they would not identify to me which vendors were affected or what issues were caused.

    I would think that this issue could be fixed with a Gen8 BIOS revision, as the latest BIOS available for my affected DL380p servers was released 7/1/2015. However, HPE did not give me any information indicating that a new release was forthcoming or even being worked on, so for now please follow the Advisory/KB recommendation.

  2. Juan Fernandez

    Hi Patrick. Very useful information.

    Maybe a completely foolish question: are Gen9 servers completely out of scope for the KB? In your opinion, could changing iovDisableIR to FALSE on Gen9 servers be considered a best practice in any way?

  3. Its Broken

    https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2147325

    So it basically says not to set the setting to FALSE?

    Resolution

    VMware recommends contacting the hardware manufacturer for updated BIOS or possible workarounds.

    Note: A prior version of this KB article recommended that customers experiencing the problem described above work around it by configuring ESXi to disable the Intel® VT-d interrupt remapper (setting boot option iovDisableIR=FALSE and rebooting). VMware ESXi 5.5 p10, 6.0 p04, 6.0 U3 and 6.5 by default disable the Intel® VT-d interrupt remapper for this purpose.

    VMware has recently received several reports indicating that disabling the Intel® VT-d interrupt remapper is causing ESXi host failure on HPE Gen8 platforms, see ESXi host fails with intermittent NMI purple diagnostic screen on HP ProLiant Gen8 servers (2149043). VMware is no longer recommending that the Intel® VT-d interrupt remapper be disabled to work around the Intel® VT-d erratum described in this article. VMware is recommending that the fix for the erratum be applied in the BIOS as described in the Intel® specification updates for the affected processors.

