The other day, both of my VMware ESXi 5.5 U2 host servers, which were part of a VMware HA cluster, crashed with a PSOD (purple screen of death) at nearly the same time. Luckily it happened on a weekend and users didn’t even notice the outage. My first step in figuring out what happened was to open a technical support case with VMware and have their support engineer review the core dump file.
Unfortunately, during my conversation with the technical support engineer at VMware, I was told that the core dump files had not been saved to disk on either host server and were lost when the servers rebooted after the crash. Without these files, it was difficult for the support engineer to determine the root cause of the crash. Instead, we were left with several possible causes based only on assumptions and educated guesses. I was perplexed as to what had happened to the core dump files, so I decided to dig into WHY they weren’t saved during the crash.
VMware core dump locations are different between versions of ESXi
The most important thing to keep in mind here is that the core dump file/partition locations and sizes are NOT identical between different versions of VMware ESXi. After digging around, this is what I found:
VMware 5.5 and newer core dump location: 2.5GB in size.
VMware 5.0 and earlier core dump location: 100MB in size.
Why is this important? It appears that the core dump location and configured size are NOT UPDATED during upgrades from a pre-5.5 version to 5.5. In other words, if you upgrade a 5.0 host to 5.5, the VMware core dump partition STAYS at 100MB. As a result, it’s very likely that your 5.5 host servers will NOT be able to create a core dump file for diagnostic purposes during a PSOD.
To determine the core dump location, open an SSH session to the host server and type: esxcli system coredump partition get
Then, to see which physical drive on the server that partition actually lives on, type: esxcli storage core path list
You can see here that the drive where the VMware core dump partition lies is on the internal USB/SD Card (in this case, it’s the same place the ESXi operating system is installed)
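If you would rather script this lookup than read it off the screen, here is a minimal sketch. It assumes the output of esxcli system coredump partition get contains a line of the form "Active: device:partition". The function names active_partition and parent_device are hypothetical helpers for illustration, not part of esxcli.

```shell
# Print the active core dump partition from "partition get" output on stdin.
# Assumption: the output has a line like "   Active: mpx.vmhba32:C0:T0:L0:7".
active_partition() {
    awk -F': ' '/Active:/ { print $2; exit }'
}

# Strip the trailing ":<partition number>" to get the parent device name.
parent_device() {
    printf '%s\n' "${1%:*}"
}

# On a host (not run here):
#   part=$(esxcli system coredump partition get | active_partition)
#   esxcli storage core path list | grep "$(parent_device "$part")"
```

The parent device name is what you would look for in the path list output to identify the physical drive.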
Next, to check the size of the partition, type: ls -lh /dev/disks
Remember that earlier when we did an “esxcli system coredump partition get” command, the results came back showing mpx.vmhba32:C0:T0:L0:7. According to the screenshot above, you can see that the partition size is only 110MB in size. Since we’re running ESXi 5.5 U2, that’s likely not going to be adequate for saving the core dump file during a PSOD scenario!
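The size comparison can also be written as a small check. This is only a sketch: it treats the 2.5GB figure quoted above for 5.5 and newer as the minimum, expressed as 2560 MiB, and the helper name partition_big_enough is hypothetical.

```shell
# Minimum size in MiB, taken from the 2.5GB figure quoted for ESXi 5.5 and newer.
REQUIRED_MIB=2560

# Hypothetical helper: succeeds if a size given in bytes meets the minimum.
partition_big_enough() {
    size_mib=$(( $1 / 1024 / 1024 ))
    [ "$size_mib" -ge "$REQUIRED_MIB" ]
}

# On a host (not run here; assumes field 5 of "ls -l" is the size in bytes):
#   size=$(ls -l /dev/disks/mpx.vmhba32:C0:T0:L0:7 | awk '{print $5}')
#   partition_big_enough "$size" && echo "big enough" || echo "too small"
```

A 110MB partition like the one on my upgraded host fails this check, while a 2.5GB partition passes it.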
In comparison, here is a screenshot from a VMware 6.0 server with a fresh install:
On the VMware 6.0 server shown above, mpx.vmhba32:C0:T0:L0:9 is the VMware core dump partition, and you can see that it’s 2.5GB in size, plenty big for saving crash information during a PSOD.
Moving the ESXi core dump to a file instead of a partition
This is relatively easy to do. For reference, you can find the VMware article here: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2077516. Before you start, however, you may want to verify that someone else hasn’t already set the core dump to go to a file. You can do that by typing: esxcli system coredump file list and verifying that the list is empty. If it’s not, someone may have already moved the core dump from the default partition to a file.
Assuming the esxcli system coredump file list command came back with no entries, we need to add a core dump file. We can do that by typing: esxcli system coredump file add -d datastore1 -f servername
Then run this command to set the dump file for the host: esxcli system coredump file set -p /vmfs/volumes/DATASTORE_UUID/vmkdump/servername.dumpfile
Then finally, verify the dump file is active and configured with this command: esxcli system coredump file list
The screenshot shows that your coredump is active and configured!
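The add, set and list steps can be strung together as a short script. This is a sketch rather than a tested procedure: dumpfile_path and configure_dump_file are hypothetical helper names, and the datastore name, datastore UUID and file name are placeholders you would replace with your own values.

```shell
# Build the path passed to "file set -p".
# $1 = datastore UUID (placeholder), $2 = the name given to "file add -f".
dumpfile_path() {
    printf '/vmfs/volumes/%s/vmkdump/%s.dumpfile\n' "$1" "$2"
}

# Hypothetical wrapper around the three commands used in this article.
# $1 = datastore name, $2 = datastore UUID, $3 = dump file name.
configure_dump_file() {
    datastore=$1; uuid=$2; name=$3
    esxcli system coredump file add -d "$datastore" -f "$name" &&
    esxcli system coredump file set -p "$(dumpfile_path "$uuid" "$name")" &&
    esxcli system coredump file list
}

# On a host (not run here):
#   configure_dump_file datastore1 DATASTORE_UUID servername
```

Chaining the commands with && stops the script at the first failure, so the final list only runs if the file was both added and set.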
Going forward, any PSOD crashes on your upgraded VMware 5.5 host server should provide VMware core dump files that you can then send off to VMware technical support for further evaluation.