Travelling Techie

Adventures in VMware

About The Author

Brandon Neill is a VMware Certified Instructor and Consultant. He specializes in NSX and vRealize Automation. In addition to teaching Official VMware Classes, he provides contract training and consulting services.

Troubleshooting networking, sometimes you have to think outside the network.

Josh Townsend’s recent post on vmtoday, “PCoIP Packet Loss? Don’t blame the network”, is a fantastic example of troubleshooting. It also illustrates something I ran into frequently as an Escalation Engineer: just because a problem exhibits the signs of a network issue does not mean the network is at fault. Once you have eliminated the possible network problems, you have to be willing to look at other areas that might cause similar symptoms.

A very good example of that is packet loss. In addition to network problems (and the very occasional insidious vmkernel problem), anything that causes a VM to pause, even momentarily, can cause it to drop packets.

Storage is one possible cause, as Josh pointed out. Another is too many vCPUs. While not exactly a pause, if the VM can’t get all its processors scheduled, it won’t be able to pull all of the packets off the ring buffer, and they get dropped. Relaxed co-scheduling helps with this, but does not eliminate it. I can’t count the number of cases that were resolved by reducing the number of vCPUs from 4 to 2. (Be aware of HAL/kernel compatibility when changing between 1 and 2 vCPUs.)

Another issue that can cause packet loss/VM pausing is the CD-ROM drive. If the mounted ISO is not available (there can be multiple reasons for this) but the operating system keeps trying to access it, the result is small, frequent pauses, often with every other ping being dropped. A common cause of this on ESX(i) 5.0 and below is mounting an ISO that lives on a VMFS datastore and then vMotioning a VM so that a 9th host needs to access that image. VMFS only supports 8 hosts accessing the same read-only file at one time (this has been increased to 32 with VMFS5 on 5.5 and above).
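The "every other ping dropped" symptom is easy to miss if you only look at the overall loss percentage. As a rough illustration, here is a minimal Python sketch that takes a list of ping outcomes and reports how regularly the result flips between reply and loss. The function names, the input format (a list of booleans you collect yourself) and the 0.9 threshold are all assumptions made for this example, not part of any VMware tool, and a regular pattern is only a hint to go look at things like a stale ISO mount, not proof of one.

```python
from typing import Dict, Sequence


def loss_summary(replies: Sequence[bool]) -> Dict[str, float]:
    """Summarise a run of ping results (True = reply received, False = lost)."""
    total = len(replies)
    lost = sum(1 for ok in replies if not ok)
    # Count positions where the outcome differs from the previous ping.
    flips = sum(1 for prev, cur in zip(replies, replies[1:]) if prev != cur)
    return {
        "total": total,
        "lost": lost,
        "loss_ratio": lost / total if total else 0.0,
        "flip_ratio": flips / (total - 1) if total > 1 else 0.0,
    }


def looks_periodic(replies: Sequence[bool], min_flip_ratio: float = 0.9) -> bool:
    """Heuristic: loss that alternates almost every ping.

    The threshold is an arbitrary value chosen for illustration.
    """
    summary = loss_summary(replies)
    return summary["lost"] > 0 and summary["flip_ratio"] >= min_flip_ratio


if __name__ == "__main__":
    alternating = [True, False] * 10      # reply, loss, reply, loss, ...
    bursty = [True] * 15 + [False] * 5    # one long outage at the end
    print(loss_summary(alternating))
    print(looks_periodic(alternating))
    print(looks_periodic(bursty))
```

A strictly alternating trace gives a flip ratio of 1.0, while a single burst of loss gives a value close to 0, so the two cases separate cleanly even when the loss percentages are similar.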

blog/troubleshooting_networking_sometimes_you_have_to_think_outside_the_network.txt · Last modified: 2016/12/31 19:02 by brandon