
jdgreen.io

The personal blog of James Green


VMware

PowerCLI – Configure Syslog for All Hosts

James Green · Aug 10, 2015 ·

Here’s a quick bit of PowerCLI to configure a syslog server on all hosts, place each host’s logs in a unique directory, and then enable the firewall exception that allows syslog traffic. This will cover every host in the vCenter you’re connected to. I highly recommend that anyone managing a vSphere environment set up a syslog destination for ESXi. There’s nothing more frustrating than attempting root-cause analysis on a failure when logs aren’t persistent and centrally located.

Cheers!

[code language="ps"]
# Get all ESXi hosts
$hosts = Get-VMHost

# Update syslog configuration (replace 0.0.0.0 with your syslog server)
$hosts | Set-VMHostAdvancedConfiguration -Name Syslog.global.logHost -Value '0.0.0.0'
$hosts | Set-VMHostAdvancedConfiguration -Name Syslog.global.logDirUnique -Value $true

# Enable the syslog firewall exception
$hosts | Get-VMHostFirewallException | Where-Object {$_.Name.StartsWith('syslog')} |
    Set-VMHostFirewallException -Enabled $true
[/code]
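If you want to double-check that everything took, a quick read-back sketch like the following should do it. This uses the same vintage of cmdlets as above (`Get-VMHostAdvancedConfiguration` is the read-side counterpart of the Set- cmdlet); on newer PowerCLI you’d reach for Get-AdvancedSetting instead.

[code language="ps"]
# Read back the syslog target and firewall state on every host
foreach ($h in Get-VMHost) {
    Get-VMHostAdvancedConfiguration -VMHost $h -Name Syslog.global.logHost
    Get-VMHostFirewallException -VMHost $h | Where-Object {$_.Name.StartsWith('syslog')}
}
[/code]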

Correct Misaligned VMDK

James Green · Aug 6, 2015 ·

This post details the process for aligning VMDK files properly. There’s a great post from Duncan Epping on why you need to do this here. In these notes, we’re dealing with NetApp storage. Duncan’s post details some other tools for other storage vendors (like UberAlign from Nick Weaver).

Before beginning this process, ensure that:
– VM with misaligned VMDK is powered off
– VM has no snapshots
[Read more…] about Correct Misaligned VMDK

LAG vs LBT for vSwitch Uplinks

James Green · Apr 16, 2015 ·

When it comes to distributing load across a set of physical uplinks, which NIC teaming method reigns supreme?

Definitions

First, let me define the terms. I’m not a networking guy, so these terms were initially confusing to me because they all seem so similar. My biggest challenge when having these conversations is when the other party uses terms to mean something other than what I think they mean.

  • LAG: Link Aggregation Group. This is a generic term used to refer to any sort of bonding of links to achieve greater throughput potential. This would include both static EtherChannel and Link Aggregation Control Protocol (LACP).
  • EtherChannel: The Cisco Systems Inc. proprietary link aggregation scheme, which accomplishes more or less the same goal as the IEEE standard 802.1AX-2014. It groups up to eight physical links into a single logical link, providing increased throughput potential while avoiding loops. The aggregation can be configured statically on both sides of the links in play, or set up dynamically by either LACP or Port Aggregation Protocol (PAgP), Cisco’s proprietary equivalent of LACP.
  • LACP: Link Aggregation Control Protocol. This is also defined in the 802.1AX standard, and provides a method for automating LAG configurations. LACP-capable devices discover each other by sending LACP packets to the Slow_Protocols_Multicast address 01-80-c2-00-00-02. They then negotiate the forming (or not forming, perhaps) of the LAG. Dynamic configuration is often desirable because it helps avoid configuration issues.
  • LBT: Load-Based Teaming. In this context, LBT refers specifically to the VMware Inc. implementation of the Load-Based Teaming load-balancing policy, available only on a vSphere Distributed Switch (VDS) — virtual standard switches (vSS) get no love. This policy is also known as “Route based on physical NIC load.” It’s important to note that no bonding of physical uplinks is done as far as the upstream switch is concerned.

The Confusion

With that out of the way, it’s time to dig into where folks get confused. There’s an assumption (partially correct) that a LAG balances utilization across all links in the group. If a vSwitch has four physical uplinks in a Port Channel (the single logical entity created by the EtherChannel protocol) to the upstream switch, the understanding is that traffic is evenly balanced between all uplinks. In my experience, most people have the same understanding of LBT: if four uplinks exist on the vSwitch, traffic will be evenly distributed across all four links.

Actually, both of these understandings are false. Neither load-balancing method “evenly distributes” traffic to achieve a uniform level of utilization across all links. That’s not necessarily a bad thing; it’s just important to understand what’s actually happening, as it does impact design decisions. I’ll take a look at both.

LAG Does Not = Load Balancing

I’m saying LAG because it’s trendy, but I’m going to be specifically discussing EtherChannel. As many have pointed out, “load balancing” is the wrong phrase to describe a LAG. Load is not actually “balanced”; it’s distributed. There’s a distinct difference.

To balance load would mean to dynamically select uplinks based on the current utilization of all links in the group, in an attempt to utilize the same amount of each one. For example, if 20 percent of each of the four links is utilized, I’d call that balanced.

Load distribution, on the other hand, means to algorithmically assign sessions to a given uplink (based on a hash value that the algorithm has calculated). It’s then the algorithm’s job to distribute the sessions as evenly as possible.

The list of hashing possibilities is long. You can choose to hash based on MAC, IP or port; and Source, Destination or both. A given combination of those “inputs” will be chewed up by the algorithm and spit out to be used in selecting a link. This output is called the Result Bundle Hash, or RBH. The RBH value, which is 0 through 7, will be used to select the link. Each link is responsible for a given number of the possible RBH values, which depends on how many uplinks are in the group.

[Image: RBH-to-link assignment table, from Cisco.com – http://www.cisco.com/c/en/us/support/docs/lan-switching/etherchannel/12023-4.html]

Note that with any number of uplinks that isn’t a power of two, distribution becomes imbalanced. Because the hashing algorithm can’t evenly divide the 8 possible RBH values, it must assign extras to the first few links. Because of this, it’s recommended not to use Port Channels with a number of links other than 2, 4 or 8 if you care about distribution. A quick model of this assignment is sketched below.
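To make the imbalance concrete, here’s a tiny illustrative model. This is not how the switch actually computes hashes; it just deals the 8 possible RBH values onto N links and shows how the counts come out uneven whenever N isn’t a power of two:

[code language="ps"]
# Model: deal the 8 possible RBH values (0-7) across N links
# and count how many RBH values each link ends up owning.
foreach ($links in 2..8) {
    $perLink = @(0) * $links
    foreach ($rbh in 0..7) { $perLink[$rbh % $links]++ }
    "{0} links -> RBH values per link: {1}" -f $links, ($perLink -join ', ')
}
[/code]

Run it and you’ll see 2, 4 and 8 links split the values evenly (4/4, 2/2/2/2, 1 each), while 3, 5, 6 or 7 links leave the first few links owning an extra RBH value apiece.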

With an understanding of how link selection is performed, I’ll walk through an example. For this example, the hashing mechanism is the source and destination IP address, or src-dst-ip. This means that a session from 10.0.0.2 to 10.0.1.3 might compute to an RBH of 0x4, and another session from 10.0.0.2 to 10.0.1.4 might compute to 0x1.

[Image: Link selection is based on computed RBH value]

As you can see, even though the algorithm may be good at evenly distributing sessions based on hash (two possible values to each link in the example), it’s completely unaware of the actual utilization of the uplink.

Sessions are distributed solely based on the correlation of their RBH and the uplink responsible for that RBH. Therefore, it’s feasible (however unlikely it may be) that due to widely varying traffic needs from one VM to another and the luck of the draw with the algorithm, one link could be 100 percent utilized and dropping packets, while the other three links sit idle. My networking buddies tell me that if you look at real-world distribution across the links in a port channel, it’s usually reasonable. So I’m not trying to say that this worst-case scenario should be expected; it just needs to be considered during design, because it’s a possibility.

There are two key takeaways from this:
1. A Port Channel does not evenly balance utilization across the links in a group; it evenly distributes sessions based on the specified hashing policy.
2. A single session will still never be given more bandwidth than a single uplink, due to the way traffic is distributed.

Load-Based Teaming

Now, on to the second misunderstanding: LBT. It’s as common as the first, and understandably so. If I didn’t know better, I’d expect this mechanism to work just like any unsuspecting administrator thinks it does. The assumption is that because LBT is aware of the physical NIC load, it evenly distributes sessions across all available uplinks. As in the earlier example, you’d assume that all uplinks would be balanced at the same percentage of utilization. Sadly, again, this is not the case.

The truth about LBT is that it initially selects uplinks the same way Route Based on Originating Virtual Port ID does. When a VM boots, its vNICs are assigned to a dvPort. That port is used to determine which uplink the traffic will use. The LBT mechanism comes into play every 30 seconds, when it polls the uplinks. If an uplink is more than 75 percent utilized during that polling period, LBT will move that dvPort to a less utilized uplink.
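In rough pseudocode terms, the behavior described above looks something like this. To be clear, this is an illustrative model, not VMkernel source, and the two helper functions are hypothetical:

[code language="ps"]
# Illustrative model of LBT's polling loop -- NOT actual VMkernel code.
# Move-DvPortAssignment and Get-LeastUtilizedUplink are hypothetical helpers.
while ($true) {
    Start-Sleep -Seconds 30                        # LBT polls every 30 seconds
    foreach ($uplink in $uplinks) {
        if ($uplink.UtilizationPercent -gt 75) {   # saturation threshold
            # Re-map a dvPort from the saturated uplink to the quietest one
            Move-DvPortAssignment -From $uplink -To (Get-LeastUtilizedUplink $uplinks)
        }
    }
}
[/code]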

[Image: LBT in action]

LBT takeaways:
1. LBT is aware of link utilization and won’t leave one link saturated beyond 75 percent while the others still have headroom.
2. LBT does not evenly balance traffic across uplinks when saturation is not occurring, which may explain the confusion for some folks looking at ESXTOP metrics. It only moves a dvPort assignment to another uplink once saturation occurs.

Recommendations

If you only have a vSphere Standard Switch to work with, due to environment constraints or a lack of Enterprise Plus licensing, you’re stuck with Port Channel and “Route based on IP hash.” And, frankly, unless you really need to squeeze out some extra performance, I’d avoid the Port Channel altogether and stick with the vanilla “Route based on originating virtual port ID.” It’s reliable, requires no configuration outside of the vSwitch, and performs well.

If, however, you’re blessed to be in possession of a functioning VDS, then in almost all cases I’d recommend using LBT. The awareness of actual utilization is comforting, and although I don’t feel comfortable saying that it “evenly distributes load,” I do like that it distributes load in such a way as to avoid contention until all links have been utilized. It’s a bonus that it also doesn’t require any configuration of the physical switches.
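For reference, here’s roughly how each recommendation looks in PowerCLI. Treat this as a sketch: the portgroup and vSwitch names are made up, and you should verify the cmdlets against your PowerCLI version.

[code language="ps"]
# VDS: set "Route based on physical NIC load" (LBT) on a portgroup
# ('dvPG-Production' and 'vSwitch0' below are made-up names)
Get-VDPortgroup -Name 'dvPG-Production' |
    Get-VDUplinkTeamingPolicy |
    Set-VDUplinkTeamingPolicy -LoadBalancingPolicy LoadBalanceLoadBased

# vSS: stick with "Route based on originating virtual port ID"
Get-VirtualSwitch -Standard -Name 'vSwitch0' |
    Get-NicTeamingPolicy |
    Set-NicTeamingPolicy -LoadBalancingPolicy LoadBalanceSrcId
[/code]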

This post was originally published at http://virtualizationreview.com/articles/2015/03/26/load-balancing-vsphere-vswitch-uplinks.aspx. I’ve re-posted it here to share with my readers.

My VMworld 2014 Tech Talk

James Green · Feb 9, 2015 ·

At VMworld, the vBrownBag crew puts on a Tech Talks series. Throughout the week, VMworld attendees give 10-minute “lightning talks” on a given topic. I guess I must have forgotten to share this when I got home. Below is my vBrownBag Tech Talk from VMworld 2014 on community tools that help a VMware administrator do their job more efficiently. Cheers!

Sockets vs. Cores on a VMware VM Config?

James Green · Jan 26, 2015 ·

I get this question regularly, and after having just typed up another long answer to the question, I’m going to share the answer here for future reference 🙂

The question usually goes something like this: “What is the performance impact of allocating 1 socket and 4 cores when I build a VM versus allocating 4 sockets with 1 core each?” The person asking is usually using the old vSphere Client and has seen this option, which was added for a good reason, but is assuming something false about its purpose.

First and foremost: on a small VM (fewer than 8 vCPUs), there is no performance impact either way. Because the hypervisor schedules the resources on the back end, it really doesn’t matter what configuration is presented to the guest OS, at least not from a performance perspective. So stop worrying about it 🙂 With that being said, why does this option exist?

The reason you used to have the option to select sockets vs. cores was to get around a limitation in the guest OS. For example, Server 2008 will only use up to 4 physical CPUs. By increasing the number of cores per socket, you can raise the number of CPUs the guest OS will allow you to use (since it sees them as cores). The one consideration to keep in mind: if there isn’t a reason to trick the guest OS by using cores, scale VMs using the socket setting, because the number of virtual sockets can and will affect vNUMA calculations.
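If you do need to present cores instead of sockets, a reconfigure through the vSphere API looks something like this (a sketch; the VM name is made up, and newer PowerCLI releases also expose this more directly on New-VM/Set-VM):

[code language="ps"]
# Present 4 vCPUs as 1 socket x 4 cores via the vSphere API
# ('app01' is a made-up VM name; power the VM off before changing CPU topology)
$vm = Get-VM -Name 'app01'
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.NumCPUs = 4            # total vCPU count
$spec.NumCoresPerSocket = 4  # cores per socket -> one virtual socket
$vm.ExtensionData.ReconfigVM($spec)
[/code]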

Update: one clarification in light of Jim’s comment below. Be aware that when using cores per socket to fool the guest OS for licensing purposes, you DO run the risk of swaying the vNUMA calculations in a way that can negatively impact performance. A small VM will be contained within a single NUMA node, so you’re probably safe. But when a larger VM spans multiple NUMA nodes, you risk adversely swaying the calculations by increasing the core count. Please read the comment for more depth!

You can read more about this topic on page 267 of vSphere Design by Scott Lowe and Forbes Guthrie, as well as in this blog post by Frank Denneman.

Also, major trolling on Twitter after this was posted 🙂

@jdgreen @wyrdgirl @millardjk @julian_wood I’m a 4×24 man myself to make troubleshooting that much harder.

— Christopher Kusek (@cxi) January 26, 2015

