I have a question. Our company has only been deploying UCS B-Series solutions for about a year, but we have deployed several so far. We are currently discussing how to properly design and use Multi-NIC vMotion, specifically whether we need two separate VLANs to do it properly. I do know that Multi-NIC vMotion on the VMware side requires that each NIC binding be done via IP binding (http://www.yellow-bricks.com/2011/09/17/multiple-nic-vmotion-in-vsphere-5/). The question of whether two subnets are needed came up because one of our Storage/VMware team members found this article, where a gentleman used two separate VLANs for multi-NIC vMotion.
My opinion is that while two separate VLANs still work, they are not needed. I believe the goal was to pin traffic down the A side and B side of the fabric interconnects, and my assumption is that the gentleman thought he needed two subnets to accomplish this. My understanding of UCS is that we use the vNIC templates to pin traffic down the A side and B side, which would mean that only one VLAN is needed, and multi-NIC vMotion can still be accomplished using the method in the first link, with VMware vCenter using both the A-side and B-side paths simultaneously.
I hope this wasn't too confusing. Any and all input and discussion is much appreciated.
You would use the same VLAN and, yes, pin the vNICs down to separate FIs. In the embedded video in Duncan Epping's tutorial he uses VLAN 10 for both kernel ports.
You would create vnic templates for all of your vmware vnics similar to this (depending on if you wanted to follow a traditional rack server layout for your vswitch (6 to 8 vnics) or you could consolidate your switches if you're using 10Gb, etc.):
Duncan mentions using VMware's Network I/O Control, but that is a software solution. I would recommend using what you have. What I mean is that many people do not know that UCS has QoS capability built into the hardware, and it is both egress and ingress. You can create QoS policies to layer on top of your vNIC templates, which then get populated into your templates, and you create your service profiles from those. Yes, it is a little complicated to set up at first, but then you save off your configuration and deploy from your templates. These QoS policies only kick in if there is actual contention for the bandwidth, but they do show up as that speed in VMware, which I think is VERY cool, i.e. 1Gb shows up as a 1Gb vNIC in VMware. (See my link further below.)
Brad Hedlund does a great job of explaining QoS in the UCS Virtual Interface Card (VIC):
I need to do an updated version but here is one I've done in the past that is still relevant:
You can see that I pinned the "A" side of the FI to all of the VICs for "A" and the "B" side of the FI to all of the VICs that need to go out that FI. So that way we keep a true "A" and "B" fabric. You will want to do that with your vmotion vnic.
Make sure you have one vNIC going out of your FIA and one vNIC going out of your FIB.
- create separate vmkernel port groups for each vnic
- use the same vlan
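As a rough sketch of that layout, the Python below models two vMotion VMkernel port groups on one VLAN, each with its active uplink on a different fabric. It assumes the usual active/standby arrangement for multi-NIC vMotion, where each port group uses the other one's active vmnic as its standby. The port group names, the `PortGroup` type and the check function are hypothetical, the vmnic numbers are only illustrative, and VLAN 10 is borrowed from the video mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortGroup:
    name: str
    vlan: int
    active: str   # vmnic used in normal operation
    standby: str  # vmnic used only when the active one fails


# Which fabric interconnect each vmnic's vNIC is pinned to (illustrative).
FABRIC = {"vmnic2": "A", "vmnic3": "B"}


def check_multi_nic_vmotion(port_groups):
    """Return a list of problems; an empty list means the layout looks sane."""
    problems = []
    if len({pg.vlan for pg in port_groups}) != 1:
        problems.append("port groups are not all on the same VLAN")
    active_fabrics = [FABRIC[pg.active] for pg in port_groups]
    if len(set(active_fabrics)) != len(active_fabrics):
        problems.append("two port groups share an active fabric")
    for pg in port_groups:
        if pg.active == pg.standby:
            problems.append(f"{pg.name}: active and standby are the same vmnic")
    return problems


layout = [
    PortGroup("vMotion-A", vlan=10, active="vmnic2", standby="vmnic3"),
    PortGroup("vMotion-B", vlan=10, active="vmnic3", standby="vmnic2"),
]
print(check_multi_nic_vmotion(layout))  # prints []
```

This is only a way to reason about the design on paper; the actual configuration is done in UCS Manager and vCenter.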
The benefit of using 6 to 8 vnics is that you can apply QoS down at a vnic level. So you could do something similar to this:
vmnic0 - consoleA - FIA, 1Gb, Platinum (does not drop packets)
vmnic1 - consoleB - FIB, 1Gb, Platinum
vmnic2 - vmotionA - FIA, 3Gb, silver
vmnic3 - vmotionB - FIB, 3Gb, silver
vmnic4 - ProductionA - FIA, 10Gb, Gold
vmnic5 - ProductionB - FIB, 10Gb, Gold
vmnic6 - BackupA - FIA, 3Gb, Bronze
vmnic7 - BackupB - FIB, 3Gb, Bronze
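The layout above can be written down as data and sanity-checked. The sketch below only restates the list; the `Vnic` type and the `unpaired_roles` helper are hypothetical names for illustration, not anything from UCS Manager.

```python
from collections import defaultdict
from typing import NamedTuple


class Vnic(NamedTuple):
    vmnic: str
    role: str
    fabric: str     # "A" or "B"
    rate_gb: int    # bandwidth setting from the list above
    qos_class: str


VNICS = [
    Vnic("vmnic0", "console", "A", 1, "platinum"),
    Vnic("vmnic1", "console", "B", 1, "platinum"),
    Vnic("vmnic2", "vmotion", "A", 3, "silver"),
    Vnic("vmnic3", "vmotion", "B", 3, "silver"),
    Vnic("vmnic4", "production", "A", 10, "gold"),
    Vnic("vmnic5", "production", "B", 10, "gold"),
    Vnic("vmnic6", "backup", "A", 3, "bronze"),
    Vnic("vmnic7", "backup", "B", 3, "bronze"),
]


def unpaired_roles(vnics):
    """Roles that do not have exactly one vNIC on each fabric."""
    fabrics = defaultdict(list)
    for v in vnics:
        fabrics[v.role].append(v.fabric)
    return sorted(role for role, f in fabrics.items() if sorted(f) != ["A", "B"])


print(unpaired_roles(VNICS))  # prints [] because every role has an A and a B vNIC
```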
And if you do any bare metal servers, i.e. need to load an application that demands bare metal, no virtualization, you'll want to create some vnics for that and use the Fabric Failover feature built in to the FI's. Also a cluster vnic.
I usually set my cluster vnic to platinum and 1Gb. You don't need a 10Gb cluster and you definitely don't want to lose connectivity or your cluster will fail over, etc.
You could go higher on those vnic bandwidth settings as mine were based on the original Palo VIC. With the newer VIC1240 and 1280 (20Gb) you could probably double those numbers above.
I hope this helps.
I believe you would want two VLANs / subnets, one for each fabric.
Say ESX01 has vmknic IP's A1 and B1 and ESX02 has vmknic IP's A2 and B2.
vCenter has no concept of fabric, so when all four vmknic IP's are on the same subnet, ESX01 can establish vMotion TCP connections from either source vmknic to either destination vmknic. E.g., A1-to-A2, A1-to-B2, B1-to-A2, and B1-to-B2 are all valid, meaning that a lot of vMotion will cross fabrics and put a load on the upstream network, adding latency, and potentially breaking UCS QoS. [Edit: this is the only way multi-NIC vMotion works on UCS]
But, when A1/A2 and B1/B2 are on different VLANs/subnets, then ESX01 will only establish A1-to-A2 and B1-to-B2 connections. That keeps vMotion traffic local to each fabric, minimizes latency, and ensures end-to-end bidirectional UCS QoS. [Edit: ESX up to 6.5 does not work as described in the previous sentence. A1/A2/B1/B2 must be on the same subnet as described in the previous paragraph.]
The same holds true for iSCSI -- you could put both iSCSI NICs on the same subnet, but that doesn't give you the traffic flow and multipathing behavior most are hoping for. It's much better to have two iSCSI subnets. [Edit: This still holds true -- you can and should have separate iSCSI-A and iSCSI-B VLANs]
My NIC configuration builder tool (http://beeline.org/ucs) supports both approaches. If you choose a single VLAN, it builds a single vSwitch with two vmk portgroups. If you choose two VLANs, it builds two vSwitches with one vmk portgroup apiece.
Would this be a viable alternative?
establish 1 vmkernel interface for vmotion
present 2 vnics (a-fab/b-fab) to the host
assign these as uplinks for vmotion on the vswitch/vDS
utilize active/passive NIC teaming at the host
rinse and repeat for all other hosts, making sure the same uplink is active on every host
implement system QoS policy on the FIs to prevent vmotion from overwhelming the links from the IOMs to server ports on the FIs
This would force vMotion traffic to use only one side of the fabric (the active vmnic), relieving the upstream LAN of that traffic. If the active fabric connection were to fail, VMware would fail over to the standby vmnic. If there are multiple UCS pods spanning the same datacenter in VMware that use the same vDS, the CoS tagging set at the FIs could also be applied on the upstream LAN.
The only thing that could be problematic is if, for some reason, the active vMotion vmnic did not realize that the upstream LAN was down. The only case that comes to mind is the network control policy's "Action on Uplink Fail" being set to warning.
That would work, as would using hardware fabric failover. Both approaches localize vMotion in a healthy state, but allow vMotion across the upstream network when there's an outage affecting a blade's NIC, chassis, or FI, or a misconfigured NIC team. It's still better than nothing, and is an imperative when there's a weak upstream network (e.g., 1G uplinks).
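To make the healthy-versus-outage distinction concrete, here is a small sketch. It assumes each host has a single vMotion vmk whose active uplink is on fabric A with a standby on fabric B, and that traffic between hosts sitting on different fabrics has to go through the upstream network. The function names and the per-host "failed fabrics" sets are made up for illustration.

```python
def active_fabric(preferred, failed_fabrics):
    """Fabric a host's vMotion vmk uses: the preferred one unless it has failed."""
    other = "B" if preferred == "A" else "A"
    if preferred not in failed_fabrics:
        return preferred
    if other not in failed_fabrics:
        return other
    return None  # neither uplink is usable


def vmotion_path(src_failed=frozenset(), dst_failed=frozenset(), preferred="A"):
    """Describe where vMotion traffic between two hosts would flow."""
    src = active_fabric(preferred, src_failed)
    dst = active_fabric(preferred, dst_failed)
    if src is None or dst is None:
        return "no path"
    if src == dst:
        return "local to fabric " + src
    return "via upstream network"


print(vmotion_path())                                    # local to fabric A
print(vmotion_path(src_failed={"A"}))                    # via upstream network
print(vmotion_path(src_failed={"A"}, dst_failed={"A"}))  # local to fabric B
```

The second call is the case described above: one blade loses its fabric A path, fails over to B, and its vMotion traffic to a healthy host now has to cross the upstream network.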
I was hoping that multi-vNIC vMotion would allow the vMotion VLANs to stay local to the FI but, alas, vMotion fails miserably if all vMotion NICs aren't on the same VLAN / subnet. I've corrected the above comment to reflect that.
In the end, there's no way to absolutely localize vMotion traffic without sacrificing NIC redundancy. This means that QoS controls on the upstream network are very important. It also means that with multi-NIC vMotion, half your vMotion traffic will be crossing fabrics, so you had better have robust uplinks.
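The "half" figure can be seen by enumerating the possible vmk pairings between two hosts. This is a simplified sketch: it reuses the A1/B1/A2/B2 naming from earlier in the thread and assumes vMotion spreads connections evenly across all four pairings, which may not match real behavior exactly. The `vmotion_pairs` helper is a hypothetical name.

```python
from itertools import product


def vmotion_pairs(src_vmks, dst_vmks):
    """All source/destination vmk pairings when every vmk is on one subnet."""
    return list(product(src_vmks, dst_vmks))


# (name, fabric) for each vmknic on the two hosts.
esx01 = [("A1", "A"), ("B1", "B")]
esx02 = [("A2", "A"), ("B2", "B")]

pairs = vmotion_pairs(esx01, esx02)
crossing = [(s[0], d[0]) for s, d in pairs if s[1] != d[1]]

print(crossing)                    # [('A1', 'B2'), ('B1', 'A2')]
print(len(crossing) / len(pairs))  # 0.5
```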