
It’s no secret that I’m a fan of using file level storage for virtual machines; block storage makes less and less sense for virtualized environments, especially when I see all the trickery that must occur for a block device to be VM-aware. For vSphere, the desire to use file based storage meant selecting one protocol: NFS version 3. I was always a bit sad that version 4 or 4.1 was not added to the hypervisor because of all the great improvements that were baked in.
In a move that I still find to be a bit shocking, vSphere 6.0 has added support for NFS version 4.1. This includes a great many new features and improvements:
- Authentication with Kerberos
- In-band, mandatory, and stateful server-side locking
- Session trunking (true NFS multipathing)
- Greater error recovery
Be aware that in this release, datastores that are mounted using NFS version 4.1 cannot be combined with SDRS, SIOC, SRM, or Virtual Volumes. Additionally, NFS 4.1 datastores are not yet compatible with VAAI-NAS (hardware acceleration), as per the storage guide:
[NFS 4.1] Does not support Hardware Acceleration. As a result, virtual disks created on NFS 4.1 datastores are always thin-provisioned.
This is a deal breaker – it needs to be fixed before I can see folks wanting to go wild with NFS 4.1. The other features work just fine: vMotion, HA, FT, DRS, and Host Profiles.
pNFS is not Session Trunking
I’m seeing a lot of weird discussions on pNFS (Parallel NFS) versus Session Trunking.
To be clear, Session Trunking is similar to MPIO with iSCSI or Fibre Channel – you create multiple paths (sessions) to the NAS array and distribute load across those sessions. It’s up to the storage array vendor to determine whether that can be done across NAS controllers, based on their architecture: active / active or active / passive controllers. Session Trunking is awesome because it trivializes 99% of the work required to actively use multiple network links to an NFS datastore.
pNFS is a unicorn and is barely defined in the NFS 4.1 standard. vSphere 6 does not support pNFS, and neither does most anything else in the world. Here’s a high level diagram I used to describe the topology while at the Melbourne VMUG Conference back in early 2014.
It’s up to the storage vendor to figure out the metadata server, the control plane, and the data protocol used on the storage arrays – that’s not part of the NFS 4.1 specifications.
NFS Versions and Guidelines
If you’re looking to use NFS version 4.1 today, you’ll need to make sure that all ESXi hosts are mounting the datastore using 4.1. You can’t mix protocol versions for the same datastore across hosts because they use entirely different locking methods. I was pondering this and came up with a few migration ideas:
- Mount a new volume using NFS 4.1 and Storage vMotion workloads to it.
- Take an outage and switch all of the hosts over to 4.1.
Both of these ideas kinda suck. It would be much nicer if there were a migration path in place that allowed both locking protocols to work in tandem during the migration. A rough automation sketch of the first option is below.
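As a rough sketch of that first option, here’s what the Storage vMotion pass could look like in Python with pyVmomi. Consider it a minimal, hedged example rather than hardened tooling: the vCenter address, credentials, and datastore names (‘nfs3-old’ and ‘nfs41-new’) are placeholders for illustration, and it assumes the NFS 4.1 datastore has already been mounted on every host.

```python
# Minimal sketch: Storage vMotion every VM from an NFS 3 datastore to an
# NFS 4.1 datastore using pyVmomi. All names and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; verify certs in production
si = SmartConnect(host='vcenter.lab.local', user='administrator@vsphere.local',
                  pwd='VMware1!', sslContext=ctx)
content = si.RetrieveContent()

def find_datastore(name):
    """Return the first datastore object matching the given name."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.Datastore], True)
    try:
        return next(ds for ds in view.view if ds.name == name)
    finally:
        view.DestroyView()

source = find_datastore('nfs3-old')    # existing NFS 3 datastore
target = find_datastore('nfs41-new')   # new datastore mounted via NFS 4.1

# Relocate each VM registered on the old datastore, one at a time.
for vm in source.vm:
    print('Moving {0} to {1}'.format(vm.name, target.name))
    WaitForTask(vm.RelocateVM_Task(vim.vm.RelocateSpec(datastore=target)))

Disconnect(si)
```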
VMware has also published some helpful NAS guidelines in their storage guide. Specifically:
- To use NFS 4.1, upgrade your vSphere environment to version 6.0. You cannot mount an NFS 4.1 datastore to hosts that do not support version 4.1.
- You cannot use different NFS versions to mount the same datastore. NFS 3 and NFS 4.1 clients do not use the same locking protocol. As a result, accessing the same virtual disks from two incompatible clients might result in incorrect behavior and cause data corruption.
- NFS 3 and NFS 4.1 datastores can coexist on the same host.
- vSphere does not support datastore upgrades from NFS version 3 to version 4.1.
- When you mount the same NFS 3 volume on different hosts, make sure that the server and folder names are identical across the hosts. If the names do not match, the hosts see the same NFS version 3 volume as two different datastores. This error might result in a failure of such features as vMotion. An example of such discrepancy is entering filer as the server name on one host and filer.domain.com on the other. This guideline does not apply to NFS version 4.1.
- If you use non-ASCII characters to name datastores and virtual machines, make sure that the underlying NFS server offers internationalization support. If the server does not support international characters, use only ASCII characters, or unpredictable failures might occur.
Preparing for Kerberos Authentication
If you’d like to use Kerberos authentication, which isn’t a requirement, here’s how you set it up. First, make sure you have a reliable time source configured on the host to avoid time drift between the Kerberos server (AD DC) and your host. A skew of more than 5 minutes will typically result in failure.
Next, add your host to the AD domain. This is the same process it has always been.
Finally, give the host an AD service account to use for authenticating to the NAS. You can only configure one account per host and VMware recommends using the same account across all hosts.
So long as the NFS volume can be mounted by that service account and supports RPC header signing (header auth), along with DES (specifically DES-CBC-MD5), you’re good to go. Apparently AES-HMAC was not used due to lack of vendor support, which is why such old crypto is used.
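For the host-side prep, here’s a minimal pyVmomi (Python) sketch that covers the first two steps: pointing the host at an NTP source and joining it to the AD domain. The hostnames, domain, and credentials are placeholders, and assigning the actual NFS Kerberos service account is still done separately (for example in the Web Client under Authentication Services), which isn’t shown here.

```python
# Sketch: configure NTP and join Active Directory on an ESXi host ahead of
# using Kerberos with NFS 4.1. All names and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host='esx01.glacier.local', user='root',
                  pwd='VMware1!', sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
host = view.view[0]
view.DestroyView()

# 1. Reliable time: keep the host within the Kerberos clock skew tolerance.
ntp = vim.host.NtpConfig(server=['dc01.glacier.local'])
host.configManager.dateTimeSystem.UpdateDateTimeConfig(
    vim.host.DateTimeConfig(ntpConfig=ntp))

# 2. Join the AD domain using the host's Active Directory authentication store.
for store in host.configManager.authenticationManager.supportedStore:
    if isinstance(store, vim.host.ActiveDirectoryAuthentication):
        WaitForTask(store.JoinDomain_Task(domainName='glacier.local',
                                          userName='svc_join',
                                          password='SuperSecret1!'))

Disconnect(si)
```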
Mounting an NFS 4.1 Share
Here’s the workflow to mount an NFS share using protocol version 4.1. First, add a datastore and select NFS 4.1.
Configure the datastore name, path (folder), and server address. If you want to enable session trunking for multiple paths, enter multiple IP addresses that are available on your NAS. Most arrays will allow you to configure virtual interfaces (VIFs) or virtual IPs (VIPs). Below I’ve configured 172.16.40.111 and .112 as VIFs on a NAS running NFS 4.1.
Optionally, enable Kerberos auth. If you don’t check this box, the host will use AUTH_SYS.
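If you’d rather script the mount than click through the wizard, here’s a minimal pyVmomi (Python) sketch of the same operation. It assumes a `host` object retrieved as in the earlier snippets; the ‘Treasure’ export and the two VIF addresses come from my lab, and SEC_KRB5 can be swapped for AUTH_SYS if you skip the Kerberos checkbox.

```python
# Sketch: mount an NFS 4.1 export with two server addresses (session trunking)
# via the vSphere API. Assumes `host` is a vim.HostSystem object retrieved as
# in the earlier snippets; addresses and paths are specific to my lab.
from pyVmomi import vim

spec = vim.host.NasVolume.Specification(
    remoteHostNames=['172.16.40.111', '172.16.40.112'],  # NAS VIFs for trunking
    remoteHost='172.16.40.111',   # first address; the field is still required
    remotePath='/Treasure',       # export path on the NAS
    localPath='Treasure',         # datastore name as vSphere will display it
    accessMode='readWrite',
    type='NFS41',                 # NFS 4.1 rather than the default NFS 3
    securityType='SEC_KRB5')      # or 'AUTH_SYS' without Kerberos

datastore = host.configManager.datastoreSystem.CreateNasDatastore(spec)
print('Mounted {0} as {1}'.format(spec.remotePath, datastore.name))
```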
When you fire over a mount request, the host will log this entry to vmkernel.log:
NFS41_VSIMountSet:402: Mount server: 172.16.40.111,172.16.40.112, port: 2049, path: Treasure, label: Treasure, security: 2 user: svc_nfs@glacier.local, options: <none>
Practice taught me that incorrectly configuring the volume permissions results in either an access denied error or the storage being mounted read only. Here’s a sample connection to an improperly configured Windows Server providing storage via NFS.
WARNING: NFS41: NFS41FSCompleteMount:3601: RECLAIM_COMPLETE FS failed: Failure; forcing read-only operation
The new vSphere Web Client will show you specifically what version of NFS is being used to mount the datastore.
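You can also pull the same detail from the API: the datastore summary reports the file system type, and an NFS 4.1 mount should additionally list the server addresses in use. Here’s a quick pyVmomi (Python) sketch, again assuming a connected `content` object as in the earlier snippets.

```python
# Sketch: list each NAS datastore with its NFS version and server addresses.
# Assumes `content` is the ServiceInstance content from the earlier snippets.
from pyVmomi import vim

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.Datastore], True)
for ds in view.view:
    if ds.summary.type in ('NFS', 'NFS41'):           # NFS 3 vs NFS 4.1
        nas = ds.info.nas                             # vim.host.NasVolume
        servers = getattr(nas, 'remoteHostNames', None) or [nas.remoteHost]
        print('{0}: {1} via {2}'.format(ds.name, ds.summary.type,
                                        ', '.join(servers)))
view.DestroyView()
```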
Link Failure
I deleted the 172.16.40.111 address from the NAS server to simulate a link failure. As expected, the hypervisor got a little upset.
WARNING: SunRPC: 3947: fail all pending calls for client 0x4302f12611a0 (socket half closed)
WARNING: SunRPC: 3947: fail all pending calls for client 0x4302f125f9a0 (socket half closed)
WARNING: SunRPC: 3947: fail all pending calls for client 0x4302f125f9a0 (socket disconnected)
WARNING: SunRPC: 3947: fail all pending calls for client 0x4302f125f9a0 (socket disconnected)
But the NFS share remained mounted and the contents were still viewable. Look, it’s my hidden treasure!
Support for NFS 4.1
I actually had a bit of difficulty finding something for my lab that even supports 4.1. Supporting NFS 4.0 isn’t enough.
In case you’re curious, here’s a list of arrays that do not support NFS 4.1:
- Synology DSM 5.1
- FreeNAS
- OpenFiler
- Nexenta’s NexentaStor Community Edition
- NetApp’s Data ONTAP Simulator (8.2.1 7-Mode)
In the end, I used a Windows Server 2012 VM with the NFS Server feature installed. Ironic, eh?
Perhaps you’re curious what happens when you try to mount an NFS volume using 4.1 from a NAS that does not support that protocol? It’s simple: you’ll get a timeout failure. Not terribly helpful, is it?
Dive into vmkernel.log to find more details. The following log detail is quite helpful!
WARNING: NFS41: NFS41ExidNFSProcess:2008: Server doesn't support the NFS 4.1 protocol
It would be nice if that error rolled up into the mount task.
Thoughts
I want to spend much more time experimenting with failure scenarios and really seeing how well the session trunking configuration is able to balance load across the physical links. With that said, I’m really excited to see NFS 4.1 come to ESXi. NFS 3 is extremely old and clunky but also quite reliable and able to push some great performance numbers. My hope is that vendors will now have much more of an incentive to add NFS 4.1 support to their arrays and future iterations of the hypervisor will continue to improve upon the IETF standard (RFC 5661).
How will the Kerberos authentication work in a typical SMB setup where AD is running on a VMDK that sits on the NFS datastore in question, and it was all down for a power outage or a move? i.e. do we have a chicken-and-egg problem that means these small environments can’t safely use this feature, due to all the eggs being in that one vCenter basket?
Also, what other directory services are supported? (e.g. eDirectory and/or plain LDAP)
To be clear, vCenter isn’t in the dependency path; the host itself is authenticating against Active Directory. One idea I can think of is to put your critical infrastructure VMs on a non-Kerberos mounted NFS share, and use Kerberos for the other workloads that are entrusted with your data.
Smart Card integration? Could be awesome if it’s supported for login too.
Not that I’m aware of. It seems like vendor support of NFS 4.1 is the real issue here, and so we need the arrays to catch up to the protocol. But at least now they have a 500K+ install base that may potentially request NFS 4.1 support. 🙂
When you have multiple NFS 4.1 server IPs for a mount in vSphere 6, are they used in a round robin or active / passive manner?
In the NFS Versions and Guidelines section, the first method listed – mount a new volume using NFS 4.1 and Storage vMotion workloads to it – WILL WORK PERFECTLY.
Nice article Chris. Looking forward to seeing NFS 4.1 on a Synology box (it has already been requested, but the more voices we have, the better: http://forum.synology.com/enu/viewtopic.php?t=87718)
How did you set your permissions to get around this error? I could not get Kerberos functionality working either.
NFS41: NFS41FSCompleteMount:3601: RECLAIM_COMPLETE FS failed: Failure; forcing read-only operation
For the lab, I believe that I just opened up the authentication to get around the issue. It was very tough to find an endpoint for my lab that supported NFS 4.1, so I really haven’t had a good testing experience (versus having more enterprisey hardware to try). 🙂
Looks like you used WS2012 as your NFS 4.1 server, which I am using as well. ESXi 6.0 is my client. How did you get these two to play nicely with NFS 4.1. Everything I’ve tried failed. If I can get R/W access working without Kerberos I’d be happy. In your blog you showed an example of NFS 4.1 mounting as R/O because of the permissions set on the WS2012 NFS share. Did you change the NTFS permissions to allow Everyone full access?
Have you opened a ticket with VMware or tried the VMTN forums? That’s probably the simplest route to resolve the issue; my NFS 4.1 test box has been blown away for quite some time.
I’ve allowed Everyone full access (NTFS) and R/W + root access for the NFS permissions just FYI.
Hi Chris!
I am curious… with NFS 4.1, how many VMkernel ports did you create to use multipathing correctly?
Is FT supported with NFS 4.1? The article says yes, but I think it is not.
Per VMware documentation, NFS 4.1 does support FT. https://pubs.vmware.com/vsphere-60/topic/com.vmware.vsphere.storage.doc/GUID-8A929FE4-1207-4CC5-A086-7016D73C328F.html
Chris, as an ESXi 5.0 user contemplating the upgrade to 6.0.1a with all available patches to fix things like the nasty CBT bug… I’m curious if I still need to segment out my NFS 3 file systems and network adapters. I have an EMC VNX5200 with three file systems, each on their own subnet, so each has its own vmk NIC and the EMC has its interface presented with 3 IPs on their own subnets. The reason for this is to balance machines across these datastores and trick VMware into sort of load balancing, or allowing multiple connections to the same storage device. I’m pretty sure you were the one who did the write-up on this.
In NFS v4.1, would I need to keep each datastore on its own subnet, or does NFS v4.1, now that it’s connection aware, allow the VMkernel to push my 10 Gbps jumbo frame Ethernet to its limits without a complicated multi-interface setup?
Session Trunking in NFS version 4.1 does not require multiple subnets. The session will initiate to multiple IP addresses even in the same subnet. However, it’s really up to your storage vendor to support this configuration – make sure they have a documented method before migrating from NFS version 3.0 to 4.1.