Damjan Cvetko

So you got the idea, or rather somebody made you, migrate from a cosy EMC NFS server to a Windows Failover Cluster with the File Server role serving NFS to Linux clients.

All fine and great… Until it's not.

When you see this for the first time, I bet you'll feel lost too:

www-data@docker9:/mnt/nfs$ chown www-data testfile
chown: changing ownership of 'testfile': Permission denied

Let's look at the issues we had doing this.

Until recently we ran an EMC VNX 5200 with redundant Data Movers that served NFS and (although unused) CIFS shares. The performance was decent, and the one outage we had was mitigated right away, as it was just one DM rebooting itself.

The old EMC got decommissioned. We got a brand new one, but it had a different purpose. We moved all the virtual servers that were previously on the EMC (Fibre Channel connected) to a Microsoft Failover Cluster with Storage Spaces Direct. It's a great thing, with killer IOPS performance, so an “executive” decision was made to also move the NFS share we needed to the same platform. After all, IOPS!

I won't go into the installation, but we created two Windows Server 2016 VMs and one shared drive (a VHD Set), and configured the File Server role.

The first problem was the NFS access permissions.

In a classic AUTH_SYS environment you can't grant access to IP wildcards or CIDR ranges! With tens of hosts and multiple subnets, managing this is a shot in the foot by itself. But for performance, we were ready to take this one.
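Since every client has to be listed individually, one way to keep this manageable is to generate the per-host list from the CIDR blocks you would have liked to use. The following is a minimal sketch in Python, not what we actually ran: the subnets are made up for illustration, and how the resulting addresses get fed into the Windows share configuration is deliberately left out.

```python
import ipaddress


def expand_cidrs(cidrs):
    """Return every usable host address in the given CIDR blocks, in order, without duplicates."""
    seen = set()
    hosts = []
    for cidr in cidrs:
        # strict=False tolerates input such as 10.0.1.5/30 (host bits set)
        network = ipaddress.ip_network(cidr, strict=False)
        for host in network.hosts():
            address = str(host)
            if address not in seen:
                seen.add(address)
                hosts.append(address)
    return hosts


# Hypothetical subnets, for illustration only
for address in expand_cidrs(["10.0.1.0/30", "10.0.2.0/30"]):
    print(address)
```

Keeping the CIDR blocks as the source of truth and regenerating the host list from them means a new subnet is a one-line change rather than dozens of manual entries.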

Then, case sensitivity.

Since the share is backed by NTFS, file names were case insensitive, so paths like /Test/test and /test/Test could collide. The powers that be insisted it was bad system design that this was happening at all, but luckily the collisions we had were few and irrelevant. I want to note that there is a solution for this with fsutil.exe file setCaseSensitiveInfo C:\folder enable, but our installation was too old (Windows Server 2016) to install the required WSL feature.

Note: Apparently this could also be used: reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel" /v obcaseinsensitive /t REG_DWORD /d 0.
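If you want to know how many collisions to expect before migrating, a quick scan of the old share helps. This is a rough sketch under one loud assumption: Python's str.casefold() is used as a stand-in for NTFS case folding, which uses its own upcase table, so exotic Unicode names may be judged differently. The function names are mine.

```python
import os
from collections import defaultdict


def find_case_collisions(paths):
    """Group paths that differ only by letter case; return the groups with more than one member."""
    groups = defaultdict(set)
    for path in paths:
        # casefold() approximates case-insensitive comparison (assumption, see above)
        groups[path.casefold()].add(path)
    return [sorted(group) for _, group in sorted(groups.items()) if len(group) > 1]


def walk_paths(root):
    """Yield every directory and file path below root."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


# The example paths from above, plus one that does not collide
print(find_case_collisions(["/Test/test", "/test/Test", "/data/readme"]))
```

To scan a real tree, pass walk_paths("/mnt/nfs") (or wherever the old share is mounted) to find_case_collisions.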

Next, transfer performance.

Initially our test hosts mounted the shares as NFSv3, because NFSv4 just didn't work. File transfer performance was an order of magnitude (20x or worse) slower. This was mitigated by actually reading the fine print and realizing that Windows NFS does not support protocol version 4.0, but does support 4.1.
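Because a client can quietly end up on a different protocol version than you intended, it is worth checking what was actually negotiated. The sketch below parses /proc/mounts-style text on a Linux client and reports the vers= option per NFS mount. The sample line is illustrative (server name and export path are made up), and the helper name is mine.

```python
def nfs_mount_versions(mounts_text):
    """Map mount point -> NFS version string (or None) from /proc/mounts-style text."""
    versions = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        version = None
        for option in fields[3].split(","):
            if option.startswith("vers="):
                version = option[len("vers="):]
        versions[fields[1]] = version
    return versions


# Illustrative line; server name and export path are made up
sample = (
    "fileserver:/share /mnt/nfs nfs4 "
    "rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,hard,proto=tcp 0 0"
)
print(nfs_mount_versions(sample))
```

On a real client you would feed it the contents of /proc/mounts, for example open("/proc/mounts").read().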

Permission denied on chown

So, with decent performance, we bumped into one last issue: Permission denied when a user chowns a file they already own to their own user. Well, it wasn't that obvious. The problem first showed itself when moving a file, in a web server, in Docker… I finally reproduced the issue in a shell and then figured out with Wireshark that it has to do with mv issuing a chown in the background.

bash-5.0$ mv /tmp/testfile /mnt/nfs/tmp/
mv: can't preserve ownership of '/mnt/nfs/tmp/testfile': Permission denied
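To check a mount for this behaviour without involving mv, a small probe can create a scratch file and chown it to the owner and group it already has. This is a sketch, assuming a POSIX client and that the failure comes from the chown call itself, as the Wireshark trace suggested. The function name is mine.

```python
import os
import tempfile


def probe_chown(directory):
    """Create a scratch file in directory and chown it to the uid/gid it already has.

    Returns True if the no-op chown is accepted, False if it is denied.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        st = os.stat(path)
        # Ownership does not change: same uid and gid the file already has
        os.chown(path, st.st_uid, st.st_gid)
        return True
    except PermissionError:
        return False
    finally:
        os.remove(path)


# Probe the local temp directory; point it at the NFS mount to test the share
print(probe_chown(tempfile.gettempdir()))
```

A local filesystem should accept this call, so a False result on the NFS mount points at the server side rather than at the application.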

Windows NFS is backed by an NTFS drive, and its NTFS permission model caused this problem. Googling high and low, I finally discovered the RestrictChown option.

This flag should be in the registry, located at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\ServerForNfs\CurrentVersion\Exports\<No>\RestrictChown. However, in our clustered environment there was no such key under Exports.

I could not find a solution anywhere, and then found out on my own that in a clustered environment a certain part of the “shared” registry is located under HKEY_LOCAL_MACHINE\Cluster\.... Each clustered resource has its own key. To figure out what the ID is, you can either look at them and guess, or use Get-ClusterResource -Name <Name> | fl -Property *. Under that key is another key, Parameters, and the proper way to manipulate values here is with Get-ClusterResource -Name NFS-clnfs | Get-ClusterParameter. However, what we need is even deeper down:

I found no way to manipulate a sub-path like img\RestrictChown with command line tools, so the only way left was to change the value on both servers and reboot them.

I wish I had found a better way…
