Support Cross-Cluster Migrations using S2D/SOFS Storage in Converged Environment
SCVMM (VMM) & S2D should support cross-cluster migration of highly available VMs.
VMM should support migrating highly available VMs from one cluster to another when S2D with SOFS is used as the storage solution (converged, NOT hyper-converged).
VMM has built-in functionality to allow migration of a VM from one cluster to another.
This does NOT work with HA VMs when your storage solution is SOFS on S2D. The VM will be placed into a paused-critical state, effectively a failed migration.
I believe this is a shortcoming in Failover Clustering not properly handling storage communication, but I'm not sure, as this action can't be automated outside VMM.
I have now tested this on Windows Server 2016 and 2019, with VMM 2016 and 1807, and it doesn't work.
The VM is stopped in a 2016 environment and put into a paused-critical state in 2019. This appears to be due to some unsurfaced error combined with the default AutomaticCriticalErrorAction, but I'm not sure.
This functionality worked under 2012, but S2D wasn't an option then, so my guess is this is a failure in the combination of VMM, Failover Clustering, and S2D with SOFS as storage.
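If the paused-critical state really is triggered by the default AutomaticCriticalErrorAction, the setting can at least be inspected and changed with the Hyper-V PowerShell module. This is only a sketch for testing that theory; "MyHAVM" is a placeholder VM name, not one from this report:

```powershell
# Run on the Hyper-V host. "MyHAVM" is a placeholder name.
# Show the current critical-error behavior and its timeout.
Get-VM -Name "MyHAVM" |
    Select-Object Name, AutomaticCriticalErrorAction, AutomaticCriticalErrorActionTimeout

# The default action is Pause (pause-critical on storage errors).
# Setting it to None lets the VM keep running instead, which may help
# isolate whether this setting is what stops/pauses the VM post-migration.
Set-VM -Name "MyHAVM" -AutomaticCriticalErrorAction None
```

Note that None only changes how Hyper-V reacts to the storage error; it doesn't address whatever is breaking storage connectivity in the first place.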
Note that this ONLY works when migrating TO the Current Host Server (as shown in Failover Cluster Manager) on the target cluster.
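For reference, the node Failover Cluster Manager displays as the Current Host Server is the owner of the core "Cluster Group" group, which can be queried with the FailoverClusters module. "TargetCluster" here is a placeholder for the destination cluster's name:

```powershell
# "TargetCluster" is a placeholder for the destination cluster.
# The owner of the core "Cluster Group" group is what FOC Manager
# shows as the Current Host Server.
(Get-ClusterGroup -Cluster "TargetCluster" -Name "Cluster Group").OwnerNode
```

Targeting that node as the migration destination is the only case reported to succeed.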
Anderson Shen commented
please support Windows 2019 Cluster Sets to achieve cross cluster migration
Lars Buchleitner commented
Has anybody here opened a case with Microsoft that I can refer to if we decide to open one?
No. Per my extensive troubleshooting (almost 6 months straight, until I gave up near the start of 2019), the refresh is not the *cause*; it's just updating your view of what is actually happening. A refresh in VMM is the same action as selecting refresh in Hyper-V Manager: it simply queries the VM's status from the VM itself, and this of course fails if the VM has stopped.
If you pull up the VM in a console window in either HV or VMM, you will see when the machine crashes.
A good example is a Linux machine, where I/O errors are written directly to the console (stdout).
If you pull up a CentOS box and move it to another cluster using VMM, shortly after the move completes you will see either an I/O error or your connection will simply drop. You'll know the I/O error when you see it. Note this has to be done via the console, NOT SSH.
Note this ONLY HAPPENS with HIGHLY AVAILABLE VMs. If your issue doesn't fit that description, it is probably caused by something else. I have come to believe this is effectively a problem with how VMM communicates its tasks to Failover Clustering, which is only really involved when the VM is highly available.
Of course, you can't reproduce this scenario using FOC or Hyper-V Manager, because there is no set of options that would let you perform a cross-cluster migration while *leaving the machine highly available* at the same time.
You could try to simulate this, but if you look at the series of Win-RM and PowerShell cmdlet calls used to make this happen, you'd be doing a completely different set of tasks by simulating it manually: remove from cluster, move to new node on other cluster, re-add to cluster, change ownership of storage assets from one node to another, change management of cluster resources from one cluster to another.
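That manual sequence can be sketched roughly as follows, assuming the FailoverClusters and Hyper-V modules and placeholder names throughout ("MyHAVM", "SourceCluster", "SourceNode", "TargetNode", "TargetCluster", and the SOFS path). As noted above, this is the manual workaround path, not what VMM actually does over Win-RM:

```powershell
# All names and paths below are placeholders.
# 1. Remove the VM role from the source cluster (the VM keeps running,
#    but is no longer highly available).
$vmId = (Get-VM -ComputerName "SourceNode" -Name "MyHAVM").VMId
Get-ClusterGroup -Cluster "SourceCluster" -VMId $vmId |
    Remove-ClusterGroup -RemoveResources -Force

# 2. Shared-nothing live migration to a node in the other cluster,
#    moving the storage to the new SOFS share at the same time.
Move-VM -ComputerName "SourceNode" -Name "MyHAVM" `
    -DestinationHost "TargetNode" `
    -IncludeStorage -DestinationStoragePath "\\NewSOFS\Share\MyHAVM"

# 3. Make the VM highly available again on the destination cluster.
Add-ClusterVirtualMachineRole -Cluster "TargetCluster" -VMName "MyHAVM"
```

The key difference from VMM's behavior is the window between steps 1 and 3 where the VM is not clustered at all, which is exactly why this doesn't reproduce the HA-specific failure.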
VMM is currently the only tool that theoretically has the ability to do this, so your PowerShell example doesn't apply to this scenario, because 1) it's not managing the cluster resources and 2) it's not using Win-RM in the same fashion as VMM. That makes me wonder whether your scenario is really the same and the VM is indeed highly available.
But the refresh task can be ruled out, simply because by the time it happens, the machine has already crashed.
Same issue here.
When performing a live migration of VMs across different Hyper-V clusters while also migrating the storage to another SOFS cluster, the VM only stays online when choosing the current owner node of the destination Hyper-V cluster.
If migrating to any host that is not the current cluster owner on the destination, the VM immediately fails after the migration completes.
The issue generally does not occur when performing a similar migration outside of VMM (e.g. via Hyper-V PowerShell), as long as no refresh of the VM is done from VMM within a few minutes after the migration completes.
If doing a refresh on the VM in VMM shortly after the migration completes then the VM also terminates.
I assume this is also the reason why the migration causes an outage when initiated from within VMM, as VMM seems to do the refresh as part of the migration tasks at the end of the migration.
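The VMM refresh in question corresponds to the Read-SCVirtualMachine cmdlet in the VMM PowerShell module. If the refresh timing really is the trigger, a delayed manual refresh after an out-of-band migration could be used to test it; this is only a sketch, and "MyHAVM" is a placeholder VM name:

```powershell
# VMM PowerShell (run from the VMM console/server).
# "MyHAVM" is a placeholder VM name.
$vm = Get-SCVirtualMachine -Name "MyHAVM"

# After the out-of-band (Hyper-V PowerShell) migration, wait the
# "few minutes" mentioned above before refreshing VMM's view.
Start-Sleep -Seconds 300

# Refresh and check whether the VM survived.
Read-SCVirtualMachine -VM $vm | Select-Object Name, Status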
VMM 2016 UR5
Server 2016 S2D SOFS Shares