Search This Blog

Tuesday, 10 June 2014

When VMware NBD and NBDSSL backups fail

I was working on some VMware backups when I ran into this strange sequence of messages: a backup which is showing that that is quite possible to backup a VMX file, but not the VMDK files. And the error messages in the session aren't very informative!


[Normal] From: BSM@cell-manager.ifost.org.au "NBD test backup"  Time: 19/05/2014 10:30:16 AM
        Backup session 2014/05/19-5 started.

[Normal] From: BSM@cell-manager.ifost.org.au "NBD test backup"  Time: 19/05/2014 10:30:16 AM
        OB2BAR application on "vepa-agent.ifost.org.au" successfully started.

[Normal] From: VEPALIB_VMWARE@vepa-agent.ifost.org.au "/DC"  Time: 19/05/2014 10:30:17 AM
        Resolving objects for backup on vCenter 'vcenter.ifost.org.au' ... 

[Normal] From: VEPALIB_VMWARE@vepa-agent.ifost.org.au "/DC"  Time: 19/05/2014 10:30:33 AM
        Add Virtual Machine to the backup ... 
                Name: VM1
                Path: /DC/Discovered virtual machine/VM1
                InstanceUUID: 52dbf234-252e-c5dd-9df5-51c304bcf312

[Normal] From: VEPALIB_VMWARE@vepa-agent.ifost.org.au "/DC"  Time: 19/05/2014 10:30:35 AM
        Virtual Machine 'VM1': Locking vMotion ... 

[Warning] From: VEPALIB_VMWARE@vepa-agent.ifost.org.au "/DC"  Time: 19/05/2014 10:30:35 AM
         Virtual Machine 'VM1': vMotion is in Progress.

[Warning] From: VEPALIB_VMWARE@vepa-agent.ifost.org.au "/DC"  Time: 19/05/2014 10:31:08 AM
        Virtual Machine 'VM1': Could not lock vMotion.


Everything's pretty much fine. There are lots of reasons for a vMotion lock to fail.

[Normal] From: VEPALIB_VMWARE@vepa-agent.ifost.org.au "/DC"  Time: 19/05/2014 10:31:13 AM
        Creating folder /var/opt/omni/tmp/55fd5af8-f853-403e-bedf-2d1e60e418dd ... 

[Normal] From: VEPALIB_VMWARE@vepa-agent.ifost.org.au "/DC"  Time: 19/05/2014 10:31:19 AM  
        Virtual Machine 'VM1': Backing up configuration file VM1.vmx ... 

[Normal] From: VEPALIB_VMWARE@vepa-agent.ifost.org.au "/DC"  Time: 19/05/2014 10:31:20 AM
        Virtual Machine 'VM1': Creating snapshot ... 

[Normal] From: VEPALIB_VMWARE@vepa-agent.ifost.org.au "/DC"  Time: 19/05/2014 10:31:59 AM
        Virtual Machine 'VM1': Optimizing disk scsi0:0 ... 

[Normal] From: VEPALIB_VMWARE@vepa-agent.ifost.org.au "/DC"  Time: 19/05/2014 10:32:00 AM
        Virtual Machine 'VM1': Backing up VSS manifest  VM1/VM1-vss_manifests11.zip.

And now for the interesting part:


[Major] From: VEPALIB_VMWARE@vepa-agent.ifost.org.au "/DC"  Time: 19/05/2014 10:33:15 AM
        Virtual Machine 'VM1': Could not backup disk scsi0:0 ... 

[Major] From: VEPALIB_VMWARE@vepa-agent.ifost.org.au "/DC"  Time: 19/05/2014 10:33:15 AM
[172:162]       Virtual Machine 'VM1': No disk backed up ... 

[Critical] From: VEPALIB_VMWARE@vepa-agent.ifost.org.au "/DC"  Time: 19/05/2014 10:33:15 AM
        Backup of object failed.
                Name: VM1
                Path: /DC/Test_VMs/VM1
                InstanceUUID: 52dbf234-252e-c5dd-9df5-51c304bcf312

[Normal] From: VEPALIB_VMWARE@vepa-agent.ifost.org.au "/DC"  Time: 19/05/2014 10:33:16 AM
        Virtual Machine 'VM1': Removing snapshot ... 

[Normal] From: VEPALIB_VMWARE@vepa-agent.ifost.or.gau "/DC"  Time: 19/05/2014 10:33:24 AM
        Virtual Machine 'VM1': Unlocking vMotion ... 

Deleted directory /var/opt/omni/tmp/564d01b7-7910-97a0-d54d-85c11ff8becd-vm-58/nbd
Deleted directory /var/opt/omni/tmp/564d01b7-7910-97a0-d54d-85c11ff8becd-vm-58/nbdssl
Deleted directory /var/opt/omni/tmp/564d01b7-7910-97a0-d54d-85c11ff8becd-vm-58/hotadd

[Normal] From: BSM@cell-manager.ifost.org.au "NBD test backup"  Time: 19/05/2014 10:33:53 AM
        OB2BAR application on "vepa-agent.ifost.org.au" disconnected.

I've truncated the rest of the messages.
Turning up the debugging level, the debug logs showed this:
[110] [VddkUtil::diskLibLog] NBD_ClientOpen: attempting to create connection to vpxa-nfcssl://[ESXi-MGMT-VMFS-1] VM1/VM1.vmdk@esxi1.ifost.org.au:902

[110] [VddkUtil::diskLibLog] Started up WSA

[110] [VddkUtil::diskLibLog] CnxOpenTCPSocket: Cannot connect to server esxi1.ifost.org.au:902: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

[110] [VddkUtil::diskLibLog] CnxAuthdConnect: Returning false because CnxAuthdConnectTCP failed

[110] [VddkUtil::diskLibLog] CnxConnectAuthd: Returning false because CnxAuthdConnect failed

[110] [VddkUtil::diskLibLog] Cnx_Connect: Returning false because CnxConnectAuthd failed

[110] [VddkUtil::diskLibLog] Cnx_Connect: Error message: Failed to connect to server esxi1.ifost.org.au:902

[ 20] [VddkUtil::diskLibWarning] [NFC ERROR] NfcNewAuthdConnectionEx: Failed to connect to peer. Error: Failed to connect to server esxi1.ifost.org.au:902

[110] [VddkUtil::diskLibLog] NBD_ClientOpen: Couldn't connect to esxi1.ifost.org.au:902 Failed to connect to server esxi1.ifost.org.au:902
The clue is the failed connection to esxi1.ifost.org.au. The VEPA backup agent obviously has to connect to the Vcenter server in order to start a backup, but because there was no SAN connectivity between the VEPA agent and the LUNs supporting the VM1 virtual machine's VMDK files, the VEPA agent ends up having to talk to the ESX server directly as well.

There can be many reason for this connection to fail: a firewall could be blocking the connection between the vepa agent and the esx server. Or in this case, there was no DNS entry for esxi.ifost.org.au didn't exist.


Greg Baker is an independent consultant working on HP DataProtector, LiveVault and many other technologies. He is the author of the only published book on HP Data Protector (http://x.ifost.org.au/dp-book). See more at IFOST's DataProtector pages at http://www.ifost.org.au/dataprotector