Search This Blog

Tuesday, 10 June 2014

When VMware NBD and NBDSSL backups fail

I was working on some VMware backups when I ran into this strange sequence of messages: a backup which is showing that that is quite possible to backup a VMX file, but not the VMDK files. And the error messages in the session aren't very informative!


[Normal] From: [email protected] "NBD test backup"  Time: 19/05/2014 10:30:16 AM
        Backup session 2014/05/19-5 started.

[Normal] From: [email protected] "NBD test backup"  Time: 19/05/2014 10:30:16 AM
        OB2BAR application on "vepa-agent.ifost.org.au" successfully started.

[Normal] From: [email protected] "/DC"  Time: 19/05/2014 10:30:17 AM
        Resolving objects for backup on vCenter 'vcenter.ifost.org.au' ... 

[Normal] From: [email protected] "/DC"  Time: 19/05/2014 10:30:33 AM
        Add Virtual Machine to the backup ... 
                Name: VM1
                Path: /DC/Discovered virtual machine/VM1
                InstanceUUID: 52dbf234-252e-c5dd-9df5-51c304bcf312

[Normal] From: [email protected] "/DC"  Time: 19/05/2014 10:30:35 AM
        Virtual Machine 'VM1': Locking vMotion ... 

[Warning] From: [email protected] "/DC"  Time: 19/05/2014 10:30:35 AM
         Virtual Machine 'VM1': vMotion is in Progress.

[Warning] From: [email protected] "/DC"  Time: 19/05/2014 10:31:08 AM
        Virtual Machine 'VM1': Could not lock vMotion.


Everything's pretty much fine. There are lots of reasons for a vMotion lock to fail.

[Normal] From: [email protected] "/DC"  Time: 19/05/2014 10:31:13 AM
        Creating folder /var/opt/omni/tmp/55fd5af8-f853-403e-bedf-2d1e60e418dd ... 

[Normal] From: [email protected] "/DC"  Time: 19/05/2014 10:31:19 AM  
        Virtual Machine 'VM1': Backing up configuration file VM1.vmx ... 

[Normal] From: [email protected] "/DC"  Time: 19/05/2014 10:31:20 AM
        Virtual Machine 'VM1': Creating snapshot ... 

[Normal] From: [email protected] "/DC"  Time: 19/05/2014 10:31:59 AM
        Virtual Machine 'VM1': Optimizing disk scsi0:0 ... 

[Normal] From: [email protected] "/DC"  Time: 19/05/2014 10:32:00 AM
        Virtual Machine 'VM1': Backing up VSS manifest  VM1/VM1-vss_manifests11.zip.

And now for the interesting part:


[Major] From: [email protected] "/DC"  Time: 19/05/2014 10:33:15 AM
        Virtual Machine 'VM1': Could not backup disk scsi0:0 ... 

[Major] From: [email protected] "/DC"  Time: 19/05/2014 10:33:15 AM
[172:162]       Virtual Machine 'VM1': No disk backed up ... 

[Critical] From: [email protected] "/DC"  Time: 19/05/2014 10:33:15 AM
        Backup of object failed.
                Name: VM1
                Path: /DC/Test_VMs/VM1
                InstanceUUID: 52dbf234-252e-c5dd-9df5-51c304bcf312

[Normal] From: [email protected] "/DC"  Time: 19/05/2014 10:33:16 AM
        Virtual Machine 'VM1': Removing snapshot ... 

[Normal] From: [email protected] "/DC"  Time: 19/05/2014 10:33:24 AM
        Virtual Machine 'VM1': Unlocking vMotion ... 

Deleted directory /var/opt/omni/tmp/564d01b7-7910-97a0-d54d-85c11ff8becd-vm-58/nbd
Deleted directory /var/opt/omni/tmp/564d01b7-7910-97a0-d54d-85c11ff8becd-vm-58/nbdssl
Deleted directory /var/opt/omni/tmp/564d01b7-7910-97a0-d54d-85c11ff8becd-vm-58/hotadd

[Normal] From: [email protected] "NBD test backup"  Time: 19/05/2014 10:33:53 AM
        OB2BAR application on "vepa-agent.ifost.org.au" disconnected.

I've truncated the rest of the messages.
Turning up the debugging level, the debug logs showed this:
[110] [VddkUtil::diskLibLog] NBD_ClientOpen: attempting to create connection to vpxa-nfcssl://[ESXi-MGMT-VMFS-1] [email protected]:902

[110] [VddkUtil::diskLibLog] Started up WSA

[110] [VddkUtil::diskLibLog] CnxOpenTCPSocket: Cannot connect to server esxi1.ifost.org.au:902: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

[110] [VddkUtil::diskLibLog] CnxAuthdConnect: Returning false because CnxAuthdConnectTCP failed

[110] [VddkUtil::diskLibLog] CnxConnectAuthd: Returning false because CnxAuthdConnect failed

[110] [VddkUtil::diskLibLog] Cnx_Connect: Returning false because CnxConnectAuthd failed

[110] [VddkUtil::diskLibLog] Cnx_Connect: Error message: Failed to connect to server esxi1.ifost.org.au:902

[ 20] [VddkUtil::diskLibWarning] [NFC ERROR] NfcNewAuthdConnectionEx: Failed to connect to peer. Error: Failed to connect to server esxi1.ifost.org.au:902

[110] [VddkUtil::diskLibLog] NBD_ClientOpen: Couldn't connect to esxi1.ifost.org.au:902 Failed to connect to server esxi1.ifost.org.au:902
The clue is the failed connection to esxi1.ifost.org.au. The VEPA backup agent obviously has to connect to the Vcenter server in order to start a backup, but because there was no SAN connectivity between the VEPA agent and the LUNs supporting the VM1 virtual machine's VMDK files, the VEPA agent ends up having to talk to the ESX server directly as well.

There can be many reason for this connection to fail: a firewall could be blocking the connection between the vepa agent and the esx server. Or in this case, there was no DNS entry for esxi.ifost.org.au didn't exist.


Greg Baker is an independent consultant working on HP DataProtector, LiveVault and many other technologies. He is the author of the only published book on HP Data Protector (http://x.ifost.org.au/dp-book). See more at IFOST's DataProtector pages at http://www.ifost.org.au/dataprotector