VMware Cloud Community
pkharekofax
Enthusiast

Corrupted VMFS data stores - how to root-cause?

Hi all,

We are running Lab Manager 4, so we still have ESXi 4.1 hosts in our infrastructure that we cannot upgrade to ESXi 5.x.

At one of our sites, we regularly see VMFS corruption, where some VM folders cannot be read, deleted, or listed.  We are in the process of adding new storage and migrating our current VMs out of the corrupted datastores, but I'd like to find out what is causing this corruption and how to prevent it from happening again.

Hosts: HP DL360p G7

Storage: HP P2000 G3

ESXi: 4.1.0 build-260247

When we see many such errors, rebooting all the hosts reduces the number considerably (e.g. from 20 errors down to just 3).

Have you seen such an issue at your site?

Thanks,

Khare

Examples:

/vmfs/volumes/4fc87616-2a581cdc-6b13-3cd92bee82a4/dirname # ls -l * >/dev/null

ls: 10589/timA: Invalid argument

ls: 10589/: Invalid argument

ls: 10589/

: Invalid argument

ls: 10589/: Invalid argument

ls: 10589/: Invalid argument

ls: 10589/

nvalid argument

ls: 11843/IHDR: Invalid argument

ls: 11843/e: Invalid argument

ls: 11941/n="1.0"?>

<Foundry>

<VM>

<VMId type="string">52 32 c9 78 f5 47 c6 6c-03 d9 ba 14 18 2d 42 e4</VMId>

<ClientMetaData>

<clientMeta: No such file or directory

ls: 11941/es/>

<HistoryEventList/></ClientMetaData>

<vmxPathName type="string">017695-Central1.vmx</vmxPathName></VM></Foundry>

: No such file or directory

ls: 12351/IHDR: Invalid argument

ls: 12351/e: Invalid argument

ls: 12478/012478-12478-2k8-Central.vmx.LMBackup: No such file or directory

ls: 12478/012478-12478-2k8-Central.vmx.LMBackup: No such file or directory

ls: 7600/IHDR: Invalid argument

ls: 7600/YJ)Ù|Ñ6"¢ªyc,zö!ïýS: Invalid argument

ls: 9251/009251-VirtualRouter_C2796F3338_DoNotModify.vmxf: No such file or directory

ls: 9251/009251-VirtualRouter_C2796F3338_DoNotModify.vmxf: No such file or directory

ls: 9251/009251-VirtualRouter_C2796F3338_DoNotModify.vmsd: No such file or directory

ls: 9618/IHDR: Invalid argument
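To compare error counts before and after a host reboot, the `ls -l *` check above can be turned into a small script. This is only a minimal sketch in Python: it assumes a Python 3 interpreter is available on a machine that can see the datastore path (I have not confirmed that for the ESXi 4.1 console), and the function names `scan_datastore` and `summarize` are made up for illustration. It walks one level of VM folders, as the `ls` command does, and records every entry that cannot be listed or stat'ed.

```python
import os
import sys
from collections import Counter


def scan_datastore(root, stat=os.lstat):
    """Return (path, error message) pairs for entries that cannot be read.

    Covers the same ground as `ls -l *` run in `root`: every top-level
    entry, plus the direct children of every top-level folder.
    `stat` is injectable so the function can be exercised without a
    damaged volume.
    """
    failures = []
    for entry in sorted(os.listdir(root)):
        top = os.path.join(root, entry)
        try:
            stat(top)
        except OSError as exc:
            failures.append((top, exc.strerror or str(exc)))
            continue
        if not os.path.isdir(top):
            continue
        try:
            names = sorted(os.listdir(top))
        except OSError as exc:
            failures.append((top, exc.strerror or str(exc)))
            continue
        for name in names:
            path = os.path.join(top, name)
            try:
                stat(path)
            except OSError as exc:
                failures.append((path, exc.strerror or str(exc)))
    return failures


def summarize(failures, root):
    """Count unreadable entries per top-level (VM) folder."""
    counts = Counter()
    for path, _msg in failures:
        rel = os.path.relpath(path, root)
        counts[rel.split(os.sep)[0]] += 1
    return counts


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    failures = scan_datastore(root)
    for path, msg in failures:
        # repr() keeps garbled file names from breaking the output
        print("%r: %s" % (path, msg))
    for folder, n in sorted(summarize(failures, root).items()):
        print("%s: %d unreadable entries" % (folder, n))
    print("total: %d" % len(failures))
```

Run with the datastore directory as the only argument, once before and once after a reboot, and keep both outputs. The per-folder totals make it easier to see whether the same VM folders stay broken or whether the errors move around.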


Accepted Solutions
pkharekofax
Enthusiast

Since upgrading the firmware on our storage (HP P2000), we haven't had any errors or issues.  So it looks like this is fixed.

Khare

continuum
Immortal

Don't blame version 4.1 - you will see this in all versions, even the latest ones.

To avoid this kind of problem, here are a few things ...
- make sure you always cleanly unmount and detach volumes that you no longer need
- make sure you always fix dead LUNs immediately
- make sure only skilled people are allowed to change / add / remove datastores
- make sure that VMs backed up by tools like Veeam have a healthy VSS implementation
- make sure that your backup tools are monitored carefully so that snapshot chains do not build up and run amok
- make sure that /etc/vmware/esx.conf does not reference non-existent storage

Also, don't use very small, cheap USB sticks as boot media for ESXi.

It is also a good idea to practice a few procedures, like rebuilding a partition table or extracting files when the VMFS becomes unreadable, as a reboot will not always fix the issues.
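For the snapshot-chain point in the list above, a rough way to keep an eye on things is to count delta extent files per VM folder. The following is just a sketch in Python under a few assumptions: snapshot delta extents follow the usual `*-delta.vmdk` naming, a Python 3 interpreter can see the datastore path, and the threshold of 3 is an arbitrary placeholder. The function names are made up for illustration. In a linked-clone setup such as Lab Manager, delta disks are expected, so treat the numbers as a trend to watch rather than an alarm.

```python
import fnmatch
import os


def delta_counts(root, pattern="*-delta.vmdk"):
    """Map each VM folder under root to the number of delta extents in it."""
    counts = {}
    for entry in sorted(os.listdir(root)):
        folder = os.path.join(root, entry)
        if not os.path.isdir(folder):
            continue
        try:
            names = os.listdir(folder)
        except OSError:
            continue  # unreadable folder: skip it here
        n = len(fnmatch.filter(names, pattern))
        if n:
            counts[entry] = n
    return counts


def long_chains(counts, threshold=3):
    """Return (count, folder) pairs at or above threshold, largest first."""
    hits = [(n, folder) for folder, n in counts.items() if n >= threshold]
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for n, folder in long_chains(delta_counts(root)):
        print("%s: %d delta files" % (folder, n))
```

Running this on a schedule and comparing the output over time shows which folders keep growing.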



pkharekofax
Enthusiast

Thanks for all the pointers, Ulli.  I've asked the storage folks if they've had any power outages where the storage has gone down hard, and if the firmware is up to date.

We haven't made any changes here like mounting/unmounting/adding/removing.  It's a pretty stable environment.  We also don't use any backup software like Veeam, as this is Lab Manager and is made of trees of linked clones.

Khare
