This discussion is locked
jonflas

Q: Problems with Xsan

Hi,

I have a Promise VTrak 610f with 2 enclosures and 750GB disks.

A few months ago I had a big problem with my Xsan because it became corrupted.

Now I'm trying to reinstall everything from the beginning, but yesterday I had a problem with the Xsan again. My metadata controllers (MCs) can't mount the volume that was created.

I have 4 Xserves: 2 of them are MCs and the others are clients. I have 2 data LUNs of 6TB each (12TB in total) and 2 LUNs for metadata (750GB each, 1.5TB in total).

Yesterday one of the MCs disconnected from the Xsan and the other one took over, but the volume wasn't available and I don't know why. This time the system isn't in production, I'm only running tests, but I don't want to have problems in the future.

I would like to know what I am doing wrong. Maybe I need more LUNs defined on my Promise?

The error reported on the MC when I try to mount the volume is the following:

+[0202 08:51:16.519195] 0xa07b3720 (Debug) RSVD SUMMARY [192.168.0.208] reserved space max requested/0 MB accounted now/0 MB+
+[0202 08:51:16.519202] 0xa07b3720 (Debug) VOP SUMMARY [192.168.0.208] VopCapNegotiate cnt/1 avg/27+27 min/27+27 max/2727.
+[0202 08:51:16.519213] 0xa07b3720 (Debug) VOP SUMMARY [192.168.0.208] VopClientId cnt/1 avg/25+24 min/25+24 max/2524.
+[0202 08:51:16.519229] 0xa07b3720 (Debug) FSM RSVD SPACE current 4230 MB actual 0 MB max since boot 4230 MB since last stats 4230 MB+
+[0202 08:51:16.519232] 0xa07b3720 (Debug) FSM RSVD clients with files open for write: now 0 max 0 last 0+
+[0202 08:51:16.519236] 0xa07b3720 (Debug) FSM RSVD clients writing: now 0 empty 0 max 0 last 0+
+[0202 08:51:16] 0xb8e2f000 (*FATAL*) PANIC: aborting threads now.+
+Logger_thread: sleeps/131509 signals/2 flushes/14692 writes/14694 switches 2+
+Logger_thread: logged/168922 clean/168922 toss/0 signalled/2 toss_message/0+
+Logger_thread: waited/0 awakened/0+

+[0202 13:12:08] 0xa07b3720 (debug) NSS: sending message (type 2) to Name Server '192.168.0.207' (192.168.0.207:49153).+
+[0202 13:12:08] 0xa07b3720 (debug) NSS: sending message (type 2) to Name Server '192.168.0.206' (192.168.0.206:49154).+
+[0202 13:12:08] 0xa07b3720 INFO NSS: election initiated by 192.168.0.206:49154 (id 195.235.180.206) - admin request.+
+[0202 13:12:08] 0xa07b3720 (debug) NSS_VOTE2 to 192.168.0.206:49154+
+[0202 13:12:08] 0xa07b3720 (debug) startfssvote could not find FSS Cronos in master - vote aborted.+
+[0202 13:12:08] 0xa07b3720 (debug) NSS: removing vote inhibitor for FSS 'Cronos'.+
+[0202 13:12:09] 0xa07b3720 NOTICE PortMapper: Initiating activation vote for FSS 'Cronos'.+
+[0202 13:12:09] 0xa07b3720 (debug) Initiatenssvote for FSS Cronos+
+[0202 13:12:09] 0xa07b3720 (debug) NSS: sending message (type 2) to Name Server '192.168.0.207' (192.168.0.207:49153).+
+[0202 13:12:09] 0xa07b3720 (debug) NSS: sending message (type 2) to Name Server '192.168.0.206' (192.168.0.206:49154).+
+[0202 13:12:09] 0xa07b3720 INFO NSS: election initiated by 192.168.0.206:49154 (id 195.235.180.206) - admin request.+
+[0202 13:12:09] 0xa07b3720 (debug) NSS_VOTE2 to 192.168.0.206:49154+
+[0202 13:12:09] 0xa07b3720 (debug) startfssvote could not find FSS Cronos in master - vote aborted.+
+[0202 13:12:09] 0xa07b3720 (debug) NSS: removing vote inhibitor for FSS 'Cronos'.+


+Feb 2 13:12:10 hermes Xsan Admin[18617]: ERROR: Error starting volume…: Operation could not be completed. (SANTransactionErrorDomain error 100036.) (100036)+
+Feb 2 13:12:10 hermes servermgrd[94]: xsan: [94/103B80] ERROR: getfsm_processstats(Cronos): Unable to find pid of fsm+
+Feb 2 13:12:52 hermes servermgrd[94]: xsan: [94/103B80] ERROR: getfsm_processstats(Cronos): Unable to find pid of fsm+
+Feb 2 13:13:52 hermes servermgrd[94]: xsan: [94/103B80] ERROR: getfsm_processstats(Cronos): Unable to find pid of fsm+
+Feb 2 13:14:52 hermes servermgrd[94]: xsan: [94/103B80] ERROR: getfsm_processstats(Cronos): Unable to find pid of fsm+

Please, could you help me?

Thanks for everything.

Álvaro

Posted on Feb 2, 2011 4:19 AM


  • by Lucas Nap,

    Lucas Nap Feb 2, 2011 4:26 AM in response to jonflas
    Level 1 (0 points)
    Hi Álvaro,

    I can think of a million things you could check, but I think the best approach is to buy the Xsan 2 Admin book.

    http://www.peachpit.com/store/product.aspx?isbn=0321613228

    This book tells you how to set up your fibre, network, DNS, LUNs, etc.
  • by jmyres,

    jmyres Feb 2, 2011 9:46 AM in response to jonflas
    Level 1 (80 points)
    From the looks of it your MDCs are attempting a quorum vote to see which one will take over the volume, but they can't complete it successfully, so they are unmounting the volume to preserve your data.
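
To see at a glance which of these symptoms appear in a log and how often, a rough sketch like the one below might help. It only looks for strings that occur in the log lines quoted in the question; the labels, the `SIGNALS` list, and the `triage` function are names made up for this illustration, not part of Xsan, and the list of patterns is not exhaustive.

```python
import re

# Patterns copied from the log lines quoted in the question.
# The labels on the left are illustrative names, not Xsan terminology.
SIGNALS = [
    ("panic", re.compile(r"PANIC")),
    ("vote aborted", re.compile(r"vote aborted")),
    ("fsm not running", re.compile(r"Unable to find pid of fsm")),
    ("start failed", re.compile(r"Error starting volume")),
]


def triage(lines):
    """Map each signal found to (occurrence count, first matching line)."""
    summary = {}
    for line in lines:
        for label, pattern in SIGNALS:
            if pattern.search(line):
                count, first = summary.get(label, (0, line.strip()))
                summary[label] = (count + 1, first)
    return summary
```

Feeding it the lines of a saved log excerpt (for example `triage(open(path))`, where `path` is wherever you saved the excerpt) returns a small dictionary, so repeated "vote aborted" entries stand out from the surrounding debug output.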

    Corruption? What kind? The file system itself or your metadata? Which versions of OS X and Xsan are you using? The most stable combinations of Xsan and OS I've found are OS 10.5.8 and 2.1.1, or 10.6.4 and 2.2.1. I have run into very bad metadata corruption with an Xsan running, for instance, 10.5.8 and 2.2.1, which required a full back-up and re-build of the Xsan volume using 2.1.1.

    A good way to start is usually by verifying your networks. Make sure you can:


    1) Ping each MDC from each client on your public network.

    2) Ping each MDC from each client on your metadata network.

    3) Verify that your DNS is resolving on each of your MDCs and clients.

    4) Open Disk Utility on each machine and verify that all of your LUNs are presenting correctly over fibre.
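
Checks 1 to 3 above could be scripted along these lines and run on each machine. This is only a minimal sketch: the function names are invented for this example, the `ping -c 1` invocation assumes a Unix-style `ping`, and any hostnames you pass in are your own (the `mdc1.example.com` style names in the comments are placeholders, not taken from this thread). Check 4 still needs Disk Utility.

```python
import socket
import subprocess


def ping_ok(host):
    """Return True if a single ping to host gets a reply (assumes Unix-style ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_dns(hostname, resolve=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return (ok, detail) after a forward and a reverse lookup of hostname.

    resolve and reverse can be swapped out, which makes the check easy to test.
    """
    try:
        ip = resolve(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"forward lookup failed: {exc}"
    try:
        reverse_name = reverse(ip)[0]
    except OSError as exc:  # socket.herror is a subclass of OSError
        return False, f"{ip}: reverse lookup failed: {exc}"
    if reverse_name.rstrip(".").lower() != hostname.rstrip(".").lower():
        return False, f"{ip} reverses to {reverse_name}, not {hostname}"
    return True, f"{hostname} <-> {ip}"


# Example use with placeholder names (replace with your MDC and client names,
# once for the public network and once for the metadata network):
# for host in ["mdc1.example.com", "mdc2.example.com"]:
#     print(host, "ping:", ping_ok(host), "dns:", check_dns(host))
```

A forward lookup that succeeds while the reverse lookup fails or returns a different name is worth fixing before looking any further.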




    Once you know that's all OK, you can start troubleshooting software issues.

    Really, though, if you have valuable data on your Xsan and do not have back-up in place (or even if you do), I'd hire in some help. If your volume won't mount, that's a sign of serious problems, and the Xsan is trying to protect you from them by keeping the volume offline.

JM