
all storage-pools missing from XSAN volume after crash during expansion

3301 Views 6 Replies Latest reply: Dec 12, 2008 12:25 PM by c. steeley RSS
c. steeley
Dec 10, 2008 4:51 PM
today we added an additional storage pool (named hippo4) consisting of 1 LUN to a volume that had three storage pools (with 1 LUN each). XSAN admin crashed while refreshing volume info, then the server crashed and displayed the deceptively pleasant grey screen telling you to hard-reboot. once we rebooted, the volume would not start.

checked the logs and saw something to the effect of an ICB configuration mismatch.

we ran (in this order):

cvfsck -j
cvfsck -nv (reported back as dirty)
cvfsck -wv
cvfsck -nv (reported back as clean)

now we can "start" the volume but there are no storage pools listed in the XSAN admin.

looked through the log file and saw this:

[1210 16:34:05] 0xa000d000 (*FATAL*) PANIC: /Library/Filesystems/Xsan/bin/fsm "
Stripe Group 3 has an existing configuration name of ""
which does not match the new name of "hippo4". You must
maintain the original stripe group order and/or name in the
configuration or re-initialize the file system.
File system 'GAZZXSAN' not started.
" file alloc.c, line 2853
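
For anyone scanning a long fsm log for this condition, here is a minimal Python sketch that pulls the stripe group index and the two conflicting names out of the panic text quoted above. The function name `find_name_mismatches` and the regular expression are illustrative assumptions, not part of any Xsan tool; the sample text is the panic copied from this post.

```python
import re

# Matches the two-line core of the fsm panic quoted in this thread.
# The pattern is an assumption based only on the message text shown here.
PANIC_RE = re.compile(
    r'Stripe Group (\d+) has an existing configuration name of "([^"]*)"\s+'
    r'which does not match the new name of "([^"]*)"'
)

def find_name_mismatches(log_text):
    """Return (index, existing_name, new_name) for each mismatch panic found."""
    return [(int(i), old, new) for i, old, new in PANIC_RE.findall(log_text)]

sample = '''[1210 16:34:05] 0xa000d000 (*FATAL*) PANIC: /Library/Filesystems/Xsan/bin/fsm "
Stripe Group 3 has an existing configuration name of ""
which does not match the new name of "hippo4". You must
maintain the original stripe group order and/or name in the
configuration or re-initialize the file system.
File system 'GAZZXSAN' not started.
" file alloc.c, line 2853
'''

mismatches = find_name_mismatches(sample)
for index, existing, new in mismatches:
    print(f"stripe group {index}: existing name {existing!r}, new name {new!r}")
```

On the excerpt above this reports stripe group 3 with an empty existing name and a new name of hippo4, which is the same mismatch the panic describes.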

anyone have any suggestions? i'll post the entire log contents below.
we thought about just physically disconnecting the new storage pool
but something about that doesn't sit well...

thanks.
-cs


228955420.00 secs his delta 0.00 secs my delta 0.00 secs.
[1210 16:30:26.639412] 0xa000d000 (Debug) FOUsurpCheck: polling ARB block to check for active peer (pass 1).
[1210 16:30:27.639834] 0xa000d000 (Debug) FOUsurpCheck: read ARB info (pass 2): host (169.230.98.30:49532) conns 0 age 1228955420.00 secs his delta 0.00 secs my delta 1.00 secs.
[1210 16:30:27.639868] 0xa000d000 (Debug) FOUsurpCheck: ARB is already mine.
[1210 16:30:27] 0xa000d000 (Info) Branding Arbitration Block (attempt 1) votes 0.
[1210 16:30:29.641107] 0xa000d000 (Debug) Cannot find fail over script [/Library/Filesystems/Xsan/bin/cvfail.thalamus.XXXXXXXXX.XXXX.edu] - looking for generic script.
[1210 16:30:29] 0xa000d000 (Info) Launching fail over script [/Library/Filesystems/Xsan/bin/cvfail thalamus.XXXXXXXXX.XXXX.edu 49561 GAZZXSAN]
[1210 16:30:31.530141] 0xa000d000 (Debug) Starting journal log recovery.
[1210 16:30:31.692316] 0xa000d000 (Debug) Completed journal log recovery.
[1210 16:30:31.692676] 0xa000d000 (Debug) Inodeinit_postactivation: FsStatus 0x103, Brl_ResyncState 1
[1210 16:30:31] 0x1887c00 (Info) FSM Alloc: Loading Stripe Group "hippo2". 2.05 TB.
[1210 16:30:31] 0x188a400 (Info) FSM Alloc: Loading Stripe Group "hippo1". 2.73 TB.
[1210 16:30:31] 0x1890200 (Info) FSM Alloc: Loading Stripe Group "hippo3". 1.36 TB.
[1210 16:30:31] 0xa000d000 (*FATAL*) PANIC: /Library/Filesystems/Xsan/bin/fsm "
Stripe Group 3 has an existing configuration name of ""
which does not match the new name of "hippo4". You must
maintain the original stripe group order and/or name in the
configuration or re-initialize the file system.
File system 'GAZZXSAN' not started.
" file alloc.c, line 2853
[1210 16:30:31] 0xa000d000 (*FATAL*) PANIC: wait 3 secs for journal to flush
[1210 16:30:32] 0x1890200 (Info) FSM Alloc: free blocks 18459254 with 1894400 blocks reserved for client delayed buffers.
[1210 16:30:32] 0x1890200 (Info) FSM Alloc: reserved blocks are product of MaxConnections(75) and MaxMBPerClientReserve(100).
[1210 16:30:32] 0x1890200 (Info) FSM Alloc: Stripe Group "hippo3" active.
[1210 16:30:32] 0x1887c00 (Info) FSM Alloc: free blocks 270479 with 1894400 blocks reserved for client delayed buffers.
[1210 16:30:32] 0x1887c00 (Info) FSM Alloc: reserved blocks are product of MaxConnections(75) and MaxMBPerClientReserve(100).
[1210 16:30:32] 0x1887c00 (Info) FSM Alloc: Stripe Group "hippo2" active.
[1210 16:30:33] 0x188a400 (Info) FSM Alloc: free blocks 3659981 with 1894400 blocks reserved for client delayed buffers.
[1210 16:30:33] 0x188a400 (Info) FSM Alloc: reserved blocks are product of MaxConnections(75) and MaxMBPerClientReserve(100).
[1210 16:30:33] 0x188a400 (Info) FSM Alloc: Stripe Group "hippo1" active.
[1210 16:30:34] 0xa000d000 (*FATAL*) PANIC: aborting threads now.
[1210 16:33:13.611232] 0x1801000 (Debug) sigwait handler starting
[1210 16:33:13] 0xa000d000 (Info) Server Revision 2.7.201 Build 7.40 Built for Darwin 8.0 Created on Thu Oct 11 19:05:39 PDT 2007
[1210 16:33:13] 0xa000d000 (Info)
Configuration:
DiskTypes-4
Disks-4
StripeGroups-4
ForceStripeAlignment-1
MaxConnections-75
ThreadPoolSize-128
StripeAlignSize-256
FsBlockSize-4096
BufferCacheSize-32M
InodeCacheSize-8192
RestoreJournal-Disabled
RestoreJournalDir-None
[1210 16:33:13] 0xa000d000 (Info) Self (thalamus.XXXXXXXXX.XXXX.edu) IP address is XXX.XXX.XX.XX .
[1210 16:33:13.661322] 0xa000d000 (Debug) No fsports file - port range enforcement disabled.
[1210 16:33:13] 0xa000d000 (Info) Listening on TCP socket thalamus.XXXXXXXXX.XXXX.edu:49176
[1210 16:33:13] 0xa000d000 (Info) Node [0] [thalamus.XXXXXXXXX.X:49176] File System Manager Login.
[1210 16:33:13] 0xa000d000 (Info) Service standing by on host 'thalamus.XXXXXXXXX.XXXX.edu:49176'.
[1210 16:33:21.340955] 0xa000d000 (Debug) FOUsurpCheck: read ARB info (pass 1): host (169.230.98.30:49561) conns 0 age 1228955434.00 secs his delta 0.00 secs my delta 7.00 secs.
[1210 16:33:21.340976] 0xa000d000 (Debug) FOUsurpCheck: ARB is already mine.
[1210 16:33:21] 0xa000d000 (Info) Branding Arbitration Block (attempt 1) votes 0.
[1210 16:33:23.342219] 0xa000d000 (Debug) Cannot find fail over script [/Library/Filesystems/Xsan/bin/cvfail.thalamus.XXXXXXXXX.XXXX.edu] - looking for generic script.
[1210 16:33:23] 0xa000d000 (Info) Launching fail over script [/Library/Filesystems/Xsan/bin/cvfail thalamus.XXXXXXXXX.XXXX.edu 49176 GAZZXSAN]
[1210 16:33:25.352876] 0xa000d000 (Debug) Starting journal log recovery.
[1210 16:33:25.526115] 0xa000d000 (Debug) Completed journal log recovery.
[1210 16:33:25.526474] 0xa000d000 (Debug) Inodeinit_postactivation: FsStatus 0x103, Brl_ResyncState 1
[1210 16:33:25] 0x1887c00 (Info) FSM Alloc: Loading Stripe Group "hippo2". 2.05 TB.
[1210 16:33:25] 0x188a400 (Info) FSM Alloc: Loading Stripe Group "hippo1". 2.73 TB.
[1210 16:33:25] 0x1890200 (Info) FSM Alloc: Loading Stripe Group "hippo3". 1.36 TB.
[1210 16:33:25] 0xa000d000 (*FATAL*) PANIC: /Library/Filesystems/Xsan/bin/fsm "
Stripe Group 3 has an existing configuration name of ""
which does not match the new name of "hippo4". You must
maintain the original stripe group order and/or name in the
configuration or re-initialize the file system.
File system 'GAZZXSAN' not started.
" file alloc.c, line 2853
[1210 16:33:25] 0xa000d000 (*FATAL*) PANIC: wait 3 secs for journal to flush
[1210 16:33:26] 0x1890200 (Info) FSM Alloc: free blocks 18459254 with 1894400 blocks reserved for client delayed buffers.
[1210 16:33:26] 0x1890200 (Info) FSM Alloc: reserved blocks are product of MaxConnections(75) and MaxMBPerClientReserve(100).
[1210 16:33:26] 0x1890200 (Info) FSM Alloc: Stripe Group "hippo3" active.
[1210 16:33:26] 0x1887c00 (Info) FSM Alloc: free blocks 270479 with 1894400 blocks reserved for client delayed buffers.
[1210 16:33:26] 0x1887c00 (Info) FSM Alloc: reserved blocks are product of MaxConnections(75) and MaxMBPerClientReserve(100).
[1210 16:33:26] 0x1887c00 (Info) FSM Alloc: Stripe Group "hippo2" active.
[1210 16:33:27] 0x188a400 (Info) FSM Alloc: free blocks 3659981 with 1894400 blocks reserved for client delayed buffers.
[1210 16:33:27] 0x188a400 (Info) FSM Alloc: reserved blocks are product of MaxConnections(75) and MaxMBPerClientReserve(100).
[1210 16:33:27] 0x188a400 (Info) FSM Alloc: Stripe Group "hippo1" active.
[1210 16:33:28] 0xa000d000 (*FATAL*) PANIC: aborting threads now.
[1210 16:33:46.489924] 0x1801000 (Debug) sigwait handler starting
[1210 16:33:46] 0xa000d000 (Info) Server Revision 2.7.201 Build 7.40 Built for Darwin 8.0 Created on Thu Oct 11 19:05:39 PDT 2007
[1210 16:33:46] 0xa000d000 (Info)
Configuration:
DiskTypes-4
Disks-4
StripeGroups-4
ForceStripeAlignment-1
MaxConnections-75
ThreadPoolSize-128
StripeAlignSize-256
FsBlockSize-4096
BufferCacheSize-32M
InodeCacheSize-8192
RestoreJournal-Disabled
RestoreJournalDir-None
[1210 16:33:46] 0xa000d000 (Info) Self (thalamus.XXXXXXXXX.XXXX.edu) IP address is XXX.XXX.XX.XX .
[1210 16:33:46.494892] 0xa000d000 (Debug) No fsports file - port range enforcement disabled.
[1210 16:33:46] 0xa000d000 (Info) Listening on TCP socket thalamus.XXXXXXXXX.XXXX.edu:49196
[1210 16:33:46] 0xa000d000 (Info) Node [0] [thalamus.XXXXXXXXX.X:49196] File System Manager Login.
[1210 16:33:46] 0xa000d000 (Info) Service standing by on host 'thalamus.XXXXXXXXX.XXXX.edu:49196'.
[1210 16:34:01.434123] 0xa000d000 (Debug) FOUsurpCheck: read ARB info (pass 1): host (169.230.98.30:49176) conns 0 age 1228955608.00 secs his delta 0.00 secs my delta 14.00 secs.
[1210 16:34:01.434145] 0xa000d000 (Debug) FOUsurpCheck: ARB is already mine.
[1210 16:34:01] 0xa000d000 (Info) Branding Arbitration Block (attempt 1) votes 0.
[1210 16:34:03.435311] 0xa000d000 (Debug) Cannot find fail over script [/Library/Filesystems/Xsan/bin/cvfail.thalamus.XXXXXXXXX.XXXX.edu] - looking for generic script.
[1210 16:34:03] 0xa000d000 (Info) Launching fail over script [/Library/Filesystems/Xsan/bin/cvfail thalamus.XXXXXXXXX.XXXX.edu 49196 GAZZXSAN]
[1210 16:34:05.320507] 0xa000d000 (Debug) Starting journal log recovery.
[1210 16:34:05.497146] 0xa000d000 (Debug) Completed journal log recovery.
[1210 16:34:05.497510] 0xa000d000 (Debug) Inodeinit_postactivation: FsStatus 0x103, Brl_ResyncState 1
[1210 16:34:05] 0x1887c00 (Info) FSM Alloc: Loading Stripe Group "hippo2". 2.05 TB.
[1210 16:34:05] 0x188a400 (Info) FSM Alloc: Loading Stripe Group "hippo1". 2.73 TB.
[1210 16:34:05] 0x1890200 (Info) FSM Alloc: Loading Stripe Group "hippo3". 1.36 TB.
[1210 16:34:05] 0xa000d000 (*FATAL*) PANIC: /Library/Filesystems/Xsan/bin/fsm "
Stripe Group 3 has an existing configuration name of ""
which does not match the new name of "hippo4". You must
maintain the original stripe group order and/or name in the
configuration or re-initialize the file system.
File system 'GAZZXSAN' not started.
" file alloc.c, line 2853
[1210 16:34:05] 0xa000d000 (*FATAL*) PANIC: wait 3 secs for journal to flush
[1210 16:34:06] 0x1890200 (Info) FSM Alloc: free blocks 18459254 with 1894400 blocks reserved for client delayed buffers.
[1210 16:34:06] 0x1890200 (Info) FSM Alloc: reserved blocks are product of MaxConnections(75) and MaxMBPerClientReserve(100).
[1210 16:34:06] 0x1890200 (Info) FSM Alloc: Stripe Group "hippo3" active.
[1210 16:34:06] 0x1887c00 (Info) FSM Alloc: free blocks 270479 with 1894400 blocks reserved for client delayed buffers.
[1210 16:34:06] 0x1887c00 (Info) FSM Alloc: reserved blocks are product of MaxConnections(75) and MaxMBPerClientReserve(100).
[1210 16:34:06] 0x1887c00 (Info) FSM Alloc: Stripe Group "hippo2" active.
[1210 16:34:07] 0x188a400 (Info) FSM Alloc: free blocks 3659981 with 1894400 blocks reserved for client delayed buffers.
[1210 16:34:07] 0x188a400 (Info) FSM Alloc: reserved blocks are product of MaxConnections(75) and MaxMBPerClientReserve(100).
[1210 16:34:07] 0x188a400 (Info) FSM Alloc: Stripe Group "hippo1" active.
[1210 16:34:08] 0xa000d000 (*FATAL*) PANIC: aborting threads now.
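
A rough way to summarize the log above is to compare the number of stripe groups the configuration declares with the stripe groups fsm actually reports loading and activating. The Python sketch below does only that; `summarize_startup` is a hypothetical helper, and the patterns are inferred from the log lines in this post rather than from any documented log format.

```python
import re

def summarize_startup(log_text):
    """Collect the configured stripe group count and the names seen in the log."""
    configured = re.search(r"^StripeGroups-(\d+)$", log_text, re.MULTILINE)
    loaded = set(re.findall(r'FSM Alloc: Loading Stripe Group "([^"]+)"', log_text))
    active = set(re.findall(r'FSM Alloc: Stripe Group "([^"]+)" active', log_text))
    return {
        "configured": int(configured.group(1)) if configured else None,
        "loaded": sorted(loaded),
        "active": sorted(active),
    }

# Lines copied from the log in this post (one startup attempt).
sample = """\
StripeGroups-4
[1210 16:34:05] 0x1887c00 (Info) FSM Alloc: Loading Stripe Group "hippo2". 2.05 TB.
[1210 16:34:05] 0x188a400 (Info) FSM Alloc: Loading Stripe Group "hippo1". 2.73 TB.
[1210 16:34:05] 0x1890200 (Info) FSM Alloc: Loading Stripe Group "hippo3". 1.36 TB.
[1210 16:34:06] 0x1890200 (Info) FSM Alloc: Stripe Group "hippo3" active.
[1210 16:34:06] 0x1887c00 (Info) FSM Alloc: Stripe Group "hippo2" active.
[1210 16:34:07] 0x188a400 (Info) FSM Alloc: Stripe Group "hippo1" active.
"""

summary = summarize_startup(sample)
print("configured stripe groups:", summary["configured"])
print("loaded:", summary["loaded"])
print("active:", summary["active"])
```

For this log the configuration declares four stripe groups while only hippo1, hippo2 and hippo3 are ever loaded, which is consistent with the panic about the fourth group.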
xserve/xserveRAID, Mac OS X (10.4.11), xsan1.4 (server 10.4.11)
  • Donald Kok
    Hi,
    Obviously there is a problem with hippo4. Part of xsan thinks its name is "", while other parts think it is "hippo4".

    Let us first make sure all LUNs are available with: cvlabel -ls
    This should list all your LUNs. Check the list against the config file. If a LUN is missing, there is a fibre channel problem (zoning?).

    Second, check that the config file looks the way you expect. There should be 4 stripe groups defined, each with the right LUN.

    If all of the above is well, you can check cvadmin -> show long. It gives info about the storage pools. I expect trouble here, hopefully just a missing hippo4.

    Maybe the update of the filesystem went wrong, and you need to run cvupdatefs. Check the manpage before you do, since it really will change your filesystem. cvupdatefs is verbose, so you do not have to be that scared.
    Mac OS X (10.5.2)
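
The config file check suggested in the reply above can be sketched in Python. This assumes the volume config declares each storage pool with a `[StripeGroup NAME]` header line, which is an assumption about the file format and not something shown in this thread. The sample config text, the expected order, and the helper names are all hypothetical.

```python
import re

# Assumed header format for a stripe group section in the volume config file.
STRIPE_HEADER = re.compile(r"^\s*\[StripeGroup\s+(\S+)\]\s*$", re.MULTILINE)

def stripe_group_order(cfg_text):
    """Return stripe group names in the order they appear in the config text."""
    return STRIPE_HEADER.findall(cfg_text)

def first_mismatch(expected, actual):
    """Return the first zero-based index where the two lists differ, or None."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None

# Hypothetical config excerpt; a real file has more sections and fields.
sample_cfg = """\
[StripeGroup hippo1]
[StripeGroup hippo2]
[StripeGroup hippo3]
[StripeGroup hippo4]
"""

# Hypothetical order of the pools that existed before the expansion.
expected_before_expansion = ["hippo1", "hippo2", "hippo3"]

in_config = stripe_group_order(sample_cfg)
print("config order:", in_config)
print("first differing index:", first_mismatch(expected_before_expansion, in_config))
```

With these made-up inputs the first differing index is 3, the position of the newly added pool. Whether that lines up with the "Stripe Group 3" in the panic depends on fsm counting from zero, which is not confirmed here.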
  • Donald Kok Level 2 (490 points)
    What does cvupdatefs show you?
    And what does cvadmin -> show long show?

    Removing the lines from the config does not remove them from the internal xsan system. Changes in a config are deployed through a cvupdatefs. (The GUI will do this too.) A cvfsck will not do this.

    The above mentioned commands will show you the status of your xsan, and lead you to integrating the vtrak.
    Mac OS X (10.5.2)
