Looks like no one’s replied in a while. To start the conversation again, simply ask a new question.

Xsan FATAL error -- can someone help explain?

I have an Xsan system running on a 5TB Xserve RAID that I use for video editing only. I have two workstations, one a MacPro and one a PowerMac G5. I have a G5Xserve as the main controller and the G5 workstation as a backup controller. All are running Xsan 1.4.1 on OSX 10.4.10.

Everything has been running relatively smoothly until one day I noticed a 'volume not mounted' dialog for my Xsan volume on one of the workstations. The volume was there and I could read and write to it. But.. when I looked at the main controller's Volume log I see all kinds of activity that I don't understand, including a FATAL error which, of course, makes me nervous.

Can anyone decipher from the log file (below) what might be wrong? ..sorry for the long post

Thanks. John

--

[1127 10:56:09.360360] 0xa000ed88 (Debug) FSM msg queue SUMMARY events now/0 min/0 max/0.
[1127 10:56:09.360458] 0xa000ed88 (Debug) FSM cache SUMMARY inode lookups/315 misses/55 hits/82.54%.
[1127 10:56:09.360475] 0xa000ed88 (Debug) FSM cache SUMMARY free incore inodes now/8192 min/8188 max/8192.
[1127 10:56:09.360493] 0xa000ed88 (Debug) FSM cache SUMMARY buffer lookups/57380 misses/48 hits/99.92%.
[1127 10:56:09.360507] 0xa000ed88 (Debug) FSM cache SUMMARY free buffers now/8192 min/8186 max/8192.
[1127 10:56:09.360521] 0xa000ed88 (Debug) FSM cache SUMMARY attrs now/18 min/1 max/18.
[1127 10:56:09.360537] 0xa000ed88 (Debug) FSM extent SUMMARY extent lookups/26 misses/0 hits/100.00%.
[1127 10:56:09.360552] 0xa000ed88 (Debug) FSM extent SUMMARY hint tries/0 misses/0 hits/0.00%.
[1127 10:56:09.360571] 0xa000ed88 (Debug) VOP SUMMARY Setattr cnt/88 avg/73 min/42 max/192.
[1127 10:56:09.360588] 0xa000ed88 (Debug) VOP SUMMARY Open cnt/31 avg/54 min/25 max/382.
[1127 10:56:09.360605] 0xa000ed88 (Debug) VOP SUMMARY Close cnt/31 avg/34 min/22 max/95.
[1127 10:56:09.360623] 0xa000ed88 (Debug) VOP SUMMARY VopLookupV4 cnt/19 avg/3963 min/32 max/19580.
[1127 10:56:09.360642] 0xa000ed88 (Debug) TKN SUMMARY TokenChangeV4 cnt/88 avg/10 min/5 max/18.
[1127 10:56:09.360665] 0xa000ed88 (Debug) BRL SUMMARY ClearLock cnt/291 avg/12 min/8 max/49.
[1127 10:56:09.360687] 0xa000ed88 (Debug) SG SUMMARY ALAB Pool1 space total/4.00 TB free/3.34 TB (83.41%)
[1127 10:56:09.360707] 0xa000ed88 (Debug) SG SUMMARY ALAB Pool1 space minfree/3.34 TB (83.41%) maxfree/3.34 TB (83.41%)
[1127 10:56:09.360722] 0xa000ed88 (Debug) SG SUMMARY ALAB Pool1 alloc extent cnt/0 avgsize/0.00 B.
[1127 10:56:09.360739] 0xa000ed88 (Debug) PIO Read SUMMARY LUN1 cnt/5715 maxq/2.
[1127 10:56:09.360756] 0xa000ed88 (Debug) PIO Read SUMMARY LUN1 avg/629 min/283 max/48356.
[1127 10:56:09.360772] 0xa000ed88 (Debug) PIO Read SUMMARY LUN1 sysavg/470 sysmin/266 sysmax/36929.
[1127 10:56:09.360788] 0xa000ed88 (Debug) PIO Read SUMMARY LUN1 avglen/515 minlen/512 maxlen/4096.
[1127 10:56:09.360803] 0xa000ed88 (Debug) PIO Write SUMMARY LUN1 cnt/5712 maxq/1.
[1127 10:56:09.360819] 0xa000ed88 (Debug) PIO Write SUMMARY LUN1 avg/415 min/258 max/27271.
[1127 10:56:09.360835] 0xa000ed88 (Debug) PIO Write SUMMARY LUN1 sysavg/349 sysmin/243 sysmax/17260.
[1127 10:56:09.360851] 0xa000ed88 (Debug) PIO Write SUMMARY LUN1 avglen/513 minlen/512 maxlen/4096.
[1127 10:56:09.360867] 0xa000ed88 (Debug) PIO HiPriWr SUMMARY LUN2 cnt/69 maxq/0.
[1127 10:56:09.360883] 0xa000ed88 (Debug) PIO HiPriWr SUMMARY LUN2 avg/405 min/353 max/1038.
[1127 10:56:09.360899] 0xa000ed88 (Debug) PIO HiPriWr SUMMARY LUN2 sysavg/403 sysmin/351 sysmax/1032.
[1127 10:56:09.360915] 0xa000ed88 (Debug) PIO HiPriWr SUMMARY LUN2 avglen/1164 minlen/512 maxlen/2560.
[1127 10:56:09.360930] 0xa000ed88 (Debug) PIO Read SUMMARY LUN2 cnt/43 maxq/3.
[1127 10:56:09.360946] 0xa000ed88 (Debug) PIO Read SUMMARY LUN2 avg/4061 min/494 max/29204.
[1127 10:56:09.360962] 0xa000ed88 (Debug) PIO Read SUMMARY LUN2 sysavg/2314 sysmin/292 sysmax/18951.
[1127 10:56:09.360978] 0xa000ed88 (Debug) PIO Read SUMMARY LUN2 avglen/4096 minlen/4096 maxlen/4096.
[1127 10:56:09.360993] 0xa000ed88 (Debug) PIO Write SUMMARY LUN2 cnt/79 maxq/1.
[1127 10:56:09.361009] 0xa000ed88 (Debug) PIO Write SUMMARY LUN2 avg/381 min/291 max/2132.
[1127 10:56:09.361025] 0xa000ed88 (Debug) PIO Write SUMMARY LUN2 sysavg/363 sysmin/278 sysmax/2116.
[1127 10:56:09.361041] 0xa000ed88 (Debug) PIO Write SUMMARY LUN2 avglen/4096 minlen/4096 maxlen/4096.
[1127 10:56:09.361056] 0xa000ed88 (Debug) Cache SUMMARY [10.10.1.101] attrs now/18 min/1 max/18.
[1127 10:56:09.361075] 0xa000ed88 (Debug) VOP SUMMARY [10.10.1.101] Setattr cnt/88 avg/73 min/42 max/192.
[1127 10:56:09.361092] 0xa000ed88 (Debug) VOP SUMMARY [10.10.1.101] Open cnt/31 avg/54 min/25 max/382.
[1127 10:56:09.361109] 0xa000ed88 (Debug) VOP SUMMARY [10.10.1.101] Close cnt/31 avg/34 min/22 max/95.
[1127 10:56:09.361126] 0xa000ed88 (Debug) VOP SUMMARY [10.10.1.101] VopLookupV4 cnt/19 avg/3963 min/32 max/19580.
[1127 10:56:09.361150] 0xa000ed88 (Debug) TKN SUMMARY [10.10.1.101] TokenChangeV4 cnt/88 avg/10 min/5 max/18.
[1127 10:56:09.361200] 0xa000ed88 (Debug) BRL SUMMARY [10.10.1.101] ClearLock cnt/291 avg/12 min/8 max/49.
[1127 10:56:09] 0x1816200 ( *FATAL*) PANIC: aborting threads now.
[1127 10:56:24.711011] 0x1801000 (Debug) sigwait handler starting
[1127 10:56:24] 0xa000ed88 (Info) Server Revision 2.7.201 Build 7.23 Built for Darwin 8.0 Created on Mon Nov 13 11:49:56 PST 2006
[1127 10:56:24] 0xa000ed88 (Info)
Configuration:
DiskTypes-2
Disks-2
StripeGroups-1
ForceStripeAlignment-1
MaxConnections-75
ThreadPoolSize-16
StripeAlignSize-256
FsBlockSize-4096
BufferCacheSize-32M
InodeCacheSize-8192
RestoreJournal-Disabled
RestoreJournalDir-None
[1127 10:56:24] 0xa000ed88 (Info) Self (10.10.1.100) IP address is 10.10.1.100 .
[1127 10:56:24.717473] 0xa000ed88 (Debug) No fsports file - port range enforcement disabled.
[1127 10:56:24] 0xa000ed88 (Info) Listening on TCP socket 10.10.1.100:55256
[1127 10:56:24] 0xa000ed88 (Info) Node [0] [10.10.1.100:55256] File System Manager Login.
[1127 10:56:24] 0xa000ed88 (Info) Service standing by on host '10.10.1.100:55256'.
[1127 10:56:26.969638] 0xa000ed88 (Debug) FOUsurpCheck: read ARB info (pass 1): host (10.10.1.100:53307) conns 1 age 1196182568.00 secs his delta 0.00 secs my delta 2.00 secs.
[1127 10:56:26.969686] 0xa000ed88 (Debug) FOUsurpCheck: ARB is already mine.
[1127 10:56:26] 0xa000ed88 (Info) Branding Arbitration Block (attempt 1) votes 0.
[1127 10:56:28.971402] 0xa000ed88 (Debug) Cannot find fail over script [/Library/Filesystems/Xsan/bin/cvfail.10.10.1.100] - looking for generic script.
[1127 10:56:28] 0xa000ed88 (Info) Launching fail over script [/Library/Filesystems/Xsan/bin/cvfail 10.10.1.100 55256 Napoleon]
[1127 10:56:29.035049] 0xa000ed88 (Debug) Starting journal log recovery.
[1127 10:56:29.093899] 0xa000ed88 (Debug) Completed journal log recovery.
[1127 10:56:29.094372] 0xa000ed88 (Debug) Inode init_postactivation: FsStatus 0x103, Brl_ResyncState 1
[1127 10:56:29] 0x1819600 (Info) FSM Alloc: Loading Stripe Group "ALAB Pool1". 4.00 TB.
[1127 10:56:29] 0x1819600 (Info) FSM Alloc: restricted area starts at block 0x3ffffe00
.[1127 10:56:32] 0x1819600 (Info) FSM Alloc: free blocks 895595175 with 1894400 blocks reserved for client delayed buffers.
[1127 10:56:32] 0x1819600 (Info) FSM Alloc: reserved blocks are product of MaxConnections(75) and MaxMBPerClientReserve(100).
[1127 10:56:32] 0x1819600 (Info) FSM Alloc: Stripe Group "ALAB Pool1" active.
[1127 10:56:32] 0xa000ed88 (Info) File System Service 'Napoleon[0]' now active on host '10.10.1.100:55256'.
[1127 10:56:32.225325] 0xa000ed88 (Debug) Node [1] [10.10.1.100:55266] connected.
[1127 10:56:32.234088] 0xa000ed88 (Debug) Node [2] [10.10.1.101:57060] connected.
[1127 10:56:32.238097] 0xa000ed88 (Debug) Node [3] [10.10.1.102:64843] connected.
[1127 10:56:32] 0x180f200 (Info) Node [2] [10.10.1.101:57060] Client Login (active 1).
[1127 10:56:32] 0x180f600 (Info) Node [3] [10.10.1.102:64843] Client Login (active 2).
[1127 10:56:32] 0x1a60e00 (Info) Node [1] [10.10.1.100:55266] Administrator Logout.
[1127 10:56:32.359890] 0xa000ed88 (Debug) Node [4] [10.10.1.100:55270] connected.
[1127 10:56:32] 0x1819600 (Info) Node [4] [10.10.1.100:55270] Administrator Logout.
[1127 10:56:39.061570] 0xa000ed88 (Debug) Node [5] [10.10.1.100:55284] connected.
[1127 10:56:39] 0x1a60e00 (Info) Node [5] [10.10.1.100:55284] Administrator Logout.
[1127 10:56:51.002075] 0xa000ed88 (Debug) Node [6] [10.10.1.100:55298] connected.
[1127 10:56:51] 0x1819600 (Info) Node [6] [10.10.1.100:55298] Administrator Logout.
[1127 10:57:15.018438] 0xa000ed88 (Debug) Node [7] [10.10.1.100:55313] connected.
[1127 10:57:15] 0x1a62000 (Info) Node [7] [10.10.1.100:55313] Administrator Logout.
[1127 10:58:02.687483] 0xa000ed88 (Debug) Node [8] [10.10.1.100:55327] connected.
[1127 10:58:02] 0x1a64600 (Info) Node [8] [10.10.1.100:55327] Administrator Logout.
[1127 10:59:13.715436] 0xa000ed88 (Debug) Node [9] [10.10.1.100:55348] connected.
[1127 10:59:13] 0x1a61e00 (Info) Node [9] [10.10.1.100:55348] Administrator Logout.
[1127 10:59:13.782990] 0xa000ed88 (Debug) Node [10] [10.10.1.100:55350] connected.
[1127 10:59:13] 0x1a64600 (Info) Node [10] [10.10.1.100:55350] Administrator Logout.
[1127 10:59:42.107946] 0xa000ed88 (Debug) Node [11] [10.10.1.100:55363] connected.
[1127 10:59:42] 0x1a61e00 (Info) Node [11] [10.10.1.100:55363] Administrator Logout.
[1127 10:59:58.027790] 0x1a63a00 (Debug) Cache SUMMARY [10.10.1.102] attrs now/9471 min/1 max/9471.
[1127 10:59:58.027892] 0x1a63a00 (Debug) VOP SUMMARY [10.10.1.102] Setattr cnt/10 avg/78 min/41 max/252.
[1127 10:59:58.027911] 0x1a63a00 (Debug) VOP SUMMARY [10.10.1.102] Open cnt/3 avg/78 min/32 max/147.
[1127 10:59:58.027929] 0x1a63a00 (Debug) VOP SUMMARY [10.10.1.102] Close cnt/2 avg/110 min/39 max/182.
[1127 10:59:58.027947] 0x1a63a00 (Debug) VOP SUMMARY [10.10.1.102] VopGetattrV4 cnt/9475 avg/198 min/14 max/24665.
[1127 10:59:58.027964] 0x1a63a00 (Debug) VOP SUMMARY [10.10.1.102] VopLookupV4 cnt/21 avg/33 min/15 max/68.
[1127 10:59:58.027981] 0x1a63a00 (Debug) VOP SUMMARY [10.10.1.102] VopCapNegotiate cnt/1 avg/13 min/13 max/13.
[1127 10:59:58.028006] 0x1a63a00 (Debug) TKN SUMMARY [10.10.1.102] TokenRequestV2 cnt/1 avg/21 min/21 max/21.
[1127 10:59:58.028023] 0x1a63a00 (Debug) TKN SUMMARY [10.10.1.102] TokenChangeV4 cnt/10 avg/21 min/17 max/28.
[1127 10:59:58.028050] 0x1a63a00 (Debug) BRL SUMMARY [10.10.1.102] ClearLock cnt/7 avg/11 min/9 max/16.
[1127 10:59:58.028067] 0x1a63a00 (Debug) BRL SUMMARY [10.10.1.102] ReconDone cnt/1 avg/49 min/49 max/49.
[1127 10:59:58] 0x1a63a00 (Info) Node [3] [10.10.1.102:64843] Client Logout (active 1).
[1127 10:59:59.896054] 0xa000ed88 (Debug) Node [12] [10.10.1.100:55379] connected.
[1127 10:59:59] 0x1819600 (Info) Node [12] [10.10.1.100:55379] Administrator Logout.
[1127 11:01:09.915243] 0xa000ed88 (Debug) Node [13] [10.10.1.100:55389] connected.
[1127 11:01:09] 0x1a62800 (Info) Node [13] [10.10.1.100:55389] Administrator Logout.
[1127 11:01:09.934890] 0xa000ed88 (Debug) Node [14] [10.10.1.100:55391] connected.
[1127 11:01:09] 0x1819600 (Info) Node [14] [10.10.1.100:55391] Administrator Logout.
[1127 11:02:11.292418] 0xa000ed88 (Debug) Node [15] [10.10.1.100:55399] connected.
[1127 11:02:11] 0x1a62000 (Info) Node [15] [10.10.1.100:55399] Administrator Logout.
[1127 11:02:11.312023] 0xa000ed88 (Debug) Node [16] [10.10.1.100:55401] connected.
[1127 11:02:11] 0x1819600 (Info) Node [16] [10.10.1.100:55401] Administrator Logout.

MacPro Quad Core, Mac OS X (10.4.10), Xsan, Xserve RAID

Posted on Nov 27, 2007 10:01 AM

Reply
3 replies

Nov 27, 2007 10:41 AM in response to jchristensen

Hi,
xsan is critical about time. From the log:
FOUsurpCheck: read ARB info (pass 1): host (10.10.1.100:53307) conns 1 age 1196182568.00 secs his delta 0.00 secs my delta 2.00 secs.

So there are two seconds between the two MDC's. It could be the reason why the volumes failed over.

The failover might be the reason of the umount on the client.

So the hint is to use a (the same) time server on all the hosts.

Hope this helps
Donald

Nov 27, 2007 4:34 PM in response to Donald Kok

Donald.. Thank much. I checked all three of my machines, and sure enough, the one that keeps displaying a 'disconnected' dialog was not updating time from a time server at all; the other two machines (the main controller and one client) were. So that was probably the disconnect issue.

But.. I am still wondering: is it normal for my log file to be filled with so much activity? For example, what are all the '(Debug)... SUMMARY' entries for? And I still don't understand what the ' *FATAL*' error was? Was that really just a timing issue? Fatal sounds pretty serious.

John

Xsan FATAL error -- can someone help explain?

Welcome to Apple Support Community
A forum where Apple customers help each other with their products. Get started with your Apple ID.