Q: SLS 10.6.7, Xsan 2.2.1: license suddenly became invalid!
Last week we ran into a double problem on our Xserve (installed two weeks ago!).
Xsan disconnected the volumes, reporting that the license was invalid, AND the permissions on the system hard disk were completely messed up.
At first we tried to repair permissions, but after 3 hours the repair had hung.
So we reinstalled Snow Leopard Server 10.6.2, updated it to 10.6.7, and imported the old profiles.
Xsan Admin started, but with the same license problem. The system seemed a bit unstable, so we put in another HD and reinstalled everything from scratch.
Again, SLS 10.6.7 and Xsan 2.2.1: Xsan Admin starts correctly, I see the LUNs and volumes, and the serial seems valid.
But as soon as I try to start the first volume, it starts as a ghost: not mounted, unusable.
If I try to start the second volume, it waits for minutes and then fails to start; the first volume, which had apparently started, is now stopped too, and the serial number error appears again.
Note that I even tried another (KCNcrew) serial, but the behaviour is the same!
PLEASE HELP
--------------------
fsmpm log:
.1.10:61948
[0622 00:32:33] 0x10024fca0 (debug) Elect FSS[0] 10.0.0.2:65129 svc_pri/0x0 votes/1 wght/0x50f5a1b80
[0622 00:32:33] 0x10024fca0 (debug) Elect FSS winner so far 10.0.0.2:65129 svc_pri/00 votes/1 wght/0x50f5a1b80
[0622 00:32:33] 0x10024fca0 INFO NSS: Vote selected FSS 'Archivio[0]' at 10.0.0.2:65129 (pid 12047) - attempting activation.
[0622 00:32:33] 0x10024fca0 (debug) set_fss_active: sending to 10.0.0.2:61948 (id 192.168.1.10)
[0622 00:32:33] 0x10024fca0 INFO NSS: Vote call for FSS Archivio is inhibited - vote dis-allowed.
[0622 00:32:33] 0x10024fca0 (debug) NSS: FSS activation initiated by coordinator 10.0.0.2:61948 (id 192.168.1.10) votes 1
[0622 00:32:46] 0x10024fca0 (debug) NSS: New mount registered for 'Archivio'.
[0622 00:32:46] 0x10024fca0 (debug) find_fsm fsm Archivio ipaddr 10.0.0.2 port 65129 TestLink failed: getsockopt(SO_ERROR) returned error 61 [errno 61]: Connection refused
[0622 00:32:46] 0x10024fca0 (debug) NSS: FSS mount list for client 10.0.0.2 (id 192.168.1.10) - Archivio
[0622 00:32:53] 0x10024fca0 NOTICE PortMapper: FSS 'Archivio' disconnected.
[0622 00:32:53] 0x10024fca0 NOTICE PortMapper: kicking diskscan_thread 4326440960.
[0622 00:32:53] 0x10024fca0 (debug) FSS 'Archivio' REGISTERED -> DYING, next event in 60s
[0622 00:32:53] 0x101e04000 INFO Starting Disk rescan
[0622 00:32:53] 0x101e04000 INFO Disk rescan delay completed
[0622 00:32:53] 0x101e04000 INFO Disk rescan found 4 disks
[0622 00:32:54] 0x101e87000 ERR Portmapper: FSS 'Archivio' (pid 12047) exited on signal 6
[0622 00:32:54] 0x101e87000 (debug) FSS 'Archivio' DYING -> RELAUNCH, next event in 10s
[0622 00:32:54] 0x10024fca0 INFO NSS: Active FSS 'Archivio[0]' at 192.168.1.10:65129 (pid 12047) - dropped.
[0622 00:32:54] 0x10024fca0 (debug) NSS: removing vote inhibitor for FSS 'Archivio'.
[0622 00:33:03] 0x101e87000 (debug) FSS 'Archivio' RELAUNCH -> LAUNCHED, next event in 60s
[0622 00:33:03] 0x101e87000 NOTICE PortMapper: RESTART FSS service 'Archivio[0]' on host 192.168.1.10.
[0622 00:33:03] 0x101e87000 NOTICE PortMapper: Starting FSS service 'Archivio[0]' on 192.168.1.10.
[0622 00:33:03] 0x10024fca0 (debug) FSS 'Archivio' LAUNCHED -> REGISTERED
[0622 00:33:03] 0x10024fca0 NOTICE PortMapper: FSS 'Archivio'[0] (pid 12070) at port 65169 is registered.
[0622 00:33:04] 0x10024fca0 INFO NSS: Standby FSS 'Archivio[0]' at id 192.168.1.10 port 65169 (pid 12070) - registered.
[0622 00:33:04] 0x10024fca0 (debug) Heartbeat from ID 192.168.1.10 updating LOCAL Archivio to 10.0.0.2:65169
[0622 00:33:16] 0x10024fca0 NOTICE PortMapper: Initiating activation vote for FSS 'Archivio'.
[0622 00:33:16] 0x10024fca0 (debug) Initiate_nss_vote for FSS Archivio
[0622 00:33:16] 0x10024fca0 (debug) NSS: sending message (type 2) to Name Server '10.0.0.2' (10.0.0.2:61948).
[0622 00:33:16] 0x10024fca0 INFO NSS: election initiated by 10.0.0.2:61948 (id 192.168.1.10) - client request.
[0622 00:33:16] 0x10024fca0 (debug) NSS_VOTE2 to 10.0.0.2:61948
[0622 00:33:16] 0x10024fca0 INFO NSS: Starting vote for FSS Archivio using 1 voting member: 192.168.1.10.
[0622 00:33:16] 0x10024fca0 (debug) Connectivity test[1] to FSS 10.0.0.2:65169 passed .
[0622 00:33:16] 0x10024fca0 (debug) local_fss_vote: WINNER Archivio at 10.0.0.2:65169.
[0622 00:33:16] 0x10024fca0 (debug) local_fss_vote: sending tally to 192.168.1.10:65169
[0622 00:33:16] 0x10024fca0 (debug) tally_member:FSS/Archivio COUNTED 1 vote for 10.0.0.2:65169 from member 192.168.1.10:61948
[0622 00:33:16] 0x10024fca0 (debug) Elect FSS[0] 10.0.0.2:65169 svc_pri/0x0 votes/1 wght/0x511ea3c40
[0622 00:33:16] 0x10024fca0 (debug) Elect FSS winner so far 10.0.0.2:65169 svc_pri/00 votes/1 wght/0x511ea3c40
[0622 00:33:16] 0x10024fca0 INFO NSS: Vote selected FSS 'Archivio[0]' at 10.0.0.2:65169 (pid 12070) - attempting activation.
[0622 00:33:16] 0x10024fca0 (debug) set_fss_active: sending to 10.0.0.2:61948 (id 192.168.1.10)
[0622 00:33:16] 0x10024fca0 (debug) NSS: FSS activation initiated by coordinator 10.0.0.2:61948 (id 192.168.1.10) votes 1
[0622 00:33:18] 0x10024fca0 (debug) find_fsm fsm Archivio ipaddr 10.0.0.2 port 65169 TestLink failed: getsockopt(SO_ERROR) returned error 61 [errno 61]: Connection refused
[0622 00:33:20] 0x10024fca0 (debug) find_fsm fsm Archivio ipaddr 10.0.0.2 port 65169 TestLink failed: getsockopt(SO_ERROR) returned error 61 [errno 61]: Connection refused
[0622 00:33:21] 0x10024fca0 (debug) find_fsm fsm Archivio ipaddr 10.0.0.2 port 65169 TestLink failed: getsockopt(SO_ERROR) returned error 61 [errno 61]: Connection refused
[0622 00:33:22] 0x10024fca0 (debug) find_fsm fsm Archivio ipaddr 10.0.0.2 port 65169 TestLink failed: getsockopt(SO_ERROR) returned error 61 [errno 61]: Connection refused
[0622 00:33:23] 0x10024fca0 (debug) find_fsm fsm Archivio ipaddr 10.0.0.2 port 65169 TestLink failed: getsockopt(SO_ERROR) returned error 61 [errno 61]: Connection refused
[0622 00:33:24] 0x10024fca0 (debug) find_fsm fsm Archivio ipaddr 10.0.0.2 port 65169 TestLink failed: getsockopt(SO_ERROR) returned error 61 [errno 61]: Connection refused
[0622 00:33:25] 0x10024fca0 (debug) find_fsm fsm Archivio ipaddr 10.0.0.2 port 65169 TestLink failed: getsockopt(SO_ERROR) returned error 61 [errno 61]: Connection refused
[0622 00:33:26] 0x10024fca0 (debug) find_fsm fsm Archivio ipaddr 10.0.0.2 port 65169 TestLink failed: getsockopt(SO_ERROR) returned error 61 [errno 61]: Connection refused
[0622 00:33:27] 0x10024fca0 (debug) find_fsm fsm Archivio ipaddr 10.0.0.2 port 65169 TestLink failed: getsockopt(SO_ERROR) returned error 61 [errno 61]: Connection refused
[0622 00:33:29] 0x10024fca0 (debug) find_fsm fsm Archivio ipaddr 10.0.0.2 port 65169 TestLink failed: getsockopt(SO_ERROR) returned error 61 [errno 61]: Connection refused
[0622 00:33:29] 0x10024fca0 NOTICE PortMapper: FSS 'Archivio' disconnected.
[0622 00:33:29] 0x10024fca0 NOTICE PortMapper: kicking diskscan_thread 4326440960.
[0622 00:33:29] 0x10024fca0 (debug) FSS 'Archivio' REGISTERED -> DYING, next event in 60s
[0622 00:33:29] 0x101e04000 INFO Starting Disk rescan
[0622 00:33:29] 0x101e04000 INFO Disk rescan delay completed
[0622 00:33:29] 0x101e04000 INFO Disk rescan found 4 disks
[0622 00:33:29] 0x101e87000 ERR Portmapper: FSS 'Archivio' (pid 12070) exited on signal 6
[0622 00:33:29] 0x101e87000 ERR FSS 'Archivio' appears unstable (2 failures in 60 minutes or less), halting restarts
[0622 00:33:29] 0x101e87000 (debug) FSS 'Archivio' DYING -> STOPPED (corefile limit exceeded)
[0622 00:33:29] 0x10024fca0 INFO NSS: Active FSS 'Archivio[0]' at 192.168.1.10:65169 (pid 12070) - dropped.
[0622 00:33:29] 0x10024fca0 (debug) NSS: removing vote inhibitor for FSS 'Archivio'.
[0622 00:47:25] 0x101e87000 NOTICE PortMapper: Starting FSS service 'Archivio[0]' on 192.168.1.10.
[0622 00:47:25] 0x10024fca0 (debug) FSS 'Archivio' STOPPED (corefile limit exceeded) -> LAUNCHED, next event in 60s
[0622 00:47:25] 0x10024fca0 (debug) FSS 'Archivio' LAUNCHED -> REGISTERED
[0622 00:47:25] 0x10024fca0 NOTICE PortMapper: FSS 'Archivio'[0] (pid 12966) at port 49565 is registered.
[0622 00:47:26] 0x10024fca0 INFO NSS: Standby FSS 'Archivio[0]' at id 192.168.1.10 port 49565 (pid 12966) - registered.
[0622 00:47:26] 0x10024fca0 (debug) Heartbeat from ID 192.168.1.10 updating LOCAL Archivio to 10.0.0.2:49565
[0622 00:47:26] 0x10024fca0 NOTICE PortMapper: Initiating activation vote for FSS 'Archivio'.
[0622 00:47:26] 0x10024fca0 (debug) Initiate_nss_vote for FSS Archivio
[0622 00:47:26] 0x10024fca0 (debug) NSS: sending message (type 2) to Name Server '10.0.0.2' (10.0.0.2:61948).
[0622 00:47:26] 0x10024fca0 INFO NSS: election initiated by 10.0.0.2:61948 (id 192.168.1.10) - admin request.
[0622 00:47:26] 0x10024fca0 (debug) NSS_VOTE2 to 10.0.0.2:61948
[0622 00:47:26] 0x10024fca0 INFO NSS: Starting vote for FSS Archivio using 1 voting member: 192.168.1.10.
[0622 00:47:26] 0x10024fca0 (debug) Connectivity test[1] to FSS 10.0.0.2:49565 passed .
[0622 00:47:26] 0x10024fca0 (debug) local_fss_vote: WINNER Archivio at 10.0.0.2:49565.
[0622 00:47:26] 0x10024fca0 (debug) local_fss_vote: sending tally to 192.168.1.10:49565
[0622 00:47:26] 0x10024fca0 (debug) tally_member:FSS/Archivio COUNTED 1 vote for 10.0.0.2:49565 from member 192.168.1.10:61948
[0622 00:47:26] 0x10024fca0 (debug) Elect FSS[0] 10.0.0.2:49565 svc_pri/0x0 votes/1 wght/0x5448fa0e0
[0622 00:47:26] 0x10024fca0 (debug) Elect FSS winner so far 10.0.0.2:49565 svc_pri/00 votes/1 wght/0x5448fa0e0
[0622 00:47:26] 0x10024fca0 INFO NSS: Vote selected FSS 'Archivio[0]' at 10.0.0.2:49565 (pid 12966) - attempting activation.
[0622 00:47:26] 0x10024fca0 (debug) set_fss_active: sending to 10.0.0.2:61948 (id 192.168.1.10)
[0622 00:47:26] 0x10024fca0 NOTICE PortMapper: Initiating activation vote for FSS 'Archivio'.
[0622 00:47:26] 0x10024fca0 (debug) Initiate_nss_vote for FSS Archivio
[0622 00:47:26] 0x10024fca0 (debug) NSS: sending message (type 2) to Name Server '10.0.0.2' (10.0.0.2:61948).
[0622 00:47:26] 0x10024fca0 (debug) NSS: FSS activation initiated by coordinator 10.0.0.2:61948 (id 192.168.1.10) votes 1
[0622 00:47:26] 0x10024fca0 INFO NSS: Vote call for FSS Archivio is inhibited - vote dis-allowed.
[0622 00:47:31] 0x10024fca0 (debug) find_fsm fsm Archivio ipaddr 10.0.0.2 port 49565 TestLink failed: getsockopt(SO_ERROR) returned error 61 [errno 61]: Connection refused
[0622 00:47:40] 0x10024fca0 NOTICE PortMapper: FSS 'Archivio' disconnected.
[0622 00:47:40] 0x10024fca0 NOTICE PortMapper: kicking diskscan_thread 4326440960.
[0622 00:47:40] 0x10024fca0 (debug) FSS 'Archivio' REGISTERED -> DYING, next event in 60s
[0622 00:47:40] 0x101e04000 INFO Starting Disk rescan
[0622 00:47:40] 0x101e04000 INFO Disk rescan delay completed
[0622 00:47:40] 0x101e04000 INFO Disk rescan found 4 disks
[0622 00:47:40] 0x101e87000 ERR Portmapper: FSS 'Archivio' (pid 12966) exited on signal 6
[0622 00:47:40] 0x101e87000 ERR FSS 'Archivio' appears unstable (2 failures in 60 minutes or less), halting restarts
[0622 00:47:40] 0x101e87000 (debug) FSS 'Archivio' DYING -> STOPPED (corefile limit exceeded)
[0622 00:47:41] 0x10024fca0 INFO NSS: Active FSS 'Archivio[0]' at 192.168.1.10:49565 (pid 12966) - dropped.
[0622 00:47:41] 0x10024fca0 (debug) NSS: removing vote inhibitor for FSS 'Archivio'.
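To make the crash loop in the fsmpm log easier to follow, here is a minimal sketch that pulls out only the FSS state transitions and the "exited on signal" lines. The regular expressions are my own guess at the line layout, based only on the lines pasted above, and the function name `summarize` is made up for illustration; it is not part of any Xsan tool. Signal 6 is SIGABRT, meaning the volume's FSM process aborted itself each time.

```python
import re
import signal

# Sample lines copied from the fsmpm log above.
SAMPLE = """\
[0622 00:32:53] 0x10024fca0 (debug) FSS 'Archivio' REGISTERED -> DYING, next event in 60s
[0622 00:32:54] 0x101e87000 ERR Portmapper: FSS 'Archivio' (pid 12047) exited on signal 6
[0622 00:32:54] 0x101e87000 (debug) FSS 'Archivio' DYING -> RELAUNCH, next event in 10s
[0622 00:33:29] 0x101e87000 ERR FSS 'Archivio' appears unstable (2 failures in 60 minutes or less), halting restarts
[0622 00:33:29] 0x101e87000 (debug) FSS 'Archivio' DYING -> STOPPED (corefile limit exceeded)
"""

# Assumed layout: "[MMDD HH:MM:SS] <thread> (debug) FSS '<name>' OLD -> NEW[, next event in Ns]"
TRANSITION = re.compile(
    r"^\[(\d{4} \d{2}:\d{2}:\d{2})\] \S+ \(debug\) FSS '([^']+)' "
    r"(.+?) -> (.+?)(?:, next event in \d+s)?$"
)
# Assumed layout: "[MMDD HH:MM:SS] <thread> ERR Portmapper: FSS '<name>' (pid N) exited on signal S"
EXIT = re.compile(
    r"^\[(\d{4} \d{2}:\d{2}:\d{2})\] \S+ ERR Portmapper: FSS '([^']+)' "
    r"\(pid (\d+)\) exited on signal (\d+)$"
)


def summarize(text):
    """Return (timestamp, volume, description) tuples for transitions and exits."""
    events = []
    for line in text.splitlines():
        m = TRANSITION.match(line)
        if m:
            when, fss, old, new = m.groups()
            events.append((when, fss, f"{old} -> {new}"))
            continue
        m = EXIT.match(line)
        if m:
            when, fss, pid, signo = m.groups()
            # Translate the numeric signal into its POSIX name.
            name = signal.Signals(int(signo)).name
            events.append((when, fss, f"pid {pid} exited on {name}"))
    return events


if __name__ == "__main__":
    for when, fss, what in summarize(SAMPLE):
        print(when, fss, what)
```

Run against the full log, this would show each launch going REGISTERED -> DYING within seconds of activation, which suggests the process dies during startup (the volume log stops at journal recovery or ARB branding) rather than at the license check itself.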
Log for Archivio volume:
[0622 00:32:32] 0x101ba9ca0 (Info) Server Revision 3.5.0 Build 7443 Branch branches_35X (412.3)
[0622 00:32:32] 0x101ba9ca0 (Info) Built for Darwin 10.0 i386
[0622 00:32:32] 0x101ba9ca0 (Info) Created on Mon Dec 7 12:52:39 PST 2009
[0622 00:32:32] 0x101ba9ca0 (Info) Built in /SourceCache/XsanFS/XsanFS-412.3
[0622 00:32:32] 0x101ba9ca0 (Info)
Configuration:
DiskTypes-2
Disks-2
StripeGroups-2
MaxConnections-139
ThreadPoolSize-256
StripeAlignSize-32
FsBlockSize-16384
BufferCacheSize-128M
InodeCacheSize-32768
RestoreJournal-Disabled
RestoreJournalDir-None
[0622 00:32:32] 0x101ba9ca0 (Info) Self (192.168.1.10) IP address is 192.168.1.10.
[0622 00:32:32.149126] 0x101ba9ca0 (Debug) No fsports file - port range enforcement disabled.
[0622 00:32:32] 0x101ba9ca0 (Info) Listening on TCP socket 192.168.1.10:65129
[0622 00:32:32] 0x101ba9ca0 (Info) Node [0] [192.168.1.10:65129] File System Manager Login.
[0622 00:32:32] 0x101ba9ca0 (Info) ForceStripeAlignment is enabled.
[0622 00:32:32] 0x101ba9ca0 (Info) Service standing by on host '192.168.1.10:65129'.
[0622 00:32:33.064187] 0x101ba9ca0 (Debug) Standby service - NSS ping from 10.0.0.2:65134.
[0622 00:32:33.064216] 0x101ba9ca0 (Debug) Vote count is 1
[0622 00:32:33.064450] 0x101ba9ca0 (Debug) FOUsurpCheck: read ARB info (pass 1): host (192.168.1.10:50315) conns 999999 age 1308665850.00 secs his delta 0.00 secs my delta 0.00 secs.
[0622 00:32:33.064457] 0x101ba9ca0 (Debug) FOUsurpCheck: polling ARB block to check for active peer (pass 1).
[0622 00:32:34.064766] 0x101ba9ca0 (Debug) FOUsurpCheck: read ARB info (pass 2): host (192.168.1.10:50315) conns 999999 age 1308665850.00 secs his delta 0.00 secs my delta 1.00 secs.
[0622 00:32:34.064785] 0x101ba9ca0 (Debug) FOUsurpCheck: ARB is already mine.
[0622 00:32:34] 0x101ba9ca0 (Info) Branding Arbitration Block (attempt 1) votes 1.
[0622 00:32:36.066129] 0x101ba9ca0 (Debug) Cannot find fail over script [/Library/Filesystems/Xsan/bin/cvfail.192.168.1.10] - looking for generic script.
[0622 00:32:36] 0x101ba9ca0 (Info) Launching fail over script ["/Library/Filesystems/Xsan/bin/cvfail" 192.168.1.10 65129 Archivio]
[0622 00:32:36.090411] 0x101ba9ca0 (Debug) Starting journal log recovery.
[0622 00:33:03] 0x101ba9ca0 (Info) Server Revision 3.5.0 Build 7443 Branch branches_35X (412.3)
[0622 00:33:03] 0x101ba9ca0 (Info) Built for Darwin 10.0 i386
[0622 00:33:03] 0x101ba9ca0 (Info) Created on Mon Dec 7 12:52:39 PST 2009
[0622 00:33:03] 0x101ba9ca0 (Info) Built in /SourceCache/XsanFS/XsanFS-412.3
[0622 00:33:03] 0x101ba9ca0 (Info)
Configuration:
DiskTypes-2
Disks-2
StripeGroups-2
MaxConnections-139
ThreadPoolSize-256
StripeAlignSize-32
FsBlockSize-16384
BufferCacheSize-128M
InodeCacheSize-32768
RestoreJournal-Disabled
RestoreJournalDir-None
[0622 00:33:03] 0x101ba9ca0 (Info) Self (192.168.1.10) IP address is 192.168.1.10.
[0622 00:33:03.588379] 0x101ba9ca0 (Debug) No fsports file - port range enforcement disabled.
[0622 00:33:03] 0x101ba9ca0 (Info) Listening on TCP socket 192.168.1.10:65169
[0622 00:33:03] 0x101ba9ca0 (Info) Node [0] [192.168.1.10:65169] File System Manager Login.
[0622 00:33:03] 0x101ba9ca0 (Info) ForceStripeAlignment is enabled.
[0622 00:33:03] 0x101ba9ca0 (Info) Service standing by on host '192.168.1.10:65169'.
[0622 00:33:16.211511] 0x101ba9ca0 (Debug) Standby service - NSS ping from 10.0.0.2:65181.
[0622 00:33:16.211547] 0x101ba9ca0 (Debug) Vote count is 1
[0622 00:33:16.211855] 0x101ba9ca0 (Debug) FOUsurpCheck: read ARB info (pass 1): host (192.168.1.10:65129) conns 999999 age 1308695554.00 secs his delta 0.00 secs my delta 12.00 secs.
[0622 00:33:16.211862] 0x101ba9ca0 (Debug) FOUsurpCheck: ARB is already mine.
[0622 00:33:16] 0x101ba9ca0 (Info) Branding Arbitration Block (attempt 1) votes 1.
[0622 00:47:25] 0x101ba9ca0 (Info) Server Revision 3.5.0 Build 7443 Branch branches_35X (412.3)
[0622 00:47:25] 0x101ba9ca0 (Info) Built for Darwin 10.0 i386
[0622 00:47:25] 0x101ba9ca0 (Info) Created on Mon Dec 7 12:52:39 PST 2009
[0622 00:47:25] 0x101ba9ca0 (Info) Built in /SourceCache/XsanFS/XsanFS-412.3
[0622 00:47:25] 0x101ba9ca0 (Info)
Configuration:
DiskTypes-2
Disks-2
StripeGroups-2
MaxConnections-139
ThreadPoolSize-256
StripeAlignSize-32
FsBlockSize-16384
BufferCacheSize-128M
InodeCacheSize-32768
RestoreJournal-Disabled
RestoreJournalDir-None
[0622 00:47:25] 0x101ba9ca0 (Info) Self (192.168.1.10) IP address is 192.168.1.10.
[0622 00:47:25.300431] 0x101ba9ca0 (Debug) No fsports file - port range enforcement disabled.
[0622 00:47:25] 0x101ba9ca0 (Info) Listening on TCP socket 192.168.1.10:49565
[0622 00:47:25] 0x101ba9ca0 (Info) Node [0] [192.168.1.10:49565] File System Manager Login.
[0622 00:47:25] 0x101ba9ca0 (Info) ForceStripeAlignment is enabled.
[0622 00:47:25] 0x101ba9ca0 (Info) Service standing by on host '192.168.1.10:49565'.
[0622 00:47:26.093927] 0x101ba9ca0 (Debug) Standby service - NSS ping from 10.0.0.2:49570.
[0622 00:47:26.093961] 0x101ba9ca0 (Debug) Vote count is 1
[0622 00:47:26.094219] 0x101ba9ca0 (Debug) FOUsurpCheck: read ARB info (pass 1): host (192.168.1.10:65169) conns 999999 age 1308695596.00 secs his delta 0.00 secs my delta 0.00 secs.
[0622 00:47:26.094227] 0x101ba9ca0 (Debug) FOUsurpCheck: polling ARB block to check for active peer (pass 1).
[0622 00:47:27.094524] 0x101ba9ca0 (Debug) FOUsurpCheck: read ARB info (pass 2): host (192.168.1.10:65169) conns 999999 age 1308695596.00 secs his delta 0.00 secs my delta 1.00 secs.
[0622 00:47:27.094548] 0x101ba9ca0 (Debug) FOUsurpCheck: ARB is already mine.
[0622 00:47:27] 0x101ba9ca0 (Info) Branding Arbitration Block (attempt 1) votes 1.
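The "age ... secs" values in the FOUsurpCheck lines look like Unix epoch timestamps rather than durations. That is an assumption on my part, not something the log states. A small sketch to convert them (the helper name `arb_age_to_utc` is made up for illustration):

```python
from datetime import datetime, timezone


def arb_age_to_utc(age_secs):
    """Interpret an ARB 'age' value as seconds since the Unix epoch (assumption)."""
    return datetime.fromtimestamp(age_secs, tz=timezone.utc)


if __name__ == "__main__":
    # The three distinct 'age' values from the volume log above.
    for age in (1308665850.00, 1308695554.00, 1308695596.00):
        print(age, arb_age_to_utc(age).isoformat())
```

If the server clock is on UTC+2 (also an assumption), the second and third values would fall on 00:32:34 and 00:33:16 local time, which lines up with the two earlier "Branding Arbitration Block" entries. That would mean each launch successfully brands the ARB and then dies shortly afterwards.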
A few more details:
192.168.1.10 is the LAN IP of the Xserve
10.0.0.2 is the SAN IP of the Xserve (the Promise SAN has 10.0.0.1)
A single cluster with 4 LUNs: one 8 TB LUN "archivio" with its own metadata/journal LUN (300 GB), and one 15 TB LUN "lavori" with its own metadata/journal LUN (300 GB).
No clients for now
Xsan 2.2.1, Mac OS X (10.6.7), Xserve 2xXeon, 8 GB, SAN Promise VTR
Posted on Jun 21, 2011 4:43 PM