New Intel Xserve rebooting under network load

We added an Xserve before the end of the year (2006), and we added an Xserve RAID to it just this last week.

During the period before the RAID was installed, the Xserve was used lightly as a file server, mostly just enough for me to move some files around the office and start to get to know OS X Server.

With the RAID attached we've started to try using the system as we intend, which is to have a large common file repository where common files can be shared out to the several clients on the GigE network.

So far the results have been disappointing: the Xserve has shut down and rebooted itself a few times during periods of active use. I searched this forum and found a suggestion to run memtest, and I have that tool running right now on the server. The system has 2GB of RAM installed.

Is RAM a good place to start with crashes that seem to be related to network load? How else can I troubleshoot?

I've not yet called Apple; I hoped to get more info before waiting in the phone queue.

MacBook Pro, Dual Core 2.16, Mac OS X (10.4.8), Keyboard protected with leather

Posted on Feb 12, 2007 1:50 PM

17 replies

Feb 13, 2007 2:57 AM in response to Wes Plate

Wes Plate-

Something wrong there. My gut says hardware. When it shuts down is it graceful or pow and off?

Did you enable any other services when you attached the RAID? Are you doing anything radically different?

Odd that nothing is in the panic log. Do you have redundant power supplies? One of those heading south and no backup PS could cause an immediate shutdown.

Memtest is good for rooting out errors, but you should run it at least 10 times or so just to be sure.

Luck-

-DaddyPaycheck

Feb 13, 2007 5:50 AM in response to DaddyPaycheck

Wes Plate-

Something wrong there. My gut says hardware. When it
shuts down is it graceful or pow and off?


I'm not sure of the exact answer. We were testing the migration to the Xserve and RAID when suddenly we noticed the network connection to the server was gone. I went to the server room; the display on the Xserve was off and the two RAIDed system drives were busy for a couple of minutes, then the system rebooted.


Did you enable any other services when you attached
the RAID? Are you doing anything radically
different?


I think the RAID is the only change. I've been reading up on how to do Open Directory, DHCP, DNS and that stuff, but I don't know enough yet to actually do them.


Odd that nothing is in the panic log. Do you have
redundant power supplies? One of those heading south
and no backup PS could cause an immediate shutdown.


The Xserve does not have redundant power. Is that an upgrade I can do myself?


Memtest is good for rooting out errors, but you
should run it at least 10 times or so just to be
sure.


I just started a 10 times test.



Later today I'm going to test getting the RAID out of the configuration. We are trying to replace two FireWire drives connected to one of our systems with shares on the RAID so everyone can access the files, so I plan to connect those FireWire drives to the Xserve and then share them from there. Seems like a good test.

Feb 13, 2007 6:09 AM in response to Wes Plate

Wes Plate-

Any chance of idiot-interference (somebody hit the button?) on the power down? Is the server room on UPS? I would cover that before anything else.

Any chance your AC supply side got overloaded? A RAID and an X is a bit of a load and your circuit should be rated for that load.

I am fairly certain that you can purchase the power supplies separately. Plug and pray I would guess but I haven't done this myself.

Go slow when connecting peripherals. Make sure everything is working correctly first before proceeding to the next step. Consider troubleshooting this problem first, find the cause, and then make sure things are stable before going to the next step.

Luck-

-DaddyPaycheck

Feb 13, 2007 6:13 AM in response to DaddyPaycheck

Any chance of idiot-interference (somebody hit the
button?) on the power down? Is the server room on
UPS? I would cover that before anything else.


UPS, yes. Chance of button-pressing? No.


Any chance your AC supply side got overloaded? A RAID
and an X is a bit of a load and your circuit should
be rated for that load.


We've got massive power (relatively speaking) running into our "server room". We just added a 30 amp circuit to connect the rack-mounted UPS we got for the RAID. Thinking back, I hadn't moved the RAID to the new UPS; it was connected to the two existing UPSs. They aren't showing overload on their indicators, but they're at about 60%. When we next test the RAID it will certainly be on its own UPS.


I am fairly certain that you can purchase the power
supplies separately. Plug and pray I would guess but
I haven't done this myself.


I'll look into it.


Go slow when connecting peripherals. Make sure
everything is working correctly first before
proceeding to the next step. Consider troubleshooting
this problem first, find the cause, and then make
sure things are stable before going to the next
step.


I've been trying to do just this, which is part of the reason the Xserve RAID came a couple of months after the Xserve. I should have tested the FireWire drives on the Xserve before; that was an oversight.


I'll post more later today.

Feb 13, 2007 2:51 PM in response to Wes Plate

More data points, though I may have to get on the horn with Apple support. I am interested in any more thoughts people have, of course.

This morning, as suggested, I ran memtest in a loop of 10. No problems reported.

Later in the day we moved the two FireWire drives we're trying to replace with the Xserve/Xserve RAID onto the Xserve, disconnected the RAID, and shared the two FireWire drives on the network. The user of these drives mounted them and ran the test programs that use the files on the FireWire drives. Within a couple of minutes the Xserve was dead again. The network connection to the system was gone, and a few minutes after that the Xserve rebooted itself.

Still no panic.log file.

I found it fairly interesting that the crashing didn't seem to be RAID related. I mounted the RAID again and turned on sharing for the folders I need to access. However, I now cannot get to the Xserve over the network!

I cannot get from other machines to the Xserve, either via its .local Bonjour name or via its IP address. From the Xserve I cannot get out to computers on the LAN, and I cannot access web sites outside our network. I also cannot see the RAID via RAID Admin.

This causes me to question the Ethernet controller. It seems dead now; maybe until now it was almost dead?

Feb 13, 2007 3:38 PM in response to DaddyPaycheck

I'm still going to call Apple, but I discovered something else just now-

This morning I had switched the Xserve to a "Link Aggregate" Ethernet configuration, which I read about in my handy Administering OS X Server book. It was fine until this last reboot. So in my attempts to bring life back to the system I turned off the Link Aggregate port and turned Ethernet 1 back on, but it didn't work (as I posted). I had to delete the Link Aggregate configuration (not just disable it), and then network life returned to the system.

But still, I don't think that our demands should be rebooting an Xserve.

Feb 14, 2007 11:27 AM in response to Wes Plate

I still haven't called Apple; I keep looking for clues.

Latest: I have been able to reproduce the server crash on my own, so I've been letting it happen as often as I can to find more clues, and I think I have a good one...

[Screenshot: Activity Monitor memory usage]

As this particular test script I have runs, the amount of wired memory grows, until it practically takes up everything. In fact the server crashed between 5 and 10 seconds after I took the above screenshot.

So more RAM is needed, I'd say? Am I in danger of filling up 4GB just as easily as we're filling up 2GB?
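For anyone who wants to log the wired figure over time instead of taking screenshots, here is a minimal sketch. It assumes the text comes from the `vm_stat` command-line tool and that its output contains a "page size of N bytes" header and a "Pages wired down" line; the function name, the 4096-byte fallback, and the sample text are illustrative and were not captured from the server in this thread.

```python
import re

def wired_bytes(vm_stat_text):
    """Return wired memory in bytes, parsed from vm_stat output text."""
    # The header normally states the page size; fall back to 4096 (an assumption).
    page_match = re.search(r"page size of (\d+) bytes", vm_stat_text)
    page_size = int(page_match.group(1)) if page_match else 4096

    wired_match = re.search(r"Pages wired down:\s+(\d+)", vm_stat_text)
    if wired_match is None:
        raise ValueError("no 'Pages wired down' line found")
    return int(wired_match.group(1)) * page_size

# Made-up sample in the general shape of vm_stat output.
SAMPLE = """Mach Virtual Memory Statistics: (page size of 4096 bytes)
Pages free:                         1200.
Pages active:                      50000.
Pages inactive:                    20000.
Pages wired down:                 450000.
"""

if __name__ == "__main__":
    print(wired_bytes(SAMPLE) / 2**20, "MB wired")
```

On a live system the text could come from `subprocess.run(["vm_stat"], capture_output=True, text=True).stdout`, sampled every few seconds while the test script runs.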

Feb 14, 2007 11:53 AM in response to Wes Plate

Wow, pretty conclusive evidence of the problem. If a single app is using this much memory it should be obvious in the Activity Monitor above. Sort by memory usage and see what bubbles to the top.

It will probably be Apple File Server or something opaque like kernel_task, but you never know. Grabbing samples may or may not help analysis. I think selecting the thread and "inspecting" is the best you can do with the regular tools. Apple's Dev tools include several interesting debugging tools, BigTop and Shark for example, that you can attach to virtually any running process to see where it is spending its time and resources. I've never tried attaching them to system processes, but they are technically no different than regular processes.

Out of interest, how do you cause the problem? Is it just overloading it or is it something specific? I was thinking that you could hit the server and see if it recovers over time... any process that eventually swallows all memory always suggests memory leakage to me. If you could watch a thread always increment in memory every time you do X, that is pretty clear evidence of a leak. On PPC, AFP now seems to free memory way better than it ever did before the Intel versions came out. They clearly dug deeply into the guts of AFP for the Intel port and the PPC side gained some benefits, but perhaps there is still a missed "free" on the Intel side...
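The "always increments" check can be sketched in a few lines. This is only an illustration under a strong assumption (roughly linear growth between evenly spaced samples); the function names and the sample readings are hypothetical, not measurements from this server.

```python
def growth_per_sample(samples):
    """Average change between consecutive readings."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return (samples[-1] - samples[0]) / (len(samples) - 1)

def never_shrinks(samples):
    """True if no reading is lower than the one before it."""
    return all(later >= earlier for earlier, later in zip(samples, samples[1:]))

def samples_until_full(samples, capacity):
    """Estimate how many more samples fit before capacity, or None if not growing."""
    rate = growth_per_sample(samples)
    if rate <= 0:
        return None
    return (capacity - samples[-1]) / rate

if __name__ == "__main__":
    readings_mb = [400, 700, 1000, 1300]  # hypothetical wired-memory readings
    print(never_shrinks(readings_mb))
    print(samples_until_full(readings_mb, 2048))
    print(samples_until_full(readings_mb, 4096))
```

If the readings never shrink and the rate stays positive, adding RAM only raises the capacity figure: under this model a steady leak would fill 4GB as well, just later.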

=Tod

Feb 14, 2007 12:02 PM in response to Tod Kuykendall

I had the same thought to see who is using up all my memory...

[Screenshot: Activity Monitor process list]

...Doesn't tell me much though. Certainly none of our programs are running on the Xserve. We have some utilities that we run on our computers that search for and write files, and these programs don't have memory issues locally. But when we mount the shares on the Xserve and our programs search the shares for files, the Xserve quickly runs out of memory.
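For readers trying to reproduce this kind of load without the in-house utilities, a rough stand-in is a script that walks a mounted share and stats every file. This is a sketch of the general access pattern only; it is not the test program used here, and the function name and mount point are hypothetical.

```python
import os

def scan_tree(root):
    """Walk root and stat every file; return (file_count, total_bytes)."""
    count = 0
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.stat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            count += 1
    return count, total
```

Run from a client as, say, `scan_tree("/Volumes/SharedFiles")` (a made-up mount point) while watching memory on the server.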

Feb 14, 2007 12:48 PM in response to Wes Plate

Wes,

Change the pop-up after the search field to either Active Processes or All Processes to get at the system level stuff and sort by memory. Clearly it's nothing you're running in your space that's triggering it.

This isn't guaranteed to reveal which process it is, because it could be an underlying process, but memory is much easier to trace than CPU consumption, so I would be surprised if you can't find it.
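The same sort-by-memory view can be approximated from a terminal. The sketch below assumes text in the shape produced by `ps -axo pid,rss,comm` (a header row, then PID, resident size in KB, and command); the function name and the sample rows are made up for illustration. Per-process resident size may not account for wired kernel memory, so this can come up empty for the same reason given above.

```python
def top_by_rss(ps_text, limit=5):
    """Return up to `limit` (rss_kb, pid, command) tuples, largest RSS first."""
    rows = []
    for line in ps_text.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # ignore malformed lines
        pid, rss_kb, command = parts
        rows.append((int(rss_kb), int(pid), command))
    rows.sort(reverse=True)
    return rows[:limit]

# Made-up sample rows, not taken from the server in this thread.
SAMPLE = """  PID    RSS COMM
    1   1200 launchd
  215 350000 AppleFileServer
   60  24000 mDNSResponder
"""

if __name__ == "__main__":
    for rss_kb, pid, command in top_by_rss(SAMPLE, limit=2):
        print(pid, rss_kb, command)
```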

=Tod

Feb 14, 2007 1:57 PM in response to Tod Kuykendall

Thanks, didn't know about that pop-up.

[Screenshot: Activity Monitor, all processes]

Armed with the confidence that I finally figured out what was causing the reboots (running out of RAM) and what seemed to be causing the RAM consumption (network file searches from connected clients), I called Apple tech support and provided them a load of data. Fingers crossed for a fix; our server will remain dormant until then.

Feb 15, 2007 9:14 AM in response to Wes Plate

I don't know if it is any consolation, but it's not your problem alone. Here is a thread on AFP dating from late December/early January.

http://www.afp548.com/forum/viewtopic.php?forum=25&showtopic=15994

I'm not sure this actually "helps" at all except that they are aware of it. Maybe you'll get lucky and the rumored 10.4.9 update will contain a fix. Well, you can always hope... 😉

Keep us posted,

=Tod

G5/2.0x2, Dual XServes x2, XRAID, beige G3 501Mhz
