I am going to cross-post this to the MacEnterprise.org mailing list.
I am seeing the same problems with OS X Server 10.3.9 and 10.4.2. The server starts as normal, the NetBoot service shows up fine, and the first OS X client can start NetBooting.
Subsequent clients are unable to finish booting. They can find the server and request the image, but they stop at the graphical stage that Mike Bombich would describe as "spinning globe turns into indeterminate progress indicator" (http://www.bombich.com/mactips/netboot.html). Server Admin shows the NetBoot process as "busy" (light green with an ellipsis) and all the Server Admin detail screens are blank. NetBoot has functionally crashed. NFS services show up in the Server Admin screens as all running fine. The faulty behavior did not appear prior to OS X Server 10.3.9, and I had previously been able to perform 50-client NetBoots in one of the same network environments (I am seeing the problem at three sites).
I did a bit more troubleshooting and started the NetBoot client in verbose mode. It became clear immediately that there were RPC communication problems over both TCP and UDP. Running "rpcinfo -p" on the server showed everything looking fine. A "showmount -e" against the server's IP, however, showed that no mounts at all were being offered by NFS (i.e., /Library/NetBoot/NetBootSP0 does not show up with showmount). One of the log files available through Console.app on the Xserve showed that mountd was aware of the requests from the clients but was not able to fulfill them. Killing mountd and bringing it back up in debug mode (mountd -d) did not reveal any strange behavior. It did restore the mounts to "showmount", but it did not actually fix NetBoot, which by this point seems hopelessly lost. The only way to restore NetBoot functionality from here is to restart the server.
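For anyone who wants to repeat the NFS checks above, here is a rough sketch of how they could be scripted. The helper name export_listed and the SERVER variable are my own placeholders, not part of any Apple tool, and the sketch assumes "showmount -e" prints one export per line with the path in the first column. Treat it as a starting point rather than a tested diagnostic.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): reads `showmount -e` output
# on stdin and succeeds only if the given export path appears in column one.
export_listed() {
    # $1 = export path to look for, e.g. /Library/NetBoot/NetBootSP0
    awk -v path="$1" '$1 == path { found = 1 } END { exit found ? 0 : 1 }'
}

# Hypothetical wrapper: runs the two checks described above against a server.
# $1 = server address (placeholder; substitute your NetBoot server's IP).
check_netboot_nfs() {
    SERVER="$1"
    # List the RPC services registered with the portmapper on the server.
    rpcinfo -p "$SERVER"
    # Check whether the NetBoot share point is actually being exported.
    if showmount -e "$SERVER" | export_listed /Library/NetBoot/NetBootSP0; then
        echo "NetBootSP0 is exported"
    else
        echo "NetBootSP0 is NOT exported"
    fi
}
```

Nothing runs until you call check_netboot_nfs with the server's address. In the failure state I described, I would expect the rpcinfo listing to look normal and the second check to report that the share point is not exported.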
(The curious will notice that just about all these troubleshooting steps were described either on Bombich's site or on pages 454-455 of Bartosh and Faas's "Essential Mac OS X Panther Server Administration".)
The problem is not entirely consistent. Once the system breaks, it seems to exhibit the pattern above for a while, even after restarting the server. But within a day it may start working fine again and deploy images to clients. I tried just now to break it again to get more information for this post, but it is actually working again at the current site (although not at the two others).
Has anyone else seen this and done any further troubleshooting on it?