xgrid for novices...
Hi all,
I would like to set up a grid to run energy minimisation calculations. The application I'll be running (written in FORTRAN 77, with command-line input) usually takes 56 days to run on my 1.8 GHz G5, so I guess this is exactly the sort of job xgrid was intended for...
I've set up a controller and a couple of 'test' agents, but am struggling with job submission. The command:
"xgrid -h <mycomputer name> -p <mypassword> -grid list"
returns the following error message:
"<date time> xgrid[11023] error: could not connect to <mycomputer name> (reason unknown)"
Similarly, "xgrid -job submit <path to job>" simply prints the xgrid man page.
For these tests I was not running a firewall, so that can't be the problem. I have also managed to submit jobs from an application built with the developer tools (Xgrid Feeder), so it must be a simple command-line error that I can't figure out. Any ideas?
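A small sketch of the invocation pattern, assuming (as in the first command above) that the controller options -h and -p have to be supplied on every xgrid call, including the -job submit one. The helper xgrid_argv, the host name mycontroller.local, the password and the job path are hypothetical placeholders; the code only assembles and prints the command lines and does not run xgrid.

```python
import shlex


def xgrid_argv(host, password, *subcommand):
    """Hypothetical helper: build an xgrid command line with the controller
    connection options (-h, -p) placed before the subcommand.
    It only assembles the arguments; it does not execute anything."""
    return ["xgrid", "-h", host, "-p", password, *subcommand]


# Placeholder host, password and job path (assumptions, not real values).
list_cmd = xgrid_argv("mycontroller.local", "secret", "-grid", "list")
submit_cmd = xgrid_argv("mycontroller.local", "secret",
                        "-job", "submit", "/path/to/minimise")

# Print the commands as they would be typed in Terminal.
print(shlex.join(list_cmd))
print(shlex.join(submit_cmd))
```

The point of keeping the connection options in one place is that the listing and submission commands can't drift apart, e.g. one call naming the controller and the other not.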
On a related note, how does the xgrid controller break a job into smaller tasks? Does it take cues from commands within the source (which I guess would require me to add some form of MPI calls around loops in my code), or can it 'intelligently' find its own breakpoints? The jobs I have submitted from the Feeder application always appear as a single task, and are therefore shipped to a single agent. I'm hoping that I won't have to parallelize the code manually...
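If the minimisation can be split into independent runs (for example, separate starting geometries), one way to get more than one task is to divide the inputs by hand, one chunk per agent. This is a minimal Python sketch under that assumption; split_into_tasks, the command path and the file names are hypothetical, and the task dictionaries are only an illustration, not Xgrid's actual job format. A single serial minimisation would not benefit from this without changes to the code itself.

```python
def split_into_tasks(inputs, n_tasks):
    """Hypothetical helper: divide independent inputs into at most n_tasks
    roughly equal chunks using round-robin assignment.
    Empty chunks (when there are fewer inputs than tasks) are dropped."""
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    chunks = [inputs[i::n_tasks] for i in range(n_tasks)]
    return [chunk for chunk in chunks if chunk]


# Assumed example: 10 independent starting geometries, 3 agents available.
geometries = [f"start_{i:02d}.xyz" for i in range(10)]

# One illustrative task per chunk: same program, different input files.
tasks = [{"command": "/path/to/minimise", "arguments": chunk}
         for chunk in split_into_tasks(geometries, 3)]

for task in tasks:
    print(task["arguments"])
```

Round-robin keeps the chunk sizes within one of each other, which matters when each run takes a similar (long) time and the slowest agent determines when the whole job finishes.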
Many thanks for any of your thoughts or ideas, RN