xgrid for novices...

Question

Hi all,

I would like to set up a grid to run energy minimisation calculations. The application that I'll run (FORTRAN 77 coded, command-line input) usually takes 56 days to run on my 1.8 GHz G5, so I guess this is exactly the sort of thing that xgrid was intended for...

I've set up a controller and a couple of 'test' agents but am struggling with job submission. The input string:
"xgrid -h <mycomputer name> -p <mypassword> -grid list",

returns the following error message:
"<date time> xgrid[11023] error: could not connect to <mycomputer name> (reason unknown)".

Similarly, "xgrid -job submit <path to job>" simply spits out the xgrid man page.

For the purposes of these tests, I was not running a firewall, so that can't be the problem. I have also managed to submit jobs from an application built from the developer tools (Xgrid Feeder), so it's clearly just a simple command-line error that I can't figure out. Any ideas?

On a similar note, how does the xgrid controller break a job into smaller tasks? Does it take cues from commands within the source (which I guess would require me to add some form of MPI calls around loops in my code), or can it 'intelligently' find its own breakpoints? The jobs that I have submitted from the Feeder application always appear as a single task and are therefore shipped to a single agent. I'm hoping that I won't have to parallelize the code manually...

Many thanks for any of your thoughts or ideas, RN

Posted on Aug 25, 2005 7:15 AM

Answer 1

Aug 30, 2005 3:06 PM in response to Richard Neils

I have been playing around with Xgrid for a month or so, so here are my two cents...

I think your problem is that you are not specifying WHICH grid you are submitting the job to. See below.

First, make sure that you have Tiger SERVER running on one of the machines, and that you have the Xgrid service turned on. It's not clear whether your setup is like that, although you say that you can run Xgrid Feeder. You can monitor the state of the grid by running Xgrid Admin, either on the server or through the Server Admin Tools (downloadable from Apple's site).

I set up the Xgrid server to work with passwords, both for submitting jobs and for agent communication with the server. I was confused about which was which at the beginning, so both passwords ended up being the same.

Then, to make things easier, I set up two environment variables in my .bashrc (or /etc/bashrc if you want them for all users):

export XGRID_CONTROLLER_HOSTNAME=yourserver.com
export XGRID_CONTROLLER_PASSWORD=yourpassword

That way, every time you submit to this server you don't have to enter its name and password.
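If xgrid still asks for a host after this, it may be that the variables are set but not exported, so the xgrid process never sees them. A small sketch of a check you could paste into a terminal; the function name check_xgrid_env is made up for illustration and is not part of Xgrid:

```shell
# Hypothetical helper: confirm both controller variables are visible to
# child processes. printenv is an external command, so it only sees
# variables that have been exported.
check_xgrid_env() {
    for name in XGRID_CONTROLLER_HOSTNAME XGRID_CONTROLLER_PASSWORD; do
        if [ -z "$(printenv "$name")" ]; then
            echo "$name is not set or not exported"
            return 1
        fi
    done
    echo "ok"
}
```

Running check_xgrid_env in a fresh terminal should print "ok" once the .bashrc lines are in place. After that, a plain "xgrid -grid list" should pick up the host and password from the environment.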

Then, I've had success running things with

xgrid -job submit exec arg1 arg2 -so /dir/output.log -out /datadir -in /indir

Note that you have to be in the directory where the executable is located when you submit. -so (standard output) redirects the output of the program to the file /dir/output.log. -out names the directory (/datadir here) where the output files of the program will be written. -in names the directory that gets copied to the agents, in case the program has to read input files.
exec is an executable file, *already compiled*. Don't send source code. arg1, arg2, ... are the arguments that get passed to the program.

***
Regarding how Xgrid works: no, it doesn't break up your jobs for you. It takes a command, sends it to one of the agents, and returns the results to you when the agent is done. If you want job distribution, you have to do it manually: submit one job for every parameter value that you want to run the program with (and the results of one job must not depend on the others). For instance, if you want to calculate energy minimization for different temperatures or configurations, you send each temperature as a separate job.
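As a rough sketch of that manual approach, a shell loop can submit one job per parameter value. The executable name minimise and the three temperature values are made up for illustration, and the script assumes the two controller variables above are already exported. By default it only prints the commands; set DRY_RUN=0 to really submit them.

```shell
#!/bin/bash
# Sketch: submit one independent Xgrid job per parameter value.
# "minimise" and the temperatures are hypothetical placeholders.

# DRY_RUN=1 (the default) only prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

submit_job() {
    # $1 = temperature, passed as the only argument to the executable
    if [ "$DRY_RUN" = "1" ]; then
        echo "xgrid -job submit minimise $1"
    else
        xgrid -job submit minimise "$1"
    fi
}

# One job per temperature; each job is free to land on a different agent.
for temperature in 100 200 300; do
    submit_job "$temperature"
done
```

As with the single-job command above, run it from the directory that contains the executable.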

Alternatively, you can have this done automatically with an XML script that you submit to the Xgrid server. I haven't finished working this out, and the documentation consists only of examples, so it doesn't help that much.

If you are stuck with "unparallelizable" code, that is, you only need to run one job, then Xgrid is not for you 😟

Answer 2

Oct 31, 2005 10:14 AM in response to Richard Neils

Hi Richard,

I'm a novice myself, but I've got two tips for you that might help:
- If you haven't already installed "Tiger" Server: get yourself the XGrid Technology Preview (2) and install it on a "Panther" Mac. "Tiger" agents should be able to connect to a "Panther" XGrid controller, too. This is cheaper than upgrading to "Tiger" Server.
- This preview works fine with MacMPI (yes, you will have to add MPI calls in your source), as does "Tiger" Server.

Check out Google for the Tech Preview (2) and MacMPI.

@could not connect to <mycomputername>: Have you tried connecting to the controller using its IP address instead of the computer name? That helped in my case.

Frank
