Thursday, November 5, 2009

Xgrid: warnings and stalled queues

I'm still feeling my way into Xgrid. The lights are out and my hand is on the wall. The most important thing I've learned is that the Xgrid Admin utility (get it from here) is very helpful. It's easy to screw up the Controller so that a job submission or other request fails silently on the command line. With Xgrid Admin you can inspect the queue to see whether the job is actually there, read the logs, and even re-run jobs.



The installer puts Xgrid Admin in /Applications/Server. I still don't understand why launching xgridcontrollerd gives such variable results, or how I'm screwing things up when I do. I get different combinations of these messages:


<Warning>: Warning: controller error reading service principal file "/etc/xgrid/controller/service-principal"
<Warning>: Warning: controller default service principal changed: xgrid/localhost.local@(null)
<Warning>: Warning: controller database file was not closed cleanly
<Notice>: Notice: controller database "/var/xgrid/controller/datastore.db" opened
<Notice>: Notice: controller started
<Notice>: Notice: controller database loaded
<Warning>: Warning: controller could not determine the default grid
<Notice>: Notice: controller created grid "Xgrid" (id = 0)
<Info>: Info: controller connection closed (sid = 0x1006078a0)
<Info>: Info: controller connection closed (sid = 0x100606640)
<Error>: Error: controller session acceptor failed: BEEPError 600 (could not open local port)

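Since the startup output varies from run to run, a quick tally by severity makes the combinations easier to compare. This is only a sketch: it assumes the controller output has been saved as text and is fed in on stdin, and the function names tally_levels and only_problems are made up for this example. Nothing here is part of Xgrid itself.

```shell
# Tally controller log lines by their leading severity tag (<Warning>, <Notice>, ...).
# Reads log text on stdin; where that text comes from is up to the caller.
tally_levels() {
    sed -n 's/^<\([A-Za-z]*\)>:.*/\1/p' | sort | uniq -c | awk '{print $2, $1}'
}

# Print only the lines tagged as warnings or errors.
only_problems() {
    grep -E '^<(Warning|Error)>:'
}
```

With the output saved to a file (say controller.log, a hypothetical name), `tally_levels < controller.log` prints one line per severity with its count, and `only_problems < controller.log` drops the Notice and Info chatter.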

Sometimes we hang, sometimes we don't!
Scrubbing the database helps, but only for a while:


sudo xgridctl c stop
sudo xgridctl c off
sudo rm /var/xgrid/controller/datastore.db
sudo rm /var/xgrid/controller/status
sudo xgridctl c start
sudo xgridctl c on
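
Because I end up running this sequence a lot, it can be wrapped so the steps are previewed before anything is deleted. A minimal sketch, using only the commands above: DRY_RUN, XGRID_DIR, run and reset_controller are names invented for this example, and the switch to `rm -f` (so an already-missing file is not an error) is my own choice.

```shell
# Preview-or-execute wrapper around the scrub sequence.
# DRY_RUN and XGRID_DIR are hypothetical variables introduced for this sketch.
DRY_RUN="${DRY_RUN:-1}"                          # default: only print the commands
XGRID_DIR="${XGRID_DIR:-/var/xgrid/controller}"  # directory holding datastore.db and status

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reset_controller() {
    run sudo xgridctl c stop
    run sudo xgridctl c off
    for f in datastore.db status; do
        # -f: do not fail if an earlier scrub already removed the file
        run sudo rm -f "$XGRID_DIR/$f"
    done
    run sudo xgridctl c start
    run sudo xgridctl c on
}
```

Calling `reset_controller` prints the six commands; `DRY_RUN=0 reset_controller` actually runs them.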


After a long series of job submissions and run requests that failed, I open Xgrid Admin and see this.



I put the lid of the laptop down to sleep, think better of it, and open it up. They've all run! WTF?



One more thing. Even if I set the Xgrid Agent authentication to None in System Prefs, I still need to have these files:

/etc/xgrid/controller/client-password
/etc/xgrid/controller/agent-password
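
A quick way to see whether both files are in place before starting the controller. This is a sketch that only tests for existence and says nothing about what the files should contain. The function name check_password_files and its optional directory argument are invented here, and the check may need to run under sudo if the directory is not readable by your user.

```shell
# Report which of the two password files exist.
# The optional first argument overrides the directory (handy for testing);
# the default is the path given above.
check_password_files() {
    dir="${1:-/etc/xgrid/controller}"
    missing=0
    for name in client-password agent-password; do
        if [ -e "$dir/$name" ]; then
            echo "ok: $dir/$name"
        else
            echo "missing: $dir/$name"
            missing=1
        fi
    done
    # Exit status 0 only when both files are present.
    return "$missing"
}
```

Run `check_password_files` with no arguments to check the default location; a non-zero exit status means at least one file is missing.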


Anybody who knows what the correct way to set these up is, please holler.