Wednesday, July 13, 2005

lab 36 - Geryon's registry

NAME

lab 36 - Geryon's registry

NOTES

A critical piece of Geryon is the registry of disks and cpus. One of the immediate problems when setting up a grid cluster is that nodes come and go, including the one holding the registry.

To deal with one aspect of this, the registry restarting, I modified the grid/register and grid/reglisten commands from Inferno to monitor the availability of the registry, remount it if necessary, and re-announce the service.

I use grid/register for the disk services. Here is an example of how I create a new chunk and register a new kfs disk.

% zeros 1024 65536 > /chunks/chunk0
% (grid/register -a id chunk0 
{disk/kfs -rPW /chunks/chunk0})

I do this a number of times on each node that has spare disk capacity. A simple script registers all disks when I restart a node. All these disks will appear as services in the registry identified as tcp!hostname!port with some attributes including the chunk identifier, which should be unique for the host.
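The restart script itself is not shown here. A minimal sketch of one might look like the following, assuming every file under /chunks is an already initialized kfs chunk whose file name is its chunk identifier. The function name regdisks is made up for illustration. The -r flag is left out on the assumption that it reams the file system, which is wanted when a chunk is first created but not when it comes back after a restart; check kfs(4) before relying on that. If grid/register does not return on its own, each invocation would need to be put in the background with &.

fn regdisks {
 # one grid/register per chunk, using the file name as the chunk id
 for (c in /chunks/*) {
  grid/register -a id `{basename $c} {disk/kfs -PW $c}
 }
}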

The next step is to name each disk. For this I use ndb. I add a file, /lib/ndb/cluster, to the ndb database with entries of the form

name=d0.mbox kfs 
  master=host0!chunk0 
  replica=d0.replica

name=d1.mbox kfs 
  master=host0!chunk1 
  replica=d1.replica

name=d0.replica kfs 
  master=host1!chunk0 replica

name=d1.replica kfs 
  master=host1!chunk1 replica

The first field is the disk name, which is unique within the cluster. The master is the chunk, on the named host, that serves kfs for this disk. The replica field identifies a backup disk. I hope in the future to make it possible to switch dynamically between master and replicas, and to use replicas during computation, but I'll skip that for now. I replicate disks using Inferno's applylog and updatelog tools.

Once this is all in ndb I can run a script that will update the registry with the disk names.

fn refresh {
 names=`{ndb/query -a  kfs '' name}

 for (i in $names) {
  (host chunk) = ${split ! `{ndb/query name $i master}}
  addr = `{ndb/regquery -n name $host id $chunk}
  if {ftest -e /mnt/registry/ ^$i} {
   (echo host $host automount 1 persist 1 
     addr $addr replica 
     `{ndb/query name $i replica}> /mnt/registry/^$i)
  } {
   (echo $i host $host automount 1 persist 1
     addr $addr replica
     `{ndb/query name $i replica}> /mnt/registry/new)
  }
 }
}

This needs to be run whenever the ndb file or the list of registered services changes, so ideally it should be automatic. I can see two fairly easy ways to get there: build an ndb file inside the registry and have it respond to changes, or implement an events file in the registry and attach a process to it. This is a problem to work on later.
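Until one of those exists, a cruder stopgap is to rerun refresh on a timer. This is only a sketch and not something the lab implements: the name autorefresh and the 60 second interval are arbitrary, and it assumes refresh is already defined in the running shell.

% fn autorefresh {
 # poll: rebuild the registry entries from ndb once a minute
 while {~ 1 1} {
  refresh
  sleep 60
 }
}
% autorefresh &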

Once registered, these disks can be used from any node within the cluster. For example, I use a shell function to take a disk name and mount it as /n/diskname,

% fn rmnt {
 for (file in $*) {
  (disk rest ) := `{cat /mnt/registry/$file}
  while {! ~ $#rest 0} {
   (name val tail) := $rest
   if { ~ $name 'addr'} {mount -c $val /n/ ^ $disk}
   rest = $tail
  }
 }
}
% rmnt d0.mbox d1.mbox
% ls /n/d?.mbox
...
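
The reverse operation can be sketched the same way. The function name runmnt is hypothetical, and it assumes the disks were mounted on /n/diskname by rmnt as above.

% fn runmnt {
 # unmount each named disk from its /n/diskname mount point
 for (disk in $*) {
  unmount /n/$disk
 }
}
% runmnt d0.mbox d1.mbox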

To register a cpu service,

% grid/reglisten -r svc rstyx 'tcp!*!0' {runas $user auxi/rstyxd&}

This will announce on a new address and use that address as the service name in the registry. We can then get a list of the addresses of all cpu services:

% ndb/regquery -n svc rstyx

For both grid/register and grid/reglisten, the service names are automatically removed from the registry once the process exits. All connections are authenticated. The kfs disks should all use the same /adm/users file, which should be copied onto each disk when it is initialized, so that permissions are enforced consistently across the cluster.

So far we have all the services we need announced dynamically. We have a naming scheme and the infrastructure for running code anywhere in the cluster. What remains is the shell code to tie it together to build a simple mapreduce.

I'll put instructions for setting up Geryon on my inferno wiki.

FILES

caerwyn.com/lab/36/
