Posts

Showing posts from July, 2005

lab 37 - Geryon's mapreduce

NAME lab 37 - Geryon's mapreduce NOTES I have a registry of kfs disks and cpus in my cluster, and the registry is mounted on every node. Upon this infrastructure I've written some shell scripts to simulate a MapReduce function. I want to quantize and bound the data and the computation. To that end, a process works on only one 64MB disk for input, and may write to another disk for output. The splitting of the data is also crucial to the parallelization: each input disk can be handled concurrently on any cpu node in the cluster. The namespace for a process is built by finding resources in the registry. Because there are several cpus available to run a command block, I choose the cpu in round robin using a shell function. It populates a list of the available cpu nodes, then takes the head of the list with each subsequent call until the list is empty. [1]

cpulist=()
subfn nextcpu {
	if {~ $#cpulist 0} {
		cpulist=`{ndb/regquery -n svc rstyx}
	}
	result = ${hd $cpulist}
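The round-robin behaviour of nextcpu can be sketched outside Inferno sh as well. Here is a minimal Python analogue, assuming a stubbed-in registry query (the node names are invented, and query_registry stands in for `ndb/regquery -n svc rstyx`):

```python
# Hypothetical sketch of lab 37's nextcpu: take the head of a cached
# list of cpu nodes, and re-query the registry only when the list
# has been exhausted.

def query_registry():
    # Stand-in for `ndb/regquery -n svc rstyx`; node names are invented.
    return ["tcp!node1!6666", "tcp!node2!6666", "tcp!node3!6666"]

cpulist = []

def nextcpu():
    global cpulist
    if not cpulist:
        cpulist = query_registry()
    head, cpulist = cpulist[0], cpulist[1:]
    return head

# Five calls cycle through the three nodes, then refresh the list.
print([nextcpu() for _ in range(5)])
# -> ['tcp!node1!6666', 'tcp!node2!6666', 'tcp!node3!6666',
#     'tcp!node1!6666', 'tcp!node2!6666']
```

Note this rotates through whatever the registry returns at refresh time, so nodes that join or leave the cluster are picked up on the next refresh.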

lab 36 - Geryon's registry

NAME lab 36 - Geryon's registry NOTES A critical piece of Geryon is the registry of disks and cpus. One of the immediate problems to deal with when setting up a grid cluster is that nodes come and go, including the one holding the registry. To deal with one aspect of this, the registry restarting, I modified the grid/register and grid/reglisten commands from Inferno to monitor the availability of the registry, remount it if necessary, and re-announce the service. I use grid/register for the disk services. Here is an example of how I create a new chunk and register a new kfs disk.

% zeros 1024 65536 > /chunks/chunk0
% (grid/register -a id chunk0 {disk/kfs -rPW /chunks/chunk0})

I do this a number of times on each node that has spare disk capacity. A simple script registers all the disks when I restart a node. All these disks will appear as services in the registry, identified as tcp!hostname!port with some attributes including the chunk identifier, which should
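The monitor-remount-re-announce loop added to grid/register can be sketched abstractly. This is a Python sketch of the control flow only, not the real Inferno code; the check, remount, and announce callables are hypothetical stand-ins for statting the mounted registry, remounting it, and rewriting the service entry:

```python
# Sketch of the watchdog behaviour described above: on each round,
# if the registry is unreachable, remount it and re-announce the
# service (a restarted registry has lost our entry). All callbacks
# are illustrative stand-ins, not real Inferno calls.

def monitor(check, remount, announce, rounds):
    """Run `rounds` checks; return how many re-announcements occurred."""
    announcements = 0
    for _ in range(rounds):
        if not check():
            remount()
            announce()
            announcements += 1
    return announcements

# Simulate a registry that is down on the second of four checks.
states = iter([True, False, True, True])
count = monitor(lambda: next(states), lambda: None, lambda: None, rounds=4)
print(count)  # -> 1
```

The key design point is that re-announcing is idempotent from the service's perspective, so it is safe to do whenever the registry reappears.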

lab 35 - Geryon, another Inferno grid

NAME lab 35 - Geryon, another Inferno grid NOTES Grid computing is everywhere it seems, and it is one of the obvious applications of Inferno. As usual, I know little about it aside from reading a few papers to get me enthused enough to make some effort in that area. If I read all the current research I'd have little time to do the programming. So instead I'll jump in and see if I can swim. I'm trying to set up a simple grid, with tools for MapReduce functionality. I'm calling this system Geryon (because things need names). This first stab is to see what can be done with the pieces already available in the system, and from this working prototype find out what pieces need to be written to fill in the gaps or improve it. Geryon is too large to cover in one blog entry, so I'll be posting it piecemeal. I'll set up a wiki page to cover the whole setup and running of Geryon as I get further along. OVERVIEW I'll start with an overview

usual disclaimer

I do not know what I am doing. The point of writing the software is to learn about a problem. This means the software is not likely to be of practical use, though it may be, I don't know. Why don't I go and read the research papers on the subject? Reading one or two is fine, and I do, and they help. But reading too much means all I'm doing is reading and not coding. Hands-on experience is learning, maybe at a slower rate than reading, but at a much deeper level. The fun is in the coding. Coding generates ideas. After coding, reading the research papers becomes much more valuable, because I now have real experience to compare against. I'm doing it for fun. This should be called a fun lab or something. It is a lab for doing experiments and writing about them, or in the words of Homer Simpson, "It's just a bunch of stuff that happens."

lab 34 - lexis: semantic binary model implementation

NAME lab 34 - lexis: semantic binary model implementation NOTES The code linked to below is a database implementation based on the paper A File Structure for Semantic Databases by N. Rishe. The paper is brief and is required reading for the rest of this lab. The application is in three modules: the Btree, the SBM abstraction called lexis, and the query module. The Btree implementation stores key-value pairs. This Btree is a large improvement over my previous attempt, tickfs, in that a prefix length is included for each entry, which improves storage efficiency. Binary search is used when searching within a block, making it faster, especially considering the increase in the number of entries per block. The caching and locking mechanisms have also changed, and are described in earlier labs. Lexis is the SBM abstraction layer that sits on top of the Btree. It does not use the value field of the Btree, as all information is stored in the key. The abstractions exported are Facts, wh
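The prefix-length trick mentioned above is easy to illustrate. This is a generic sketch of the idea, not the lab's actual Btree code: within a block, each sorted key stores only the bytes that differ from the previous key, plus the length of the shared prefix (all names here are invented):

```python
# Illustrative sketch of prefix-length key compression in a Btree block:
# each entry is stored as (shared_prefix_len, remaining_suffix).

def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def compress(keys):
    """Encode sorted keys as (prefix_len, suffix) pairs."""
    out, prev = [], ""
    for k in keys:
        p = common_prefix_len(prev, k)
        out.append((p, k[p:]))
        prev = k
    return out

def decompress(entries):
    """Rebuild the full keys from (prefix_len, suffix) pairs."""
    keys, prev = [], ""
    for p, suffix in entries:
        k = prev[:p] + suffix
        keys.append(k)
        prev = k
    return keys

keys = ["apple", "applet", "apply", "banana"]
enc = compress(keys)
print(enc)  # -> [(0, 'apple'), (5, 't'), (4, 'y'), (0, 'banana')]
assert decompress(enc) == keys
```

Since sorted neighbours tend to share long prefixes, this shrinks entries and so fits more of them per block, which in turn is what makes the in-block binary search pay off.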