Tuesday, December 13, 2005

lab 49 - wrappers


Acme is the hub of my activity where all my tools are at hand. It is inconvenient to have to step outside it while working on a task, but it is impractical to port all the tools I need into Inferno. However, Inferno is an integrating environment: it is often easy to put a wrapper around an external tool so that it is usable within Acme.

In this lab I show a few examples of putting wrappers around external tools so that they blend into the Inferno environment and work together with Inferno's tools.

The first, very simple example is awk.

fn awk {os awk $*}

With this definition it is only possible to use awk as a filter without naming files as command line arguments. But this is often good enough.

Very often I work on my laptop, which has NT as the host OS. But I need to use dict and spell all the time, and my laptop doesn't have these installed. I integrate these tools into my environment by running emu on a Plan 9 or Linux machine and exporting a styx service that includes the #C device bound to /cmd. Then I just need to bind the remote service before running the command.

fn spell {
 bind /n/linux/cmd /cmd
 cat $* |os spell
 unmount /n/linux/cmd /cmd > /dev/null >[2=1]
}

Now I can use >spell within Acme and become oblivious to the fact that spell is running elsewhere on my network.

Let's say I want to run javac hello.java from within Acme, and also have button-3-clicking on an error message jump to the relevant line within the file. The problem here is that the Inferno namespace needs to be mapped to the host namespace. This can only be done if the file actually resides on the host. But given that assumption we can easily establish a convention whereby names within Inferno map to host names.

fn file2host {
 for (i in $*) {
  f := `{cleanname -d `{pwd} $i}
  echo $f |sed 's@/n/d@D:@
  s@/n/c@C:@'
 }
}

fn host2file {
  sed 's@D:@/n/d@
  s@C:@/n/c@' |sed 's@'^`{pwd}^'/@@'
}

fn javac {
 classpath='your favourite jars'
 os javac -classpath $"classpath `{file2host $*} | host2file
}

This assumes we have bound, say, the C: and D: drives to /n/c and /n/d. The mapping is simple. The benefit is that I can use mk to manage builds, and use the plumber to plumb compiler errors to Acme.
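The rewrite is mechanical enough that it can be sketched outside the shell too. Here is a rough Python version of the same mapping, assuming the C: and D: bindings described above (the function names mirror the sh functions; the code is illustrative, not part of the lab):

```python
# Map Inferno names to host names and back, assuming C: and D: are
# bound at /n/c and /n/d as in the text.
bindings = [('/n/d', 'D:'), ('/n/c', 'C:')]

def file2host(path):
    # Rewrite an Inferno path to the corresponding host path.
    for inferno, host in bindings:
        if path.startswith(inferno):
            return host + path[len(inferno):]
    return path

def host2file(path):
    # The inverse rewrite, as used when mapping compiler errors back.
    for inferno, host in bindings:
        if path.startswith(host):
            return inferno + path[len(host):]
    return path

print(file2host('/n/d/src/hello.java'))   # D:/src/hello.java
```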

Here's a more complicated example, used for building the emu source code and libs. The assumption now is that the source code is on the host and the mapping from Inferno to host namespace is just prepending $emuroot.

fn file2root {
 args = $*
 for (i in $args) {
  i = `{cleanname -d `{pwd} $i}
  echo $emuroot ^ $i |sed 's/\//\\/g'
 }
}

fn 9c {
 load arg
 args := $*
 flags := ()
 arg I+ {flags = $flags -I ^ `{file2root $arg}} D+ {flags = $flags -$opt ^ $arg} '*' {echo unknown option $opt} - $args
 cwd=`{echo $emuroot ^ `{pwd} | sed 's/\//\\/g'}
 cflags=(
  -I'C:\Program Files\Microsoft Platform SDK\Include'
  -I'C:\Program Files\Microsoft Visual C++ Toolkit 2003\include'
  -Fo$cwd )
 os $cc $cflags $flags `{file2root $args}
}

This is interesting because emu could be built using mk from inside Inferno. Given a definition of 9c, 9l and 9ar for each platform the mkfiles could be simplified (however, shifting the complexity to the shell).

The final example is for XML tools. Often these tools take URLs instead of filenames. The trick is to serve the files I'm working on using httpd. Httpd is now serving as a general mapping from my local namespace to a universal namespace. I can bind the files in from anywhere and then work on them with XML tools, using tools that might also be bound in from anywhere. For example, I want to use xalan to transform XML using XSLT. I bind my working directory onto /services/httpd/root and run svc/httpd/httpd. Then files below the root get mapped to URLs.

fn file2url {
 sysname := `{cat /dev/sysname}
 url := 'http://' ^ $sysname ^ $domain
 for (i in $*) {
  f := `{cleanname -d `{pwd} $i}
  echo $f |sed 's@/services/httpd/root@'^$url^'@'
 }
}

fn xslt {
 args := $*
 flags := ()
 arg t+ {flags = -XSL `{file2url $arg}} - $args
 if {~ $#args 1} {
  args = -IN `{file2url $args}
 } {
  echo 'usage: xslt -t transform file' >/fd/2
  raise usage
 }
 bind /n/some_java_host/cmd /cmd
 os java -cp $classpath org.apache.xalan.xslt.Process $flags $args
 unmount /n/some_java_host/cmd /cmd >/dev/null >[2=1]
}

This last approach is possibly the most flexible. But all of them demonstrate that Inferno supports, in the classic UNIX way, using shell to modify a program's interface. In doing so the external tools and Inferno tools can be made to work together, creating a richer, more productive environment.

Monday, November 28, 2005

lab 48 - symbol


I'm clearing out the backlog and found this lab's code, which is really a continuation of earlier labs on the lispy-shell (26, 27, 28). I felt that where I was going with lab 28 wasn't working out, or was just going to end up looking like scsh (if I was lucky, and as smart as Olin Shivers).

This attempt is more a Mathematica-like shell, and for a while it was looking better, in that it fit in better with the existing sh, but now I've decided against it too. I'm posting it here just to record what I've done. I'll try and explain at the end why I don't like this approach anymore.

In Mathematica everything is an expression of the form

f[x, y]

which translates quite easily into the shell expression

f x y

Because of Inferno sh's command blocks, we can directly support the nesting in expressions:

% Head {f x y}
f
% Head {f;x;y}
List
# a sequence of commands {f;x;y} is
# represented as {List f x y} in "FullForm"

And so, because of command blocks and this simple representation of expressions, any arbitrarily complex structure can be represented in sh syntax, and a sh module implementing the core set of Mathematica commands could be used to symbolically manipulate that data.

Here are a few more to give you an idea.

% Length {a;b;c;d;e;f;g}
7
% Rest {a;b;c;d}
{b;c;d}
% Apply f {a;b;c;d}
{f a b c d}
% Map f {a;b;c;d}
{f a; f b; f c; f d}
% Nest f a 3
{f {f {f a}}}

Another crucial idea of Mathematica is its peculiar "infinite evaluation" system; it keeps evaluating an expression until it no longer changes. I implemented an IEval builtin which takes the output of a command and evaluates it again until the output is the same as the input.

% IEval {Point 1 2 3}
{Point 1 2 3}
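The fixed-point loop itself is tiny. A minimal Python sketch, with a made-up one-rule evaluator standing in for the real rewrite system:

```python
def ieval(e, eval_once):
    # Keep evaluating until the expression no longer changes,
    # as the IEval builtin does.
    while True:
        e2 = eval_once(e)
        if e2 == e:
            return e
        e = e2

# A toy rule set for illustration: rewrite ['Inc', n] to n + 1,
# and leave everything else alone.
def rules(e):
    if isinstance(e, list) and e[0] == 'Inc':
        return e[1] + 1
    return e

print(ieval(['Inc', 41], rules))   # 42
print(ieval('Point', rules))       # already a fixed point
```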

But I stopped short of implementing the evaluation system described in the Mathematica docs. It was at this point that I realized what I was doing was just a pipeline from sh to sh, and that maybe the commands should read standard input for symbolic data. And then maybe they should be programmable using regular expressions and accept an arbitrary text stream instead of specifically structured text based on sh syntax. And at that point I'm back with the UNIX philosophy of software tools.

So I arrived at the point where even though symbol would fit within sh it might not fit within Inferno. From this exercise I think I have a better appreciation of the existing set of Inferno software tools. It is not always obvious how they are applied to a problem. But sometimes one needs to take a long trip around the computer science field before one can understand their generality, and that a command block is just quoted text; and quoted text is just a text stream.



lab 47 - self tk


I was inspired by watching a video about Sun's Self system, which also inspired Squeak's morphic user interface. The scripts in this lab were an attempt to get tk windows serving a file that described themselves, so that the file could be copied to disk, and later run to recreate the window.


Here's an example.

% ./text
% cp /chan/tk.239 t1
% run t1

You should now have two text windows open. Edit either one of them, then copy and run them again. Both the contents and the position of each window should be saved and restored. Right-clicking in the window brings up the titlebar, which pops up underneath the window because it is the last thing packed into the tk widget. Right-clicking again makes it disappear. One of the ideas being explored is that a set of widgets can be configured and placed on the screen, then all the /chan/tk.* files copied to a directory that acts as a stored workspace, to be later restored with all the widgets in the exact same configuration.

I've included attempts at a few others, such as scale, button, and canvas. Most of the functionality is in the tklib sh file.

This is a first cut, showing the 'self' idea translated in a way (maybe) appropriate to the file/shell/software-tools experience. A different notation for the files might be more appropriate.



lab 46 - vac script


This lab is an answer to Jack's question in the comments of lab 41. The putclump/getclump scripts can be made to work with blocks of data. The shell script for this lab, vac, will write a kfs file system to a venti-lite archive in blocks, taking advantage of block level redundancy.

This is a demonstration rather than something truly useful. It literally takes hours to store a 64MB kfs file system in venti-lite using this script.

Note also that it is trivial to implement putclump/getclump using the venti(2) module in Inferno and store the blocks in a real venti archive. And this script will work the same way.

I've also included a vcat script to read out the kfs filesystem in one go. I started writing a file2chan interface so I could run disk/kfs directly against the venti-lite archive, but didn't finish it. I've attached that too (clumpchan). Exercise for the reader is to finish it.




I updated vac and clumpchan, fixing a couple of small bugs in each. Clumpchan does work, but only by reading one block at a time, which is not suitable for kfs. It would be well worth translating these scripts into Limbo while keeping them non-server-based venti-lite; a useful set of tools would come of it.

Wednesday, November 09, 2005

lab 45 - full screen


I've been wanting to use Inferno as a full-screen application on NT. I don't know too much about the Windows API, but I took a peek at how Squeak does it, and that was enough to get it to work. This is a first pass at doing it. It should probably be done with parameters to emu. Ideally the Inferno host window would be resizable, with some way of switching between full screen and windowed mode while emu is running.

Just copy the win.c file to your /emu/Nt directory and build. The -g option to emu needs to be the current settings for your monitor.

The other files in this lab are the wmclient, tkclient, and titlebar to make wm windows look more rio-like. I had done this in an earlier lab, but the wmclient code didn't work properly. The wmclient works properly now. But there's no backdrop menu for window control, which makes it a little difficult to hide and exit the windows.



Monday, October 31, 2005

lab 44 - acme irc client


The code in this lab is a port of an IRC client for the Acme editor. The original was by Russ Cox. The translation was pretty straightforward, that is, without any creative input from me. While the code is usable as an IRC client there are still plenty of bugs I've left in it for the reader to explore.



Monday, September 26, 2005

lab 43 - acme Wiki


I ported the Acme Wiki client from Plan 9 to Inferno. At the moment (20050925) a small patch to Inferno's Acme is needed to get this to work. Make the following change (using ed) to /appl/acme/regx.b:

    if ((nc := xgetc(a0, a1, q)) != '#' && nc != '/' && nc != '?')

Download the wiki.tgz and unpack under /acme. Then mount a wiki and start Acme.

% cd /acme
% gunzip < wiki.tgz |gettar
% mount -9 tcp!plan9.bell-labs.com!wiki /mnt/wiki
% acme -c1 /acme/wiki/guide
# inside Acme middle-click Wiki

I took a lot of the code that interfaces with the Acme filesystem from the Mail command and put it into the Acmewin module. This module should be useful to other Acme clients.

Once I started using this it flushed out a few bugs with my inferno wikifs port. I've updated the code under the original lab 30 with fixes.

Thanks to Russ Cox for writing the original wikifs and Acme wiki client. My version is not nearly as well tested as the original, so blame me for all bugs.

Wednesday, August 31, 2005

lab 42 - channels in shell


This is a quickie. I ripped this code off from Rob Pike's squint, an interpreter for his squeak language, a predecessor of Limbo. It's an implementation of Eratosthenes' sieve using communicating processes and channels, which I translated to Inferno shell. This was just to try out the channels in shell, since somehow I've overlooked them till now.

load tk
load expr

fn counter {
 ch := $1
 {i:=2; while {} {send $ch $i; i=${expr $i 1 +}} }
}

fn filter {
 prime := $1
 listen := $2
 s := $3
 while {} {
  i := ${recv $listen}
  if {ntest ${expr $i $prime %}} {send $s $i}
 }
}

fn sieve {
 prime := $1
 chan count
 c := count
 counter count &
 {while {} {
  p := ${recv $c}
  send $prime $p
  chan newc^$p
  filter $p $c newc^$p &
  c = newc^$p
 }} &
}

chan prime
sieve prime &
echo ${recv prime}
echo ${recv prime}
echo ${recv prime}

Watch your CPU meter spike when you get a hundred primes or so.
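For comparison, the same pipeline of communicating processes can be sketched in Python, with threads standing in for the background sh processes and bounded queues for the channels (a loose translation for illustration, not part of the lab's code):

```python
import threading, queue

def counter(ch):
    # Endless source of integers 2, 3, 4, ...  (send $ch $i)
    i = 2
    while True:
        ch.put(i)
        i += 1

def filt(prime, listen, s):
    # Pass along only the numbers not divisible by prime.
    while True:
        i = listen.get()       # i := ${recv $listen}
        if i % prime != 0:
            s.put(i)           # send $s $i

def sieve(prime_ch, n):
    # Grow a chain of filters, one per prime, emitting n primes.
    c = queue.Queue(maxsize=1)  # maxsize=1 approximates an unbuffered channel
    threading.Thread(target=counter, args=(c,), daemon=True).start()
    for _ in range(n):
        p = c.get()
        prime_ch.put(p)
        newc = queue.Queue(maxsize=1)
        threading.Thread(target=filt, args=(p, c, newc), daemon=True).start()
        c = newc

primes = queue.Queue()
sieve(primes, 5)
print([primes.get() for _ in range(5)])   # [2, 3, 5, 7, 11]
```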

Monday, August 29, 2005

lab 41 - venti lite


I've taken another look recently at venti and the ideas from the venti paper.

Venti is a data log and index. The index maps a sha1 hash of a clump of data to the offset of the clump in the log. Clumps are appended to the log after being compressed and wrapped with some meta-data. A sequence of clumps makes an arena, and all the arenas, possibly spread over several disks, make up the whole data log. There is enough information in the data log to recreate the index, if necessary.

The above is my understanding of Venti so far after reading the code and paper. There is a lot more complexity in its implementation. There are details about the caches, the index, the compression scheme, the blocking and partitioning of disks, and so on. I will ignore these details for now. Although the whole of venti could be ported to Inferno, I want to look at it without getting bogged down in too many details too early.

Reasoning about Venti in the context of Inferno, I tried a simple analog of it using Inferno sh(1). The two basic operations of Venti are reading a clump, using its hash, called a score, to locate it; and writing a clump, getting a score in return. I created two sh functions, putclump and getclump.

It is easier to reuse than to reinvent. I use gzip for compression, puttar for packaging a clump, sha1sum to hash the clump, and dbm as the index. Here's the code for putclump.

load expr
if {! ~ $#* 2} {
 echo usage: putclump type file > /fd/2
 raise usage
}
type := $1
file := $2
(perm device inst
  owner group
  size rest) := `{ls -l $file}
if {ntest ${expr $size $Maxclumpsize gt}} {
 echo file too big > /fd/2
 raise toobig
}
(sha f) := `{sha1sum $file}
o := `{dbm/fetch idx $sha >[2] /dev/null}
if {~ $#o 0} {
 (perm device inst owner
  group offset rest) := `{ls -l arena}
 cat $file > $tmp
 puttar $tmp |gzip >> arena
 echo $sha $offset |dbm/store idx
}
rm -f $tmp
echo $sha

To use it, create an arena file, and the index files first.

% touch arena idx.pag idx.dir
% echo this is a test > t1
% putclump Datatype t1

And to prove it works add the same file again and get the same score back.

I can get the contents back out using getclump. Here is how getclump is defined.


score := $1
offset := `{dbm/fetch idx $score}
read -o $offset $Maxclumpsize < arena |
   gunzip | 
   gettarentry {cat}

A file must be less than the Maxclumpsize. If I store a set of files I get a list of scores back. I can write this list to a file and write the list back with a different clump type: Pointertype0. Then I store the final score as a file with one score entry and call this the Roottype.

% { for (i in *.b) {putclump Datatype $i} } > t1
% putclump Pointertype0 t1 > t2
% putclump Roottype t2 > rootscore

This is more or less the hash tree described in the paper. The data log can be scanned, for example to retrieve all the Roottype scores.

% gunzip < arena| gettarentry {echo $file}

This implementation of putclump and getclump could quite easily be moved from shell into limbo also serving the Venti protocol for a rudimentary venti server.
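As a rough illustration of what that move to Limbo would implement, here is a Python sketch of the same append-only log plus score index, in memory (the structure follows the shell scripts above; the names and in-memory representation are mine):

```python
import hashlib, zlib

log = bytearray()   # the data log: compressed clumps appended in order
idx = {}            # the index: score -> (offset, length) in the log

def putclump(ctype, data):
    # Hash the clump to get its score; append only if the score is new.
    score = hashlib.sha1(data).hexdigest()
    if score not in idx:
        clump = zlib.compress(ctype.encode() + b'\n' + data)
        idx[score] = (len(log), len(clump))
        log.extend(clump)
    return score

def getclump(score):
    # Look up the offset in the index and decompress the clump.
    off, n = idx[score]
    ctype, data = zlib.decompress(bytes(log[off:off+n])).split(b'\n', 1)
    return data

s1 = putclump('Datatype', b'this is a test')
s2 = putclump('Datatype', b'this is a test')
assert s1 == s2                       # same data, same score, stored once
assert getclump(s1) == b'this is a test'
```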

Sunday, August 28, 2005

first year anniversary

A year ago today I started keeping this notebook. The original idea was for it to be a social thing. Instead of keeping a private notebook of my computing experiments, I published everything in a blog to make it part of a conversation.

Who were the other voices?

A pattern emerged in how I went about finding ideas and exploring them. It seems I hit upon an interesting paper--often from a regular source like Google Labs, Bell Labs or the MIT PDOS group--and I'd then try and reason about the paper's subject in the context of Inferno. So the other voices I started listening to were from the authors of these great papers.

Sketching out a program from something I'd read in a paper helped me to understand the paper better and generate ideas that went beyond the paper. This was not just useful, but also a lot of fun. Much as writing is more than communication but an extension of the process of thinking, programming is not merely a corporate activity, or an economic activity, but also a philosophical activity. By which I mean, it's part of the process of reasoning about something.

The code was as much part of the conversation as the prose. And where the prose was english, the chosen platform for expressing the code was Inferno. By including the code an active element was added to the notebook. Everything I talked about could be pulled apart, put back together and run on an Inferno system.

Was Inferno the right choice of system? How does Inferno help or hinder me to reason about computing problems? Alan Kay has said, "Point of view is worth 80 IQ points" when talking about the power of smalltalk. Paul Graham has expressed similar ideas about the power of Lisp. Why not do the Notebook in Squeak or Lisp? I don't have a good answer to this. It might be just accident that I started my career learning UNIX, moved on to Plan 9 and finally landed on Inferno. In a way it was part of my computing culture, and culture is king of all. I strongly believe Inferno is the most powerful environment I can most easily manage in which to explore ideas. It has broad scope, it is small and easy to understand, and the whole system is under my control. The only other environment that comes close is Squeak. But with my UNIX background I have much stronger connection to the UNIX philosophy of programming as opposed to an Object Oriented one.

Looking ahead it's hard to see what I'll be writing about next. The more papers I read and try out, the more it leads me onto other things that I know nothing of. I was interested for a while in symbolic programming, CA and Alife. I would like to try some DHT applications (from papers in MIT PDOS), but I have lost interest at the moment. No doubt my interest will come round again. However, notice the pattern again, the reason for my interest in these things is often caused by a good source of papers on these topics. In other words, who's talking about what. It's likely I'll find another source of papers on a subject I haven't considered before and will put a lot of effort in writing code to help me think through those new ideas.

I'd like to see the Inferno community grow with more programmers with whom I can talk more directly. I'd like to see wild new ideas about computing with Inferno as part of the language.

Sunday, August 21, 2005

lab 40 - distributing data


The first problem I had to deal with once extracting the data from the reality mining data set was how to distribute it across all the disks. So this lab describes how I did it.

The Geryon grid I have at home is a simulation grid. I'm just pretending that I have 16 real disks, when I actually only have two. [1] The sizes of things are also scaled down, so each disk is only 64MB instead of a more typical 40GB.

I divided each disk into 1MB chunks for this simulation. On each disk I created 60 files numbered 0.ckfree ... 59.ckfree. Each empty file is a record that an available chunk is on this disk. I don't use 64 because I leave room for slop. Each chunk is going to be slightly over 1MB. I do all this easily enough using my friend, the registry:

% for (i in `{ndb/regquery -n resource kfs}) {
    mount $i /n/remote
    for (j in `{seq 0 59}) {
        > /n/remote/$j.ckfree
    }
}

The idea behind this, as suggested in the Google FS paper, is that each disk is the master record of the used and free chunks that it contains. I scan all disks to update a cache for fast querying. (Seq in the code above is a sequence generator as in Plan 9 and is included in the files for this lab.)

I created a file2chan server, chunkqueue, that can allocate a free chunk from all the available chunks across all disks. Writes to the file add free chunks to memory. Reads from the file return the next chunk. The chunk is chosen by randomly selecting a disk, then taking the head of a chunk list. The aim is to uniformly distribute the repository across all disks. But this will only work on average. With the few disks I have I'll still get many chunks on the same disk.

An important goal is to get very high parallelism in my distributed file processing. Two truly concurrent processes are ones that are running on independent CPUs and reading from separate disks with no communication between the two. If I have two disks I want to split the file between the disks and process both chunks at the same time on different CPUs. Anytime I get several chunks for the same repository on the same disk, and there is no other replica of those chunks on other disks, I am losing some amount of parallelism.

I announce chunkqueue to the registry. I can then scan all disks, just like I did above, and write all the free chunks to the chunkqueue.

% chunkqueue /tmp/chunkq/cq
% grid/reglisten -r svc chunkq 'tcp!*!0' {export /tmp/chunkq&}
% freechunkscan

Freechunkscan is in the new gridlib file.

A chunk handle is a cluster unique filename for the chunk. It is just a sequence number with .ck appended. When a free chunk is allocated the .ckfree file is removed and replaced with the nnn.ck file.

subfn allochunk {
 chunkid := `{next /n/master/next} ^ .ck
 (free disk) := `{read < /n/chunkq/cq}
 mount -c $disk /n/remote
 mv /n/remote/^$free /n/remote/^$chunkid
 echo $chunkid $disk >> /n/master/locator
 result = $chunkid
}

I assign one disk as the master file system that will keep a permanent record of chunks used by a repository file. The master is assigned in /lib/ndb/cluster and after a refresh shows up in the registry. A file on the master represents one repository file (BigFile) and contains a list of chunk handles. Also stored on the master is the mapping of chunk handles to disks, but this can be regenerated by periodically scanning all the disks. The mapping of chunks to disks is kept distinct from the repository-to-chunks mapping to allow for the possibility of chunks stored on multiple disks and chunk movement among disks. To locate a chunk I just grep the /n/master/locator file (see locatechunk in gridlib).
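To make the bookkeeping concrete, here is a small Python sketch of the allocation step, with dicts standing in for the disks, the free-chunk markers, and the master's locator file (the disk addresses and counts are made up for the example):

```python
import random

# disk address -> set of free-chunk marker files on that disk
disks = {
    'tcp!oak!1850': {'0.ckfree', '1.ckfree', '2.ckfree'},
    'tcp!fir!10181': {'0.ckfree', '1.ckfree', '2.ckfree'},
}
locator = {}   # chunk handle -> disk, playing the role of /n/master/locator
nextid = 0     # playing the role of /n/master/next

def allochunk():
    # Pick a random disk with a free chunk, consume one marker,
    # and record the new handle in the locator, as allochunk does.
    global nextid
    nextid += 1
    handle = '%04d.ck' % nextid
    disk = random.choice([d for d, free in disks.items() if free])
    disks[disk].pop()
    locator[handle] = disk
    return handle

h = allochunk()
print(h, '->', locator[h])
```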

Now to tie it together with a command to write a file to the repository file.

fn putfiles {
 load expr
 pctl forkns
 rmnt master
 mount `{ndb/regquery -n svc chunkq} /n/chunkq
 bigfile := ${hd $*}
 files := ${tl $*}

 chunk := `{tail -1 $bigfile}
 if {~ $#chunk 0} {
  chunk = ${allochunk}
  echo $chunk >> $bigfile
 }
 mount -c ${locatechunk $chunk} /n/remote
 for (i in $files) {
  (perm device inst owner group size rest) := `{ls -l /n/remote/$chunk}
  if {ntest ${expr $size $CHUNKSIZE gt}} {
   chunk = ${allochunk}
   echo $chunk >> $bigfile
   mount -c ${locatechunk $chunk} /n/remote
  }
  putwar $i |gzip >> /n/remote/$chunk
 }
}

I append files until I go over the CHUNKSIZE threshold. There are no files that span chunks. Each chunk is self contained, which I hope will make applying mapreduce easier.
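That packing policy (fill the current chunk, start a new one only once it has gone over the threshold, never split a file) is easy to state in a few lines of Python; CHUNKSIZE is scaled down for the example, and the function name is mine:

```python
CHUNKSIZE = 100   # threshold in bytes, scaled down for illustration

def pack(sizes):
    # Assign whole files to chunks in order.  A new chunk is started
    # only after the current one has gone over CHUNKSIZE, so no file
    # spans two chunks, and a chunk may end up somewhat over the
    # threshold, just as in putfiles.
    chunks = [[]]
    used = 0
    for n in sizes:
        if used > CHUNKSIZE:
            chunks.append([])
            used = 0
        chunks[-1].append(n)
        used += n
    return chunks

print(pack([60, 60, 30, 90]))   # [[60, 60], [30, 90]]
```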

When I extracted the data from the Reality Mining MySQL dump I got over 200 1MB files. A database table, like callspan, would be broken up into about 20 files. To distribute the callspan table as one repository file across Geryon I ran the following,

% putfiles /n/master/callspan callspan*
% cat /n/master/callspan
% cat /n/master/locator
0001.ck tcp!oak!1850
0002.ck tcp!fir!10181

It should be possible to see how a rudimentary inferno cluster file system could be built using similar techniques, hiding much of the above functionality behind a clean file system interface. I'll likely try this at some point. The problem that always gets me is that once I allow files to span chunks, how do I cleanly handle breaking data up for mapreduce?

I still haven't addressed fault tolerance and replication of chunks yet. If anyone cares to have a stab at it ...




I hope to turn Geryon from a simulation into a modest working grid. And of course, I'll describe the progress of setting up Geryon here.

Monday, August 08, 2005

lab 39 - tar and gzip


In lab 38 I looked at file formats I could use to store a document repository. I kind of decided I needed an archive file of individually gzipped files. After a little further digging I find that the gzip file format (rfc) supports multiple gzip files concatenated together as a valid gzip file. A quick test on UNIX shows this to be true.

% cat file1 file2 | wc
   2   8   35
% gzip < file1 > t1.gz
% gzip < file2 >> t1.gz
% gunzip < t1.gz |wc
   2   8   35

But the same test on Inferno did not work. After a little hacking on /appl/lib/inflate.b I got it to work, although I'm not sure I haven't broken something else in doing so. So beware.
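As a cross-check of what the correct behaviour looks like, Python's gzip module also treats concatenated members as one stream (a quick sketch, not part of the lab's code):

```python
import gzip

# Two separately compressed chunks, concatenated: per RFC 1952 the
# result is still one valid gzip stream.
g = gzip.compress(b'hello\n') + gzip.compress(b'world\n')
print(gzip.decompress(g))   # b'hello\nworld\n'
```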

Appending to a gzip file is a nice feature. What about puttar? Can I append to a tar file?

% puttar file1 > t1.tar
% puttar file2 >> t1.tar
% lstar < t1.tar
file1 1123551937 15 0

No. It stops reading after the first file. I looked at the code /appl/cmd/puttar.b and find it outputs zeroed blocks as a sort of null terminator for the file. I'm not sure if that's required to be a valid tar format file. The Inferno commands that read tar files don't seem to care since EOF works just as well. So I edited the file to not output zeroed blocks, and I renamed the command to putwar so not to confuse myself. Now I can append to a tar (war) file. What's more, I can put the gzip and tar together.

% putwar file1 |gzip > t1.tgz
% putwar file2 |gzip >> t1.tgz
% gunzip < t1.tgz |lstar
file1 1123553153 15 0
file2 1123553156 20 0

I'll resurrect gettarentry from the last lab so I can apply a command to each file.

% gunzip < t1.tgz |gettarentry {echo $file; wc}
   1   4   15
   1   4   20

This is very close to what I want. I can process the whole archive in a stream, it is compressed, and if I know the offsets of each file I can jump directly to it and start the stream from there.

The remaining problems are that I don't know what meta information to store with the file, so I'm going with tar's information by default. Also, the tar format isn't handled by the sh-alphabet, which is a pity. But that doesn't matter because now I've got something concrete to play with which is good enough.

Time to really get processing some data.



Sunday, August 07, 2005

lab 38 - Geryon's data sets


I need a large data set to work with so I can try out more ideas using Geryon. I want to use real data; one that cannot be analyzed trivially using, say, a relational database.

Examples I considered,

  • crawl the web - a web page repository
  • an rss feed repository
  • web server query logs
  • click logs for a site
  • aggregate data input by users
  • system health records
  • sousveillance logs

Some of these are more difficult to collect than others. Some may contain greater possibility for surprises, and the surprises are what I hope to get by working with real data. Also, a data set where others can collect and duplicate my results would help to keep the conversation concrete.

But I need something right now, and there are two possibilities at hand. I've got the data set from the MIT Reality Mining project. This is about 200+MB uncompressed. This is big enough to test out some tools. But for this size data, Geryon is not likely to offer anything that can't be done in a relational database. However, in principle this data set could grow very large, and developing grid tools to process it might come in handy. For something larger I could use the Internet Archive's crawler to collect around 1 to 50GB of web pages. I'll go with the first option until I get a better idea of what I'm doing.

Before even collecting the data, I need to consider how to organize it. How it will be stored and processed. What file formats to use, etc. So, what are the properties of a large data set? Here's what Rob Pike et al say in their Sawzall paper about Google's repository, "A document repository holding a few billion HTML pages might be stored as several thousand files [each about 1GB in size] each storing a million or so documents of a few kilobytes each, compressed." A 1GB "file" is divided into 64MB chunks which is distributed among many disks, and each chunk has 3 copies. Here's an image of the repository format from the Backrub paper.

I'll try and put that in context of Inferno. A close analog seems to be a tar file of gzipped files. I'll make it easier on myself just for the moment and turn that around into a gzipped tar file. How would I process a .tgz file in one pass?

I wrote a shell filter gettarentry,

gunzip < file.tgz |gettarentry  { getlines {}  | etc. } 

where the command block parameter is applied to each file.

Gettarentry spawns a shell process for each file and writes the data down a pipe between the two. The environment variable file is set in the context for each command block.

After playing with this and thinking that this should be a shell builtin along the lines of sh's alphabet, I find it already is!

The fs command has bundle and unbundle, an equivalent to tar.

% fs bundle in | gzip > fs.bundle.gz

I hadn't considered the sh-alphabet yet in the context of Geryon. This simple use makes it already worthwhile, but there's much more to it. The fs command is a powerful tree walker, with gates and pattern matching and the ability to apply a command block to each file. The commands below show a simple comparison of fs walking the same file hierarchy, one bundled and the other already extracted onto a kfs disk.

% time sh -c {gunzip < fs.bundle.gz | fs pipe -1 @{wc} {unbundle -}}
  22191  102136  698644
0l 4.281r 4.281t

% time sh -c { fs pipe -1 @{wc} {walk in}}
  22191  102136  698644
0l 7.188r 7.188t

So, I'm on the right track. This is way more powerful than my simple gettarentry. And it fits well within the Geryon framework,

rcpu {gunzip < /n/d?.nb/fs.bundle.gz | fs pipe -1 @{wc} {unbundle -}} d0.nb ...

To read one small file within the bundle is not very efficient. But reading and processing the whole fs is faster if bundled and gzipped. The improvement comes from considerably fewer disk reads (about a third as many) and much less interaction with the fs for walking the hierarchy.

This format does not allow me to jump directly to a file within the archive. But if I switch back to the original suggestion, an archive of gzipped files, I get direct access and a few other things. It should be straightforward to append to the archive and know the total size of the archive as I add files.

I'll need to write another module for sh-alphabet to directly handle a repository format where each file is individually gzipped. But the fs/alphabet framework seems the way to go.

Another problem to handle is splitting the repository file among many kfs disks. If each 64MB chunk can stand alone as a bundle file, I could consider a 1GB file as just the concatenation of all the archive chunks. [1] It shouldn't matter in what order we process the chunks. If I build an index of each file's location within the repository I need to track the chunk, or keep track of the order of chunks and just keep a single offset within the logical 1GB repository file.

The Internet Archive's crawler, Heritrix, stores web pages in the ARC file format. I could add this format to sh-alphabet so it can be processed by fs. The crawler splits the archive into 100MB files. So by using this I've already got a lot going for me.


The files for this lab include the gettarentry command and a command to extract the data from the Reality Mining MySQL dump into a format better suited to Inferno. caerwyn.com/lab/38


[1] Because I am layering file systems my terminology is getting confusing. I have chunks and files and disks at several layers.

Monday, July 18, 2005

lab 37 - Geryon's mapreduce


lab 37 - Geryon's mapreduce


I have a registry of kfs disks and cpus in my cluster, and the registry is mounted on every node. Upon this infrastructure I've written some shell scripts to simulate a MapReduce function.

I want to quantize and bound the data and the computation. To that end, a process works on only one 64MB disk for input, and may write to another disk for output. The splitting of the data is also crucial to the parallelization. Each input disk can be handled concurrently on any cpu node in the cluster. The namespace for a process is built from finding resources in the registry.

Because there are several cpus available to run a command block, I choose the cpu in round-robin fashion, using a shell function that populates a list of the available cpu nodes and takes the head of the list on each subsequent call until the list is empty. [1]

subfn nextcpu {
  if {~ $#cpulist 0} {
    cpulist=`{ndb/regquery -n svc rstyx}
  }
  result = ${hd $cpulist}
  cpulist = ${tl $cpulist}
}

I do a similar thing for finding a disk local to the cpu for output files. Finddisk takes the host name and returns the head of the list of available disks on that host.

Together these two functions are pretty much all the intelligence in selecting nodes for work and allocating disks. Obviously, more could be done.
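For illustration, the allocation behaviour of nextcpu can be restated in Python (hypothetical names; the real function asks the registry via ndb/regquery whenever the list runs dry):

```python
def make_nextcpu(query_registry):
    # query_registry() stands in for `ndb/regquery -n svc rstyx`
    cpulist = []
    def nextcpu():
        nonlocal cpulist
        if not cpulist:
            cpulist = list(query_registry())  # refill when exhausted
        return cpulist.pop(0)                 # take the head of the list
    return nextcpu

nextcpu = make_nextcpu(lambda: ["oak!6666", "pine!6666"])
assert [nextcpu() for _ in range(3)] == ["oak!6666", "pine!6666", "oak!6666"]
```

Refilling from the registry each cycle means newly registered cpu nodes join the rotation automatically.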

Now for the definition of rcpu. Using the above functions, it picks the next cpu and output disk, then constructs a command block to send to that node. The command block does some initialization, such as mounting the disks and running a gridrc file from the user's lib directory. As part of the rstyx protocol the client device exports its namespace, which is mounted at /n/client; this provides an easy way of distributing shell and Dis code to every node. The gridrc file includes the shell function definition for rmnt. [lab 36] The constructed block contains the command block passed as argument to the function. It runs in the background and returns the name of the cpu disk it allocated for output.

fn rcpu {
 cmd:=${hd $*}
 disk:=${tl $*}
 cpu:=${nextcpu}
 (net host port) := ${split '!' $cpu}
 disk = ${finddisk $host} $disk
 s=${parse '{cpudisk=/n/' ^ ${hd $disk} ^'
 run /n/client/usr/caerwyn/lib/gridrc
 rmnt ' ^ $"disk ^ '
 ' ^ $cmd ^ '}'}
 cpu $cpu sh -c $s &
 rpid=$apid $rpid
 echo ${hd $disk}
}

From this building block I construct the mapreduce (see the file gridlib for its definition). Just to remind you of how it is invoked,

% (mapreduce {idx /n/d?.mbox/*/*/*} {agg} 
   d0.mbox d1.mbox d2.mbox d3.mbox)

This example command counts the word frequency of all files on all mbox disks. An mbox disk is a kfs filesystem containing one text-only mail message per file. The example map command block uses shell pattern matching to traverse the disk and read every file. It writes to standard output a key-value pair, which in this case is a word and the value "1".

The reduce block is a filter that reads key-value lines from stdin and writes key-values to stdout. In this case, agg sums the value field for each distinct key then writes the word and the total word count.

Mapreduce interposes between the map and reduce an intermediate command that hashes the keys into a number of partitions, and writes the key-value to a partition file on the local cpu disk. Mapreduce then needs to concatenate all the intermediate partition files for one partition into a single sorted partition for input into the reduce block. The reduce command block then has all the values for a key as a contiguous stream of lines.
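The point of the concatenate-and-sort step can be shown with a small Python sketch (illustrative, not lab code): after sorting, the reducer sees each key's values as one contiguous run and never needs to buffer the whole input.

```python
from itertools import groupby

# intermediate files for one partition, as (key, value) lines
files = [[("cat", 1), ("the", 1)], [("the", 1), ("dog", 1)]]

# concatenate all files for the partition, then sort (the `cat ... | sort` step)
merged = sorted(line for f in files for line in f)

# the reducer sees contiguous runs per key, so a streaming sum suffices
counts = {k: sum(v for _, v in run)
          for k, run in groupby(merged, key=lambda kv: kv[0])}
assert counts == {"cat": 1, "dog": 1, "the": 2}
```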

Mapreduce runs the map command block once for each disk passed as argument concurrently. It waits for the map workers to finish then runs the reduce block for each partition concurrently. The number of partitions is hardcoded at the moment, but should be configurable. The result is a list of disks on which the result partition files are stored.
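The whole flow, map workers per input disk, a hash shuffle, then reduce workers per sorted partition, can be sketched in Python with threads standing in for cpu nodes (the names and the crc32 partition hash are my own assumptions, not the gridlib code):

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def mapreduce(mapf, reducef, disks, npart=2):
    with ThreadPoolExecutor() as pool:
        # map phase: one worker per input disk, all concurrent
        mapped = list(pool.map(mapf, disks))
        # shuffle: hash each key into one of npart partitions
        parts = defaultdict(list)
        for pairs in mapped:
            for k, v in pairs:
                parts[zlib.crc32(k.encode()) % npart].append((k, v))
        # reduce phase: one worker per partition, input sorted first
        results = list(pool.map(lambda p: reducef(sorted(p)), parts.values()))
    return [kv for res in results for kv in res]

# word count: map emits (word, 1); reduce sums per key
def mapf(disk):
    return [(w, 1) for w in disk.split()]

def reducef(pairs):
    out = {}
    for k, v in pairs:
        out[k] = out.get(k, 0) + v
    return sorted(out.items())

counts = dict(mapreduce(mapf, reducef, ["the cat", "the dog"]))
assert counts == {"the": 2, "cat": 1, "dog": 1}
```

The structure mirrors the shell version: wait for all map workers, then run the reduce block per partition concurrently and return the results.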

Well that's it. Almost all implemented in inferno shell code. Many things are not handled such as fault tolerance. But this is a starting point for a working grid function. The next step, for me, is to collect a large sample data set to experiment with.




[1] A true closure with lexical binding would be ideal here. I really need to finish the symbolic shell language I started. Sending a closure with its environment to a remote host is an appealing idea I need to explore further.

Wednesday, July 13, 2005

lab 36 - Geryon's registry


lab 36 - Geryon's registry


A critical piece of Geryon is the registry of disks and cpus. One of the immediate problems to deal with when setting up a grid cluster is the fact that the nodes come and go including the one holding the registry.

To deal with one aspect of this, the registry restarting, I modified the grid/register and grid/reglisten commands from Inferno to monitor the availability of the registry, remount it if necessary, and re-announce the service.

I use grid/register for the disk services. Here is an example of how I create a new chunk and register a new kfs disk.

% zeros 1024 65536 > /chunks/chunk0
% (grid/register -a id chunk0 
{disk/kfs -rPW /chunks/chunk0})

I do this a number of times on each node that has spare disk capacity. A simple script registers all disks when I restart a node. All these disks will appear as services in the registry identified as tcp!hostname!port with some attributes including the chunk identifier, which should be unique for the host.

The next step is to name each disk. For this I use ndb. I add a file, /lib/ndb/cluster, to the ndb database with entries of the form

name=d0.mbox kfs 

name=d1.mbox kfs 

name=d0.replica kfs 
  master=host1!chunk0 replica

name=d1.replica kfs 
  master=host1!chunk1 replica

The first field is the disk name, which is unique for the cluster. The master is the chunk running on the host that serves kfs for this disk. The replica field identifies a backup disk. I hope in the future to make it possible to dynamically switch between master and replicas and use replicas during computation. But I'll skip it for now. I replicate disks by using Inferno's applylog and updatelog tools.

Once this is all in ndb I can run a script that will update the registry with the disk names.

fn refresh {
 names=`{ndb/query -a kfs '' name}

 for (i in $names) {
  (host chunk) = ${split ! `{ndb/query name $i master}}
  addr = `{ndb/regquery -n name $host id $chunk}
  if {ftest -e /mnt/registry/ ^$i} {
   (echo host $host automount 1 persist 1
     addr $addr replica
     `{ndb/query name $i replica} > /mnt/registry/^$i)
  } {
   (echo $i host $host automount 1 persist 1
     addr $addr replica
     `{ndb/query name $i replica} > /mnt/registry/new)
  }
 }
}
This needs to be run when the ndb file or list of registered services changes. So ideally this should be automatic. I can quite easily see that happen, either by building a ndb file inside the registry and have it respond to changes, or implement an events file in the registry, and attach a process to that. This is a problem to work on later.

Once registered, these disks can be used from any node within the cluster. For example, I use a shell function to take a disk name and mount it as /n/diskname,

% fn rmnt {
 for (file in $*) {
  (disk rest) := `{cat /mnt/registry/$file}
  while {! ~ $#rest 0} {
   (name val tail) := $rest
   if { ~ $name 'addr'} {mount -c $val /n/ ^ $disk}
   rest = $tail
  }
 }
}
% rmnt d0.mbox d1.mbox
% ls /n/d?.mbox

To register a cpu service,

% grid/reglisten -r svc rstyx 'tcp!*!0' {runas $user auxi/rstyxd&}

This will announce on a new address and use that address as the service name in the registry. We can then get a list of the addresses of all cpu services,

% ndb/regquery -n svc rstyx

For both grid/register and grid/reglisten, the service names are automatically removed from the registry once the process exits. All connections are authenticated. For the kfs disks, they should all use the same /adm/users file, something that should be copied onto the disk when it is initialized, so that permissions are enforced consistently across the cluster.

So far we have all the services we need announced dynamically. We have a naming scheme and the infrastructure for running code anywhere in the cluster. What remains is the shell code to tie it together to build a simple mapreduce.

I'll put instructions for setting up Geryon on my inferno wiki.



Sunday, July 10, 2005

lab 35 - Geryon, another Inferno grid


lab 35 - Geryon, another Inferno grid


Grid computing is everywhere it seems, and one of the obvious applications of Inferno is grid computing. As usual, I know little about it aside from reading a few papers to get me enthused enough to make some effort in that area. If I read all the current research I'd have little time to do the programming. So instead I'll jump in and see if I can swim. I'm trying to setup a simple grid, with tools for MapReduce functionality. I'm calling this system Geryon (because things need names).

This first stab is to see what can be done with the pieces already available in the system, and from this working prototype to find out what pieces need to be written to fill in the gaps or improve it.

Geryon is too large to cover in one blog entry so I'll be posting it piecemeal. I'll setup a wiki page to cover the whole setup and running of Geryon as I get further along.


I'll start with an overview of the system. I have a small number of machines running a mixed set of operating systems: Plan9, Linux, and Windows. On each machine is running one or more emu instances. One instance is running the registry. All emu instances must use that registry service, which is key to the operation of Geryon. Every emu instance must run one rstyxd service for the cpu command. Every service gets registered. So all computation is distributed by using the cpu command to send command blocks.

For storing data, I use a large number of chunks, which are files of equal size (64MB), and each file is used as the storage for kfs, which is registered as a styx service.

On only one emu instance, the master, is a ndb file that maps registered chunk services to "disk" names. This is edited manually. A disk name is of the form dn.app, where n is the number of the disk and app is an application name, e.g. d0.mbox. An application's data is assumed to be quite regular in its layout on the disk and to span many disks.

Running a modified cpu command takes the form

% rcpu {ls /n/d0.mbox} d0.mbox

where rcpu runs cpu on a node of its choosing, mounts the disk, and runs the given command block. The disk can be mounted from any node in the cluster by looking up its address in the registry. rcpu also uses the registry to find the available list of cpu nodes.

As well as there being nodes storing application data, each cpu has a set of registered disks for storing the output from commands running locally, e.g., d0.cpu ... dn.cpu. rcpu automatically mounts a disk local to the cpu node running the command block.

On top of this shell function I built a mapreduce shell function of the form,

% mapreduce {idx /n/d?.mbox/*/*/*}
        {agg} d0.mbox d1.mbox d2.mbox d3.mbox

The mapreduce command takes a map and reduce command block and a list of disks as input. Mapreduce uses rcpu to run the map command block on each disk on a different cpu (selected in round robin fashion). The mapper generates a key-value pair for all data stored in the files which it navigates on its own. The reducer command block takes as input the key-value pairs which are already sorted and performs a reduce function writing to stdout another key-value (see the MapReduce paper for a clearer explanation). The return value from mapreduce is the list of disks where all the output files are stored. Similar to the description in the MapReduce paper, an intermediate hash function is used to partition the data output from the mapper. And the data is collected and sorted for each partition before being sent to the reducer. The command lines for these are nothing more than, e.g.,

% {mapper} |intermediate 5 /n/d0.cpu/$job.inter.

where intermediate runs a hash function on input and writes to output files with a prefix given in the second argument, and

% cat /n/d?.cpu/$job.inter.0 | sort | 
     {reducer} > /n/d1.cpu/$job.partition.0

Notice that for the reducer all the cpu disks containing intermediate files are mounted. So a cat across a number of disks is performed as

% rcpu {cat /n/*/filename} d0.cpu d1.cpu d2.cpu d3.cpu 

I'll describe the shell code in more detail next post, and a link to a web page describing the whole setup.

Saturday, July 09, 2005

usual disclaimer

I do not know what I am doing. The point of writing the software is to learn about a problem. This means the software is not likely to be of practical use, though it may be; I don't know.

Why don't I go and read the research papers on the subject? Reading one or two is fine, and I do, and they help. But reading too much means all I'm doing is reading and not coding. Hands on experience is learning, maybe at a slower rate than reading, but at a much deeper level. The fun is in the coding. Coding generates ideas. After coding, reading the research papers becomes much more valuable, because I now have real experience to compare against.

I'm doing it for fun. This should be called a fun lab or something. It is a lab for doing experiments and writing about them, or in the words of Homer Simpson, "It's just a bunch of stuff that happens."

Friday, July 01, 2005

lab 34 - lexis: semantic binary model implementation


lab 34 - lexis: semantic binary model implementation


The code linked to below is a database implementation based on the paper, A File Structure for Semantic Databases by N. Rishe. The paper is brief and required reading for the rest of this lab.

The application is in three modules: the Btree, the SBM abstraction called lexis, and the query module.

The Btree implementation stores key-value pairs. This Btree is a large improvement over my previous attempt, tickfs, in that a prefix length is included for each entry, which improves storage efficiency. Binary search is used when searching within a block, making it faster, especially considering the increase in the number of entries per block. The caching and locking mechanisms have also changed, and are described in earlier labs.
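The prefix-length idea can be sketched in Python (illustrative; the function names are mine): each entry in a sorted block stores only the bytes that differ from the previous key, plus a count of shared leading bytes.

```python
def compress_block(keys):
    # keys must be sorted; store (shared-prefix-length, suffix) per entry
    out, prev = [], ""
    for k in keys:
        n = 0
        while n < min(len(prev), len(k)) and prev[n] == k[n]:
            n += 1
        out.append((n, k[n:]))
        prev = k
    return out

def decompress_block(entries):
    # reconstruct full keys by reusing the prefix of the previous key
    keys, prev = [], ""
    for n, suffix in entries:
        k = prev[:n] + suffix
        keys.append(k)
        prev = k
    return keys

keys = ["abacus", "abate", "abbey"]
assert compress_block(keys) == [(0, "abacus"), (3, "te"), (2, "bey")]
assert decompress_block(compress_block(keys)) == keys
```

More entries fit in a block this way, at the cost of reconstructing keys while scanning.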

Lexis is the SBM abstraction layer that sits on top of the Btree. It does not use the value field of the Btree as all information is stored in the key. The abstractions exported are Facts, which include Categories, Relations, and Attributes. The last is a small difference to the Rishe model. Attributes are relations of abstract objects to concrete objects (array of byte, which is uninterpreted by lexis). Relations are between abstract objects only. Each fact is stored in normal and inverted form as described in the paper.

Finally, there is the query module. All the query abstractions from the paper are implemented. However, the approach used for combining operators to build a complex query expression is based on data streams and communicating processes. The idea is described in D. McIlroy's Squinting the Power Series. See also lab 31. Here's an example of inserting data into lexis and querying it.

% cd lab/34
% mkdir /dis/folkfs
% mk install
% mkdir /lib/lexis
% > /lib/lexis/index.bt       # the lockfile
% > index.bt
% folkfs/tok -i index.bt *.[bm]      # index all the source files

Now query for the symbol schema that is on the right side of the relation Hit,

% folkfs/get -q 'Hit schema ?Ra' index.bt

The query string is tokenized. Each token is either an operator (a?, aRy, aR?, ?Ra, a??, aC), an abstract object, or a concrete object. Each operator has the function prototype:

op(U: chan of list of ref Dat): chan of list of ref Dat;

The operator will create a new channel and spawn another process that receives from U and sends a result down the created channel. So in functional form it looks like:

?Ra(push('fact', push('Hit', startchan)))

In effect, for each token we get a spawned process that moves in lock step with the other processes receiving from an input channel and sending to an output channel.

An operator such as ?Ra will pop the Relation and abstract object symbols from the stack, search the Btree for all matching Facts and push the abstract or concrete object back onto the stack. This happens for each input from the channel so the channel is a stream of stacks.
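The stream-of-stacks idea maps naturally onto Python generators (a loose analogue with a faked database, not the lexis code; `push` and the `?Ra` operator follow the tokens above):

```python
def push(sym, stacks):
    # push a symbol onto every stack flowing through the stream
    for stack in stacks:
        yield [sym] + stack

def make_yRa(db):
    # ?Ra: pop the abstract object and relation, push each matching object
    def yRa(stacks):
        for stack in stacks:
            a, rel, rest = stack[0], stack[1], stack[2:]
            for y in db.get((rel, a), []):
                yield [y] + rest
    return yRa

# faked Btree lookup: (relation, object) -> matching objects
db = {("Hit", "schema"): ["btree.b", "lexis.b"]}

# the query 'Hit schema ?Ra' in functional form, seeded with one empty stack
start = iter([[]])
out = make_yRa(db)(push("schema", push("Hit", start)))
assert [s[0] for s in out] == ["btree.b", "lexis.b"]
```

Each generator stands in for a spawned process; values flow in lock step from one stage to the next, and one input stack can fan out into several result stacks.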

This is a starting point for further experimentation with the data streams style of programming. Also, lexis will now serve as a database backend for some future file systems for web services.



Friday, June 24, 2005


Here's a download of source for all lab sessions. I'll try to keep it up to date as I post new source. I also want to update the distribution from the downloads page to contain the more robust, bug-fixed, and useful code (the code I actually use). The labs download is flakier, experimental code that is not updated with bug fixes and is often left sitting around for a long time.

lab 33 - profile lockfs


lab 33 - profile lockfs


Last lab I talked up the benefits of lockfs. I started using it in my program in place of channel locks and profiled the code using prof(1) to see its impact on performance. The results were a little disappointing but not really surprising.

One of the test programs I'm using while developing the btree application is a simple indexer. It reads a set of text files, tokenizes each file into a set of Terms, then in the database creates a document id for each document and a termid for each distinct Term, and adds to the database a docid-Hit-termid relation. I then run this on a few megabytes of text data, such as all email messages for a year,

% prof -n folkfs/tok -i index.bt /n/mbox/2001/*/*

I've frequently used prof while developing the btree. Locking always presents a bottleneck. When I had the locks based on channels, and a cache that also behaved as a lock, both took up a large percentage of samples when profiling lines of code.

The test I did most recently used a single-threaded application with lockfs and compared it with the same program without lockfs. I obviously expected some performance loss when using lockfs, but the results showed that even with no blocking for other processes the time it took to interact with lockfs completely dominated the runtime of the program. 54% of samples occurred during the statement

lock = nil;

Another 33.3% of the runtime was spent in the lockfs module. The total runtime with lockfs was 32 seconds; without lockfs, 11 seconds.

Even with these results, I will not discard lockfs. The benefits of a distributed, garbage-collected lock make it still worth keeping. But I will selectively turn off locking when doing a bulk load of an application. A danger I noticed is that the cost of obtaining and releasing a lock tempted me to hold locks longer to reduce interaction with lockfs, but that just creates a different kind of bottleneck.

The only lesson from this is always prof your code. In Inferno it's easy.

Saturday, June 18, 2005

lab 32 - lockfs


lab 32 - lockfs


I've struggled for a while trying to get multi-reader-single-writer locks (rwlock) working the way I wanted for a database I've been working on (more in a later lab). I recently realized the answer was in Inferno all along.

The problem was not in implementing a rwlock, which I had done for tickfs using channels and it worked fine; the problem came when I needed to kill a process that might be holding a lock.

A lock based on using channels usually means that a spawned process is waiting for an event on a channel either to obtain or free the lock. However, if the process that holds a lock is killed it doesn't get to send the unlock event to the lock manager process.

You might think, just don't kill the processes that might hold locks. But killing processes is sometimes hard to avoid. For example, when a Flush message arrives, or to cancel a query that launched multiple processes. Programs that depend on locking based merely on channels are quite fragile, for example, tickfs.

While implementing the wikifs (lab 30) I noted the lovely property of file locks: when the process died, the file descriptor was garbage collected, the file Clunked, and the lock automatically freed. The file lock in wikifs was a simple exclusive lock, using the file system property DMEXCL to manage the exclusion (using kfs in this case).

What I wanted was a file system that served a file representing the rwlock. It already existed in lockfs. I think what had confused me originally about this fs is that I believed it had to be layered on top of the actual data file I was going to work with. That wouldn't have worked for me so I didn't experiment with lockfs. What I realized this week was that I could have lockfs control access to an empty file, and then I'd get all the benefits of file locks, and more. The rlock function became simply

lock := Sys->open(LOCKFILE, Sys->OREAD);

and wlock became

lock := Sys->open(LOCKFILE, Sys->ORDWR);

and freeing a lock was nothing more than,

lock = nil;

which would happen anyway when the function or process exited. Brilliant!

The extra benefit is that lockfs can serve its namespace and act as a lock for a cluster of machines.

Switching to lockfs has considerably improved my database application. The example of tickfs shows that Styx's Flush message isn't handled properly because of the problems of killing processes that might hold locks. Lockfs makes the application more robust. Also, more than one application can now access the datafile, as long as each respects the locks served by lockfs. These applications do not even need to be on the same machine. By combining lockfs with the kind of lockless cache I discuss in lab 30, I can completely avoid channel-based locking, which caused me headaches when I needed to kill groups of processes that held locks.
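What lockfs provides is essentially a multi-reader-single-writer lock keyed to open modes (OREAD shares, ORDWR excludes). An in-process Python analogue of just those semantics (a sketch only; it cannot reproduce the key lockfs benefit, automatic release when the holding process dies):

```python
import threading

class RWLock:
    # OREAD-style opens share; an ORDWR-style open excludes everyone
    def __init__(self):
        self.cond = threading.Condition()
        self.readers = 0
        self.writer = False

    def rlock(self):                    # analogous to open(LOCKFILE, OREAD)
        with self.cond:
            while self.writer:
                self.cond.wait()
            self.readers += 1

    def wlock(self):                    # analogous to open(LOCKFILE, ORDWR)
        with self.cond:
            while self.writer or self.readers:
                self.cond.wait()
            self.writer = True

    def unlock_r(self):                 # analogous to lock = nil
        with self.cond:
            self.readers -= 1
            self.cond.notify_all()

    def unlock_w(self):
        with self.cond:
            self.writer = False
            self.cond.notify_all()
```

The explicit unlock calls are exactly what lockfs makes unnecessary: dropping the file reference releases the lock even if the holder is killed.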

Tuesday, June 07, 2005

lab 31 - accumulator generator in limbo


lab 31 - accumulator generator in limbo


In lab 25 I tried to tackle the problem of writing an accumulator generator in the languages available on inferno. I blithely assumed it wasn't possible in Inferno. I'm happy to say I was wrong. Roger Peppe emailed me the answers, so I'm just reporting what he gave me. But it was very enlightening, a big Aha!, and made me view limbo in a different way.

From rog: "funny thing is, it *is* possible in limbo, if you take a somewhat more liberal definition of the term "function". recall that a limbo channel can act like a function: e.g.

       c: chan of (int, chan of string);

"can represent a function that takes an int and returns a string (through the reply channel).

       c <-= (99, reply := chan of string);
       sys->print("result is %s\n", <-reply);

"i've attached a Limbo version of the accumulator-generator that uses this kind of thing. i'll leave it to you to decide if this fulfils Greenspun's rule or not!"

I said it doesn't strictly pass the test because you are passing an integer not a number. Rog replied: "ahhh but it can if i want it to! (i really like inferno's parametric types... note the accgen and acc functions could be in an external module)."
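The channel-as-function trick translates to Python with a queue served by a spawned thread (my own sketch, not Rog's Limbo): the "function" is a request channel carrying an argument and a reply channel, and the server thread holds the accumulated state.

```python
import threading, queue

def accgen(n):
    # the returned "function" is really a request channel:
    # send (amount, reply-queue), receive the accumulated total
    c = queue.Queue()

    def server():
        total = n
        while True:
            amt, reply = c.get()
            total += amt
            reply.put(total)

    threading.Thread(target=server, daemon=True).start()

    def call(amt):
        reply = queue.Queue()
        c.put((amt, reply))
        return reply.get()
    return call

acc = accgen(100)
assert acc(10) == 110
assert acc(1) == 111
```

As with the Limbo channel version, the state lives in the spawned process rather than in a closed-over environment.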

Around the same time as Rog sent me this I was also reading Structure and Interpretation of Computer Programs, in particular the sections on delayed (or lazy) lists. The connection of lazy lists to functional programming and to channels in Inferno was made complete by Doug McIlroy's paper, Squinting the Power Series. I ported the code, originally in Newsqueak, to Inferno, to absorb the lessons from it. I recommend all Inferno programmers go read this paper and study the code.

Where from here? I tried to apply what I'd learned. I created a tool for querying a little database. The query is made by chaining processes in a similar manner as the power series code. I'll be posting this code at a later time, as I hope to incorporate it into the folkonomy fs.

Thursday, May 19, 2005

inferno wiki

I have set up a demo Inferno wiki for wikifs on Inferno.

lab 30 - wikifs


lab 30 - wikifs for inferno


I ported plan 9's wikifs to inferno. There were various aspects of the application I wanted to learn about--such as, the cache, the transforms from wiki syntax to html, the method for using html templates, and the details of creating the namespace--that it seemed worthwhile to write it in limbo to absorb it all. I also had a hunch I'd be borrowing a lot of the code for my next file system.

The differences between the original and the port are small. The one significant difference is the approach to locking data structures.

The original uses locks for the Map and the Cache memory as well as file locks for accessing disk.

For the port I attempted to avoid the memory locks altogether. One reason being that when the Flush message is received I attempt to kill the proc processing the request, but that might leave stray locks. The file locks get freed when the file references are garbage collected.

To eliminate locks, I take advantage of limbo's garbage collection and try to use immutable types, or pass-by-value semantics.

For example, the Map is stored in a global variable. When the Map is read in from a file a new structure is created locally then assigned to that variable. The map is not changed (except for timestamps) after that and always remains self consistent.

The find routine for the map makes a local copy of the reference to the map and uses that, safe in the knowledge that the reference will stay unchanged for the duration of its use. All memory will be collected if in the meantime a new map has been loaded.

With the cache I use a last-in-wins policy. This saves the trouble of worrying about locks, at the cost of extra reads of the disk file, because data added to the cache can be lost when the global cache references are overwritten. Once data is cached it is unchanged (except for timestamps), so no locking is required once a function has a local reference to the cache data.

Here's an example of code from the cache handling. I create a new Wcache object, c, and I want to add it to the cache. The cache is an

       array of list of ref Wcache 

so when I add an item I create a new list and assign it to an array location, overwriting whatever list reference may have been there. It will be garbage collected if no other code references it. Also, the cache needs to be bounded, so I set a max length for every list. I remove items from the local copy of the list until there's room to add the new entry. Here tab is the global array.

      h := n%Nhash;
      ol := tab[h];
      c := ref Wcache(0, 0, 0, 0, nil, nil, nil, nil);
      # ... c is properly initialized before being added to the list
      while(len ol >= Mcache){
          evict := -1;
          t := Sys->Maxint;
          for(l := ol; l != nil; l = tl l){
              if((hd l).use < t){
                  t = (hd l).use;
                  evict = (hd l).n;
              }
          }
          l = nil;
          for(;ol != nil; ol = tl ol)
              if((hd ol).n != evict)
                  l = hd ol :: l;
          ol = l;
      }
      # last in wins!
      tab[h] = c :: ol;

Because limbo lists are read only we don't need to worry that the references in the list change (although the referenced objects might). We must guard against changing the referenced objects, except only trivially for the timestamps, and treat them as read only.

Not having to worry about locks does simplify the code, enough that I'll look for opportunities to use this lock-free style of programming in the future.


lookup.b testwrite.b wiki.b wiki.m wiki2html.b wiki2text.b wikifs.b wikipost.b

Sunday, May 08, 2005

lab 29 - embedded graphs


lab 29 - embedded graphs


Taking inspiration from Mathematica's notebook user interface, I tried implementing graphs embedded within the shell window, wm/sh, the idea being that a scrolling page of graphs, text, and images is easier to manage than many overlapping windows. I feel this especially while I'm iteratively running commands, generating data, and viewing graphs.

The shell window provides a history of activity. By including other windows in that history I can look back at the graphs I generated, what data generated them, what command was used to generate the data, and so on. With overlapping windows this context can be lost.

Wm/sh used in this way now behaves as a rudimentary window manager. It is easy to imagine ways in which it becomes inconvenient to use, such as interacting with two windows separated by a large block of text. But it still might add a useful alternative to the wm/wm and acme window manager styles.


The graphs are Tk windows embedded within the text of the shell window. The graph can respond to events, and in this case clicking inside the graph updates the displayed coordinates.

I didn't know how to do any of this, or even that Tk made it possible, until I looked at the editor wm/brutus. Brutus seems to be a forgotten application within Inferno and not very stable. It supports embedding of images, tables, and excerpts of code within the text and simple formatting. It is still useful to trawl through such programs for snippets of code. Finding the code to embed Tk windows within text was enlightening.

I used the brutus extension interface brutusext.m to implement the graph module. The graph module itself was adapted from /appl/math/gr.b.

The shell already comes with a file system interface, /chan/shctl, to communicate with external programs. I added to wm/sh two commands to be recognized by this file.

          % echo graph filename > /chan/shctl

where filename contains a set of coordinates. Wm/sh displays a graph within the shell window.

I also added a command to display images. For example,

          % echo bitmap /icons/coin.bit > /chan/shctl

To clear memory from all the Tk objects created,

          % echo clear >/chan/shctl


By adding this feature many other possibilities and questions quickly arise. How can we collapse individual embedded windows? Can we cut and paste the embeds? What about saving the whole contents of the shell window, including embeds, and restoring it later for browsing? Wm/brutus again points the way to some of this, by creating a simple markup using SGML or sexprs to store the contents as an expression.


graph.b sh.b

Sunday, May 01, 2005


I took a break from this blog and inferno coding for March and April.

During that time I've been reading about web services, deconstructing Google's offerings and AJAX, and most importantly reading about the REST architectural style for the web.

I like the approach taken by the Plan9 wikifs to offer web pages by first writing a filesystem, then mounting that into the httpd namespace.

I'd like to experiment with that approach further to implement RESTful services like del.icio.us.

I've been porting wikifs to Inferno to learn more how it's done. I'll post that soon.

I'd like to update the Inferno httpd to support HTTP 1.1 and basic authentication. That shouldn't be hard, considering Plan9's httpd supports this and the code is very similar.

Wednesday, February 16, 2005

lab 28 - side effects


lab 28 - side effects


I was planning on just implementing side effects, but it turned out to be a little more complicated than I thought, and then I got sidetracked into changing the evaluator to handle more of the shell semantics.

The file lisp7.b implements setq, but it is limited in that it cannot change the type of the value referenced by the symbol.

          % eval {setq a `{b c d}}
          (b c d)
          % eval {setq a `b}
          error at type mismatch
          (b c d)

Fixing this requires re-implementing the sexprs library to remove pick from the ADT. I'll follow up on this later.

Mixing shell and lisp syntax

The final parser in the Sussman and Steele paper The Art of the Interpreter puts dynamic scoped variables back into the evaluator. It maintains two environments, one for lexical and one for dynamic binding.

Instead of having a different dynamic environment using sexprs, I used the shell Context for the dynamic state. I also kept the shell syntax for dereferencing a dynamic variable. These and the rest of the features in this post are in lisp8.b.
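As a rough sketch of the two-environment lookup (in Python purely for illustration; `lookup` and its arguments are hypothetical names, and the real code keeps the dynamic bindings in the shell's Context):

```python
def lookup(sym, lexical, dynamic):
    """Resolve a symbol: lexical bindings first, then the dynamic
    environment (standing in for the shell Context); unbound
    symbols evaluate to themselves."""
    if sym in lexical:
        return lexical[sym]
    if sym in dynamic:
        return dynamic[sym]
    return sym

print(lookup("x", {}, {"x": "1"}))   # a dynamic variable, like $x
print(lookup("z", {}, {}))           # unbound: evaluates to itself
```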

          % FullForm {x := 1; echo $x; echo x}
          (Block Seq (LocalAssign x "1") (echo (Var x)) (echo x))
          % eval {x := 1; echo $x; echo x}

The lisp evaluator in lisp8.b has some significant changes over the previous ones. I've gone back to treating shell syntax more like normal shell. Semi-colon is transformed to Seq and dollar to Var. Seq is equivalent to progn in scheme. Var dereferences a variable in the dynamic environment. Block tags a braced block which essentially quotes the whole block until explicitly eval'd.

The evaluation of variables has changed: if a variable is unbound, it evaluates to itself, like the third reference to x above.

It looks up shell commands for procedure calls. So the echo command works. We also treat normal parentheses as nested list delimiters, and use applicative order for their evaluation. This essentially does away with the ${} syntax and substitution builtins.

          % eval {x := 1 2 3 4; echo (car $x)}

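A minimal sketch of these semantics, in Python for illustration (`evl`, `env`, and the tiny command set are hypothetical stand-ins for the lisp8.b evaluator):

```python
def evl(expr, env):
    """Evaluate a form: bound symbols dereference, unbound symbols
    and unknown commands evaluate to themselves."""
    if isinstance(expr, str):
        return env.get(expr, expr)
    op, *args = expr
    args = [evl(a, env) for a in args]          # applicative order
    if op == "car":
        return args[0][0]
    if op == "echo":
        return " ".join(str(a) for a in args)
    return [op] + args                          # unknown command: self-evaluates

env = {"x": ["1", "2", "3", "4"]}
print(evl(["echo", ["car", "x"]], env))   # prints 1
print(evl(["asdf", "a", "b"], env))       # prints ['asdf', 'a', 'b']
```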
The language is beginning to get interesting. It looks like we have a lisp interpreter built into the handling of lists. But in fact the whole expression is converted to lisp form and evaluated. This means we can symbolically manipulate the shell commands.

          % eval {car {echo this}}

The tricky thing now is to handle more of the shell semantics including globbing, file redirection and pipes.

One assumes we will get a rather strange language at the end of it. Though I hope it will look a lot like shell with the full power of a symbolic programming language behind it.

When calling external commands, such as echo, we need to catch the case where the command doesn't exist. I want a form that doesn't evaluate to anything to be returned as itself.

          % eval {asdf a b}
          sh: asdf: './asdf' file does not exist
          (asdf a b)

Currently, the shell prints the diagnostic that the command does not exist. I will remove that later, when I move the language from a builtin into the main shell evaluator. The principle is one borrowed from computer algebra systems: symbols evaluate to themselves if there is no other transformation.

I added support for globbing. Globbing is somewhat like macros: we are rewriting the tree before the evaluator sees it.

          % eval {echo l*}
          (lisp8.sbl lisp8.dis lisp8.b lisp7.sbl lisp7.dis lisp7.b)
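The rewrite-pass view of globbing can be sketched in Python (`expand_globs` and the file list are hypothetical; the real code matches patterns against the current directory):

```python
from fnmatch import fnmatch

def expand_globs(expr, files):
    """Rewrite pass: replace any word containing '*' with the
    sorted names that match it; leave everything else alone."""
    out = []
    for w in expr:
        if isinstance(w, list):
            out.append(expand_globs(w, files))
        elif "*" in w:
            matches = sorted(f for f in files if fnmatch(f, w))
            out.extend(matches or [w])    # no match: keep the pattern
        else:
            out.append(w)
    return out

files = ["lisp7.b", "lisp8.b", "t1"]
print(expand_globs(["echo", "l*"], files))  # prints ['echo', 'lisp7.b', 'lisp8.b']
```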

I altered the FullForm for redirections and pipes but stopped short of actually implementing them, because a lot of the code to do that is in /appl/cmd/sh/sh.y and I thought it'd be easier to move the evaluator code back into sh.y and build a new shell. I will replace the walk function in that file with eval.

Redirs are like property lists that travel with a command. The apply function needs to know about Redir. Redirs evaluate to themselves, but when the shell actually executes a command it pulls the redirs out of the expression.

          % FullForm {echo > t < x}
          (Block echo (Redir W t) (Redir R x))
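One hedged sketch of how apply might separate the Redir nodes from the words of a command, in Python with hypothetical names:

```python
def split_redirs(expr):
    """Pull the (Redir ...) nodes out of a command's expression,
    leaving the ordinary words behind."""
    words, redirs = [], []
    for e in expr:
        if isinstance(e, tuple) and e and e[0] == "Redir":
            redirs.append(e)
        else:
            words.append(e)
    return words, redirs

cmd = ["echo", ("Redir", "W", "t"), ("Redir", "R", "x")]
print(split_redirs(cmd))
# prints (['echo'], [('Redir', 'W', 't'), ('Redir', 'R', 'x')])
```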


I like the Inferno shell. It is powerful, and it can be made more powerful by making it a symbolic language. I don't lose any of its current power, in particular its expressiveness for building pipelines and working with files, but I add something more: the ability for it to see and manipulate its own tree of code and data.

It is useful to compare this with the es shell. Its authors took a similar idea, putting a lisp evaluator into the shell, but didn't take it to the extreme of what lisp can do: it doesn't appear the language could symbolically manipulate itself.


I will replace the shell evaluator with the one I have written so far, creating a prototype for a new language. I can then implement backquote, redirection, and pipes.

Monday, January 31, 2005

lab 27 - sh sexpr




The implementation of the lisp dialect in lab 26 had some problems. For each car or cdr operator I recursively descended the syntax tree, rather than holding the lists in memory in a more convenient format and simply taking the head or tail of a list.

I did not want to build further on this implementation. Instead I wanted to rewrite eval to use the sexprs(2) module and convert the sh's syntax tree to an s-expression.

Converting from the syntax tree to an s-expression is pretty easy to do, though it took me a while to remember the method. The first file, lisp1.b, is a brute-force implementation; of course, once I got it working I remembered the method I should have used.

          % load ./lisp1
          % eval {a {b} c}
          (a (b) c)
          % unload ./lisp1

The second file lisp2.b maintains a global stack of lists of sexprs. As it recursively descends the syntax tree it pushes expressions onto the list at the top of the stack. If it descends into a new block, it pushes a new list.
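The stack method can be sketched in Python (`parse` and the flat token stream are hypothetical stand-ins for the recursive descent over the shell's syntax tree):

```python
def parse(tokens):
    """Build nested lists with a stack of lists: '{' pushes a new
    list; '}' pops it and appends it to the enclosing list;
    anything else is appended to the list on top of the stack."""
    stack = [[]]
    for t in tokens:
        if t == "{":
            stack.append([])
        elif t == "}":
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(t)
    return stack[0]

print(parse(["a", "{", "b", "}", "c"]))  # prints ['a', ['b'], 'c']
```

This mirrors the earlier transcript: eval {a {b} c} yields (a (b) c).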

Next lisp3.b handles all the syntax tree's node types. The result is an sexpr which preserves all information from the shell's abstract syntax tree.

          % load ./lisp3
          % eval {echo `{a b c} &}
          ((Nowait echo (Quote (a b c))))
          % eval { echo "{ (a (nested (list)))}}
          (echo (SquashQuote ((List a (List nested (List list))))))
          % eval {a
          sequence
          of
          cmds}
          (Seq (a) (Seq (sequence) (Seq (of) (cmds))))
          % unload ./lisp3

The sequences should really be flattened to (Seq (a) (sequence) (of) (cmds)). But since I wanted to treat newlines not as sequences of commands but as lists, I didn't bother.

The point here is that we represent the sh's abstract syntax tree as a symbolic expression ready for manipulation and evaluation. A new symbolic expression could be composed and converted back to the sh's AST for evaluation by the shell. The above builtin should not really be called eval since I am not evaluating the expression. I am just converting from input form into s-expression form.

In lisp4.b I implemented the primitive operators cons, car, cdr, and atom, which are all trivial to implement given the sexpr data structure. I also added a builtin FullForm to print a shell expression converted to an s-expression, and eval now evaluates the expression using lisp semantics.

          % load ./lisp4.dis
          % FullForm { a b c {d} e {{f}}}
          (a b c (d) e ((f)))
          % FullForm {echo (a b c) `{cat t1}}
          (echo (List a b c) (quote (cat t1)))
          % eval {cdr `{a b c}}
          (b c)
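With s-expressions held as nested lists, the primitives really are one-liners; a Python sketch purely for illustration:

```python
def car(x): return x[0]           # first element of a list
def cdr(x): return x[1:]          # the rest of the list
def cons(a, d): return [a] + d    # prepend an element
def atom(x): return not isinstance(x, list)

print(cdr(["a", "b", "c"]))  # prints ['b', 'c'], like eval {cdr `{a b c}}
```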

In lisp5.b I added the define and lambda expressions bringing me back to the point I was at in lab 26. I have followed the evolution of eval described in the Sussman and Steele paper, The Art of the Interpreter. The environment for the eval builtin is stored as an sexpr and no longer uses the shell's Context.

The scope in lisp5.b is dynamic. The next step was to add lexical scope, as is done in lisp6.b.

          % eval {define {subst x y z} 
           {cond {{atom z} 
               {cond {{eq z y} x} 
                 {`t z}}}
            {`t {cons {subst x y {car z}}
                   {subst x y {cdr z}}}}}}
          % eval {subst `m `b `{a b { a b c} d} }
          (a m (a m c) d)
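For comparison, the same subst recursion written in Python (illustrative only: atoms are strings, lists are Python lists):

```python
def subst(x, y, z):
    """Replace atom y with x everywhere in the tree z."""
    if not isinstance(z, list):            # (atom z)
        return x if z == y else z          # (eq z y)
    return [subst(x, y, e) for e in z]     # recurse over car and cdr

print(subst("m", "b", ["a", "b", ["a", "b", "c"], "d"]))
# prints ['a', 'm', ['a', 'm', 'c'], 'd'], matching (a m (a m c) d)
```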


There are several steps still to take, including adding side effects, macros, and a method for calling builtins. To implement the accumulator generator, only side effects and some handling for math expressions are still needed.

This is still just for experimentation. It may be valuable in the long run to implement a real scheme compiler for dis.

Part of the experimentation is to try multiple evaluators, one of them is a lisp dialect, another is sh. I should try transforming the s-expression back to the sh syntax tree for evaluation or display to the user.

The principle being that I can take a parse tree, transform it to FullForm (s-expressions), evaluate it using a chosen evaluator, then transform it to another parse tree for display. (I am not going to start using XML/XSL in these labs.)

I can experiment with a variety of input forms: man(6) macros, popi input form, Mathematica syntax. I can treat the s-expression as the standard form for all expressions within inferno and establish a standard semantics for this form.

There are other possible eval semantics I should look at: transform by applying rules until the expression reaches a fixed point (as in Mathematica); let all unbound variable names and functions remain as symbols (as in other computer algebra systems, like Maxima). See essays by Fateman on different symbolic evaluators.
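The fixed-point style of evaluation can be sketched in Python (the rewrite rule here is a toy that reduces (car (cons a d)) to a; both names are hypothetical):

```python
def fixed_point(rewrite, expr):
    """Apply a rewrite rule until the expression stops changing."""
    while True:
        new = rewrite(expr)
        if new == expr:
            return expr
        expr = new

def rule(e):
    """Toy rule: rewrite bottom-up, reducing (car (cons a d)) to a."""
    if isinstance(e, list):
        e = [rule(x) for x in e]
        if len(e) == 2 and e[0] == "car" \
           and isinstance(e[1], list) and e[1][:1] == ["cons"]:
            return e[1][1]
    return e

print(fixed_point(rule, ["car", ["cons", "a", "nil"]]))  # prints a
```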

Long term, I want a powerful, expressive, succinct language that can be embedded within a narrative and edited, and evaluated within the document by the editor (acme). If I can plug new languages into the inferno shell it may make a good vehicle for a range of expressive languages that could be included within an active essay.

Monday, January 10, 2005

lab 26 - greenspun's rule




Let me state up front, I do not know much about lisp. I've always been aware of it but never saw its benefits, so never used it. However, recently I've begun to read more about it, particularly the essays by Paul Graham, and his book On Lisp (which I'm still reading, on and off). I went a bit further and read Steele and Sussman's paper The Art of the Interpreter. Also, as a symptom of my Mathematica envy I became curious about symbolic programming languages, computer algebra systems, and how these are based on lisp dialects. I started playing with Maxima and yacas, and also Mockmma.

And so I think I finally got what lisp was about. (I still have much more reading to do. It's funny how a large and rich subculture within computer science suddenly opens up to you when you read about its core language. I have to ask myself, How could I not have noticed this for so long? The same surprise occurred to me when reading about smalltalk. It all makes me feel quite stupid and parochial.)

As an exercise in understanding lisp better I turned the Inferno shell into a lisp dialect. I wrote a builtin module that provides an eval command and the seven primitive operators: quote, atom, car, cdr, eq, cons, and cond.

I followed approximately the code in Paul Graham's paper The Roots of Lisp, and the interpreters described in the Steele and Sussman paper.

     % load ./lisp0.dis
     % eval {atom `a}
     % eval {atom `{a b c}}
     % eval {car `{a b c}}

Functions can be defined by assigning to a variable in shell. Scope within eval is dynamic.

     % subst={{x y z} 
      {cond {{atom z} 
          {cond {{eq z y} x} 
            {`t z}}}
       {`t {cons {subst x y {car z}}
              {subst x y {cdr z}}}}}}

     % eval {subst `m `b `{a b { a b c} d} }
     {a m {a m c} d}

The backquote is used as an abbreviation for the quote primitive.

In the builtin I hijack the parse tree and read or transform it for each primitive operator. I also use the shell Context as my symbol table for function parameters, definitions and variable values.


The attached file is the first of a few versions I'll provide as I add more features as described in Steele and Sussman, in particular lexical binding.

I wonder whether there is more value to having a dialect of lisp within the shell or whether a proper lisp implementation like scheme should be linked into the inferno VM or written in limbo. The shell already has a good interface to the whole system and mixing programming paradigms may have advantages. The problem is we may realize Greenspun's rule by being ad hoc, informally specified, bug ridden and slow.



Tuesday, January 04, 2005

lab 25 - accumulator generator




The problem, posed by Paul Graham, is stated here, and I quote:

Write a function foo that takes a number n and returns a function that takes a number i, and returns n incremented by i.

I thought I'd attempt it in sh (I'm already assuming it's not possible in limbo). Here's my first attempt, which is wrong.

     % subfn foo {
      result={i:=$1; n=`{fc $n $i +}; echo $n}
     }
     % n := 0
     % x=${foo 3}
     % $x 1

The problem is we can't create another accumulator generator in the same scope, since it will overwrite the variable n. Sh uses dynamic variable binding instead of lexical binding, as used by lisp.
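For contrast, in a lexically scoped language the accumulator generator is immediate, because each call captures its own n; a Python sketch for illustration:

```python
def foo(n):
    """Return an accumulator: each call to it adds i to this n."""
    def acc(i):
        nonlocal n      # n is captured lexically by the closure
        n += i
        return n
    return acc

x = foo(3)
y = foo(10)               # a second generator, independent of the first
print(x(1), x(2), y(5))   # prints 4 6 15
```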

The shell paper /doc/sh.ms:/lexical.binding/ mentions the issue and proposes a workaround but it doesn't really solve the stated problem. Here's a solution that incorporates the workaround.

     % subfn foo {
      x:= '{i:=$1; $'$n'=`{fc $$'$n' $i +}; echo $$'$n'}'
      result = ${parse $"x}
     }
     % n := 1
     % x=${foo n}
     % $x 1

Now we're passing the name of a variable rather than the number. We can now create a number of accumulator generators. However, the syntax is ugly, and it's not something we'd want to code often. This seems a pity.


The problem is intended to emphasize the features of LISP, and in particular lexical binding and higher order functions. Indirectly it illustrates Greenspun's Rule:

Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

Put another way, the limit case for implementing the accumulator generator in sh is to turn sh into a lisp dialect by, say, writing a shell builtin to evaluate lisp-style expressions, or adding lexical binding to the shell. It's interesting to note, from reading about lisp recently, that lisp used dynamic binding until the introduction of scheme in the 1970s, and all dialects of lisp eventually adopted lexical binding.