Showing posts from 2005

lab 49 - wrappers

NAME lab 49 - wrappers NOTES Acme is the hub of my activity where all my tools are at hand. It is inconvenient to have to step outside it while working on a task. But it is impractical to port all the tools I need into Inferno. However, Inferno is an integrating environment. It is oftentimes easy to put a wrapper around an external tool so that it is usable within Acme. In this lab I show a few examples of putting wrappers around external tools so that they blend in to the Inferno environment and work together with Inferno's tools. The first, very simple example is awk . fn awk {os awk $*} With this definition it is only possible to use awk as a filter without naming files as command line arguments. But this is often good enough. Very often I work on my laptop which has NT as the host os. But I need to use dict and spell all the time, and my laptop doesn't have these installed. I integrate these tools into my environment by running emu on a Plan9 or Linux machine a

lab 48 - symbol

NAME lab 48 - symbol NOTES I'm clearing out the backlog and found this lab's code, which is really a continuation of earlier labs on the lispy-shell (26, 27, 28). I felt that where I was going with lab 28 wasn't working out, or was just going to end up looking like scsh (if I was lucky, and as smart as Olin Shivers). This attempt is more a mathematica-like-shell, and for a while was looking better, in that it fit in better with the existing sh, but now I've decided against it too. I'm posting it here just to record what I've done. I'll try and explain at the end why I don't like this approach anymore. In Mathematica everthing is an expression of the form f[x,y] which translates quite easily into the shell expression f x y Because of Inferno sh's command blocks, we can directly support the nesting in expressions: % Head {f x y} f % Head {f;x;y} List # a sequence of commands {f;x;y} is # represented as {List f x y} in "FullForm" And

lab 47 - self tk

NAME lab 47 - self tk NOTES I was inspired by watching a video about Sun's Self system, which also inspired Squeak's morphic user interface. The scripts in this lab were an attempt to get tk windows serving a file that described themselves, so that the file could be copied to disk, and later run to recreate the window. Here's an example. % ./text /chan/tk.239 % cp /chan/tk.239 t1 % run t1 You should now have two text windows open. Edit either one of them, then copy and run them again. Both the contents and the position of each window should be saved and restored. Right-clicking in the window brings up the titlebar, whichs pops up underneath the window because it is the last thing packed into the tk widget. Right-clicking again makes it disappear. One of the ideas being explored is that a set of widgets can be configured and placed on the screen then all the /chan/tk.* files copied to a directory that acts as a stored workspace, to be later restored with all the widg

lab 46 - vac script

NAME lab 46 - vac script NOTES This lab is an answer to Jack's question in the comments of lab 41. The putclump/getclump scripts can be made to work with blocks of data. The shell script for this lab, vac , will write a kfs file system to a venti-lite archive in blocks, taking advantage of block level redundancy. This is demonstration rather that something truly useful. It literally takes hours to store a 64MB kfs file system in venti-lite using this script. Note also, that it is trivial to implement putclump/getclump using the venti(2) module in inferno and store the blocks in a real venti archive. And this script will work the same way. I've also included a vcat script to readout the kfs filesystem in one go. I started writing a file2chan interface so I could run disk/kfs directly against the venti-lite archive, but didn't finish it. I've attached that too (clumpchan). Exercise for the reader is to finish it. FILES 2005/1207 I updated vac

lab 45 - full screen

NAME lab 45 - full screen NOTES I've been wanting to use inferno as a full-screen application on Nt. I don't know too much about the windows API but I took a peek at how Squeak does it. And that was enough to get it to work. This is a first pass at doing it. It should probably be done with parameters to emu. Ideally the inferno host window would be resizable with some way of switching to full screen and window while emu is running. Just copy the win.c file to your /emu/Nt directory and build. The -g option to emu needs to be the current settings for your monitor. The other files in this lab are the wmclient, tkclien,t and titlebar to make wm windows look more rio-like. I had done this in an earlier lab, but the wmclient code didn't work properly. The wmclient works properly now. But there's no backdrop menu for window control, which makes it a little difficult to hide and exit the windows. FILES

lab 44 - acme irc client

NAME lab 44 - acme irc client NOTES The code in this lab is a port of an irc client for the Acme editor. The original was by Russ Cox. The translation was pretty straight forward, that is without any creative input from me. While the code is usable as an IRC client there are still plenty of bugs I've left in it for the reader to explore. FILES

lab 43 - acme Wiki

NAME lab 43 - acme Wiki NOTES I ported the Acme Wiki client from Plan 9 to Inferno. At the moment (20050925) a small patch to Inferno's Acme is needed to get this to work. Make the following change (using ed) to /appl/acme/regx.b: 193c if ((nc := xgetc(a0, a1, q)) != '#' && nc != '/' && nc != '?') . Download the wiki.tgz and unpack under /acme. Then mount a wiki and start Acme. % cd /acme % gunzip < wiki.tgz |gettar % mount -9 tcp!!wiki /mnt/wiki % acme -c1 /acme/wiki/guide # inside Acme middle-click Wiki I took a lot of the code that interfaces with the Acme filesystem from the Mail command and put it into the Acmewin module. This module should be useful to other Acme clients. Once I started using this it flushed out a few bugs with my inferno wikifs port. I've updated the code under the original lab 30 with fixes. Thanks to Russ Cox for writing the original wikifs and Acme wiki client. My version

lab 42 - channels in shell

NAME lab 42 - channels in shell NOTES This is a quicky. I ripped this code off from Rob Pike's squint, an interpreter for his squeak language, a predecessor of limbo. It's an implementation of Eratosthenes's Sieve using communicating processes and channels, which I translated to Inferno shell. This was just to try out the channels in shell, since somehow I've overlooked them till now. load tk load expr fn counter { ch := $1 {i:=2; while {} {send $ch $i; i=${expr $i 1 +}} } } fn filter { prime := $1 listen := $2 s := $3 while {} { i := ${recv $listen} if {ntest ${expr $i $prime %}} { send $s $i} } } fn sieve { prime := $1 chan count c := count counter count & {while {} { p := ${recv $c} send prime $p chan newc^$p filter $p $c newc^$p & c = newc^$p }} & } chan prime sieve prime & echo ${recv prime} echo ${recv prime} echo ${recv prime} Watch your CPU meter spike when you get a hundred primes or so.

lab 41 - venti lite

NAME lab 41 - venti lite NOTES I've taken another look recently at venti and the ideas from the venti paper . Venti is a data log and index. The index maps a sha1 hash of a clump of data to the offset of the clump in the log. Clumps are appended to the log after being compressed and wrapped with some meta-data. A sequence of clumps make an arena, and all the arenas, possibly spread over several disks, make up the whole data log. There is enough information in the data log to recreate the index, if neccessary. The above is my understanding of Venti so far after reading the code and paper. There is a lot more complexity in it's implementation. There are details about the caches, the index, the compression scheme, the blocking and partitioning of disks, and so on. I will ignore these details for now. Although the whole of venti could be ported to Inferno, I want to look at it without getting bogged down in too many details too early. Reasoning about Venti in the context of I

first year

first year anniversary A year ago today I started keeping this notebook. The original idea was for it to be a social thing. Instead of keeping a private notebook of my computing experiments, I published everything in a blog to make it part of a conversation. Who were the other voices? A pattern emerged in how I went about finding ideas and exploring them. It seems I hit upon an interesting paper--often from a regular source like Google Labs, Bell Labs or the MIT PDOS group--and I'd then try and reason about the paper's subject in the context of Inferno. So the other voices I started listening to were from the authors of these great papers. Sketching out a program from something I'd read in a paper helped me to understand the paper better and generate ideas that went beyond the paper. This was not just useful, but also a lot of fun. Much as writing is more than communication but an extension of the process of thinking, programming is not merely a corporate activity,

lab 40 - distributing data

NAME lab 40 - distributing data NOTES The first problem I had to deal with once extracting the data from the reality mining data set was how to distribute it across all the disks. So this lab describes how I did it. The Geyron grid I have at home is a simulation grid. I'm just pretending that I have 16 real disks, when I actually only have two. [1] The sizes of things are also scaled down, so each disk is only 64MB instead of a more typical 40GB. I divided each disk into 1MB chunks for this simulation. On each disk I created 60 files numbered 0.ckfree ... 59.ckfree. Each empty file is a record that an available chunk is on this disk. I don't use 64 because I leave room for slop. Each chunk is going to be slightly over 1MB. I do all this easily enough using my friend, the registry: % for (i in `{ndb/regquery -n resource kfs}) { mount $i /n/remote for (j in `{seq 0 59}) { > /n/remote/$j.ckfree } } The idea behind this, as suggested in the google f

lab 39 - tar and gzip

NAME lab 39 - tar and gzip NOTES In lab 38 I looked at file formats I could use to store a document repository. I kind of decided I needed an archive file of individually gzipped files. After a little further digging I find that the gzip file format ( rfc ) supports multiple gzip files concatenated together as a valid gzip file. A quick test on unix shows this to be true. % cat file1 file2 | wc 2 8 35 % gzip < file1 > t1.gz % gzip < file2 >> t1.gz % gunzip < t1.gz |wc 2 8 35 But the same test on Inferno did not work. After a little hacking on /appl/lib/inflate.b I got it to work, although I'm not sure I haven't broken something else in doing so. So beware. Appending to a gzip file is a nice feature. What about puttar ? Can I append to a tar file? % puttar file1 > t1.tar % puttar file2 >> t1.tar % lstar < t1.tar file1 1123551937 15 0 No. It stops reading after the first file. I looked at the code /appl/cmd/puttar.b and find it

lab 38 - Geryon's data sets

NAME lab 38 - Geryon's data sets NOTES I need a large data set to work with so I can try out more ideas using Geyron. I want to use real data; one that can not be analyzed trivially using, say, a relational database. Examples I considered, crawl the web - a web page repository an rss feed repository web server query logs click logs for a site aggregate data input by users system health records sousveillance logs Some of these are more difficult to collect than others. Some may contain greater possibility for surprises, and the surprises are what I hope to get by working with real data. Also, a data set where others can collect and duplicate my results would help to keep the conversation concrete. But I need something right now, and there are two possibilites at hand. I've got the data set from the MIT Reality Mining project. This is about 200+MB uncompressed. This is big enough to test out some tools. But for this size data, Geyron is not likely to offer a

lab 37 - Geryon's mapreduce

NAME lab 37 - Geryon's mapreduce NOTES I have a registry of kfs disks and cpus in my cluster, and the registry is mounted on every node. Upon this infrastructure I've written some shell scripts to simulate a MapReduce function. I want to quantize and bound the data and the computation. To that end, a process works on only one 64MB disk for input, and may write to another disk for output. The spliting of the data is also crucial to the parallelization. Each input disk can be handled concurrently on any cpu node in the cluster. The namespace for a process is built from finding resources in the registry. Because there are several cpus available to run a command block, I choose the cpu in round robin using a shell function to populate a list of the available cpu nodes, and take the head of the list with each subsequent call until the list is empty. [1] cpulist=() subfn nextcpu { if {~ $#cpulist 0} { cpulist=`{ndb/regquery -n svc rstyx} } result = ${hd $cpulist}

lab 36 - Geryon's registry

NAME lab 36 - Geryon's registry NOTES A critical piece of Geryon is the registry of disks and cpus. One of the immediate problems to deal with when setting up a grid cluster is the fact that the nodes come and go including the one holding the registry. To deal with one aspect of this, the registry restarting, I modified grid/register and grid/reglisten commands from Inferno to monitor the availability of the registry, remount it if neccessary and re-announce the service. I use grid/register for the disk services. Here is an example of how I create a new chunk and register a new kfs disk. % zeros 1024 65536 > /chunks/chunk0 % (grid/register -a id chunk0 {disk/kfs -rPW /chunks/chunk0}) I do this a number of times on each node that has spare disk capacity. A simple script registers all disks when I restart a node. All these disks will appear as services in the registry identified as tcp!hostname!port with some attributes including the chunk identifier, which should

lab 35 - Geryon, another Inferno grid

NAME lab 35 - Geryon , another Inferno grid NOTES Grid computing is everywhere it seems, and one of the obvious applications of Inferno is grid computing. As usual, I know little about it aside from reading a few papers to get me enthused enough to make some effort in that area. If I read all the current research I'd have little time to do the programming. So instead I'll jump in and see if I can swim. I'm trying to setup a simple grid, with tools for MapReduce functionality. I'm calling this system Geryon (because things need names). This first stab is to see what can be done with the pieces already available in the system. And from this working prototype find out what pieces need to be written to fill in the gaps or improve it in any way. Geryon is too large to cover in one blog entry so I'll be posting it piecemeal. I'll setup a wiki page to cover the whole setup and running of Geryon as I get further along. OVERVIEW I'll start with an overview

usual disclaimer

I do not know what I am doing. The point about writing the software is to learn about a problem. This means the software is not likely to be of practical use, though it maybe, I don't know. Why don't I go and read the research papers on the subject? Reading one or two is fine, and I do, and they help. But reading too much means all I'm doing is reading and not coding. Hands on experience is learning, maybe at a slower rate than reading, but at a much deeper level. The fun is in the coding. Coding generates ideas. After coding, reading the research papers becomes much more valuable, because I now have real experience to compare against. I'm doing it for fun. This should be called a fun lab or something. It is a lab for doing experiments and writing about them, or in the words of Homer Simpson, "It's just a bunch of stuff that happens."

lab 34 - lexis: semantic binary model implementation

NAME lab 34 - lexis: semantic binary model implementation NOTES The code linked to below is a database implementation based on the paper, A File Structure for Semantic Databases by N. Rishe. The paper is brief and required reading for the rest of this lab. The application is in three modules: the Btree, the SBM abstraction called lexis , and the query module. The Btree implementation stores key-value pairs. This Btree is a large improvement over my previous attempt, tickfs , in that a prefix length is included for each entry which improves storage efficiency. Binary search is used when searching within a block making it faster , especially considering the increase in the number of entries per block. The caching and locking mechanism have also changed, and are described in earlier labs. Lexis is the SBM abstraction layer that sits on top of the Btree. It does not use the value field of the Btree as all information is stored in the key. The abstractions exported are Facts , wh


Here's a download of source for all lab sessions. I'll try and keep it up to date as I post new source. I also want to update the distribution from the downloads page to contain the more robust, bug fixed, and useful code (the code I actually use). The labs download is more flakey, experimental code that is not updated with bug fixes and is often left sitting around for a long time.

lab 33 - profile lockfs

NAME lab 33 - profile lockfs NOTES Last lab I talked up the benefits of lockfs. I started using it in my program in place of channel locks and profiled the code using prof (1) to see it's impact on performance. The results were a little disappointing but not really suprising. One of the test programs I'm using while developing btree application is a simple indexer. It reads a set of text files, tokenizes each file into a set of Terms, then in the database creates a document id for each document and a termid for each distinct Term and adds to the database a docid-Hit-termid relation. I then run this on a few megabytes of text data; such as all email messages for a year, % prof -n folkfs/tok -i /n/mbox/2001/*/* I've frequently used prof while developing the btree. Locking always presents a bottleneck. When I had the locks based on channels and a cache that also behaved as a lock, both took up large percentage of samples when profiling lines of code. The te

lab 32 - lockfs

NAME lab 32 - lockfs DESCRIPTION I've struggled for a while trying to get multi-reader-single-writer locks (rwlock) working the way I wanted for a database I've been working on (more in a later lab). I recently realized the answer was in inferno all along. The problem was not in implementing a rwlock, which i had done for tickfs using channels and it worked fine, the problem came when I needed to kill a process that might be holding a lock. A lock based on using channels usually means that a spawned process is waiting for an event on a channel either to obtain or free the lock. However, if the process that holds a lock is killed it doesn't get to send the unlock event to the lock manager process. You might think, just don't kill the processes that might hold locks. But killing processes is sometimes hard to avoid. For example, when a Flush message arrives, or to cancel a query that launched multiple processes. Programs that depend on locking based merely on

lab 31 - accumulator generator in limbo

NAME lab 31 - accumulator generator in limbo DESCRIPTION In lab 25 I tried to tackle the problem of writing an accumulator generator in the languages available on inferno. I blithely assumed it wasn't possible in Inferno. I'm happy to say I was wrong. Roger Peppe emailed me the answers, so I'm just reporting what he gave me. But it was very enlightening, a big Aha!, and made me view limbo in a different way. From rog: "funny thing is, it *is* possible in limbo, if you take a somewhat more liberal definition of the term "function". recall that a limbo channel can act like a function: e.g. c: chan of (int, chan of string); "can represent a function that takes an int and returns a string (through the reply channel). c <-= (99, reply := chan of string); sys->print("result is %s\n", <-reply); "i've attached a Limbo version of the accumulator-generator that uses this kind of thing. i'll leave it

inferno wiki

i have setup a demo inferno wiki for wikifs on inferno.

lab 30 - wikifs

NAME lab 30 - wikifs for inferno DESCRIPTION I ported plan 9's wikifs to inferno. There were various aspects of the application I wanted to learn about--such as, the cache, the transforms from wiki syntax to html, the method for using html templates, and the details of creating the namespace--that it seemed worthwhile to write it in limbo to absorb it all. I also had a hunch I'd be borrowing a lot of the code for my next file system. The differences between the original and the port are small. The one significant difference is the approach to locking data structures. The original uses locks for the Map and the Cache memory as well as file locks for accessing disk. For the port I attempted to avoid the memory locks altogether. One reason being that when the Flush message is received I attempt to kill the proc processing the request, but that might leave stray locks. The f

lab 29 - embedded graphs

NAME lab 29 - embedded graphs DESCRIPTION Taking inspiration from Mathematica 's notebook user interface, I tried implementing graphs embedded within the shell window, wm/sh . The idea being that a scrolling page of graphs, text and images is easier to manage than many overlapping windows. I feel this especially while I'm iteratively running commands, generating data and viewing graphs. The shell window provides a history of activity. By including other windows in that history I can look back at the graphs I generated, what data generated them, what command was used to generate the data, and so on. With overlapping windows this context can be lost. Wm/sh used in this way now behaves as a rudimentary window manager. It is easy to imagine ways in which it becomes inconvient to use. Such as interacting with two windows separated by a large block of text. But it still might ad


I took a break from this blog and inferno coding for March and April. During that time I've been reading about web services, deconstructing Google's offerings and AJAX , and most importantly reading about the REST architectural style for the web. I like the approach taken by the Plan9 wikifs to offer web pages by first writing a filesystem, then just mount that into the httpd namespace. I'd like to experiment with that approach further to implement RESTful services like . I've been porting wikifs to Inferno to learn more how it's done. I'll post that soon. I'd like to update the Inferno httpd to support HTTP 1.1 and basic authentication. That shouldn't be hard considering Plan9's supports this and the code is very similar.

lab 28 - side effects

NAME lab 28 - side effects DESCRIPTION I was planning on just implementing side effects, but it turned out to be a little more complicated than I thought, and then I got sidetracked into changing the evaluator to handle more of the shell semantics. The file lisp7.b implements setq but it is limited in that in cannot change the type of the value referenced by the symbol. % eval {setq a `{b c d}} (b c d) % eval {setq a `b} error at type mismatch (b c d) Fixing this requires me re-implementing the sepxrs library to remove pick from the ADT. I'll follow up on this later. Mixing shell and lisp syntax The final parser in the Sussman and Steele paper The Art of the Interpreter puts dynamic scoped variables back into the evaluator. It maintains two environments, one for lexical and one for dynamic binding. Instead of havi

lab 27 - sh sexpr

NAME lab 27 - sh sexpr DESCRIPTION The implementation of the lisp dialect in lab 26 had some problems. For each car or cdr operator I recursively descended the syntax tree rather than holding the lists in memory in a more convenient fomat and simply taking the head or tail of the list. I did not want to build further on this implementation. Instead I wanted to rewrite eval to use the sexprs (2) module and convert the sh's syntax tree to an s-expression. Converting from the syntax tree to an s-expression is pretty easy to do, though it took me a while to remember the method. The first file lisp1.b is a brute force implementation, which of course once I got working I remembered the method I should of used. % load ./lisp1 % eval {a {b} c} (a (b) c) % unload ./lisp1 The second file lisp2.b maintains a global stack of lists of sexp

lab 26 - greenspun's rule

NAME lab 26 - greenspun's rule DESCRIPTION Let me state up front, I do not know much about lisp. I've always been aware of it but never saw it's benefits so never used it. However, recently I've begun to read more about it, particularly the essays by Paul Graham, and his book On Lisp (which I'm still reading, on and off). I went a bit further and read Steele and Sussman's paper The Art of the Interpreter. Also, as a symptom of my Mathematica envy I became curious about symbolic programming languages, computer algebra systems, and how these are based on lisp dialects. I started playing with Maxima and yacas and also Mockmma And so I think I finally got what lisp was about. (I still have much more reading to do. It's funny how a large and rich subculture within computer science suddenly opens up to you when you read about its core language. I hav

lab 25 - accumulator generator

NAME lab 25 - Accumulator Generator DESCRIPTION The problem by Paul Graham is stated here and I quote Write a function foo that takes a number n and returns a function that takes a number i , and returns n incremented by i . I thought I'd attempt it in sh (I'm already assuming it's not possible in limbo). Here's my first attempt, which is wrong. % subfn foo { n=$1 result={i:=$1; n=`{fc $n $i +}; echo $n} } % n := 0 % x=${foo 3} % $x 1 4 The problem is we can't create another accumulator generator in the same scope, since it will overwrite the variable n. Sh uses dynamic variable binding instead of lexical binding, as used by lisp. The shell paper /doc/ mentions the issue and proposes a workaround but it doesn't really solve the stated problem. Here's a solution that incor