Saturday, April 07, 2007

lab 74 - effective inferno

NAME

lab 74 - effective inferno

NOTES

I read the Effective Java book recently. Every language needs a book that describes how to use it effectively. I wish there were a book for Effective Inferno.

Although common techniques may work in Inferno, I know for sure there are some uncommon ones that may work better.

I'll try to describe at least one recipe that could be a chapter in that book.

To people who've asked me what distinguishes Inferno, I've answered that it's the concurrent language Limbo, or that it's a portable OS that runs on a VM, or that it uses the Styx protocol to create a distributed system, or that it's about software tools that work together with the Inferno shell.

These are the ingredients; on their own they are not unique to Inferno. Limbo looks a lot like C, and to a newbie it's not obvious what is special about it. There are lots of little niceties that make Limbo pleasant to use, e.g., array slicing and tuples, but these are details; they do not greatly affect an application's architecture.

A lot of the Unix-like commands under /appl/cmd/ are straight ports from Plan 9. This is old school. And the fact that they're implemented in Limbo does not greatly differentiate them from their C counterparts. The special flavor of Limbo does not come out when used this way.

The OS interface also looks much like Unix or Plan 9, except for the VM piece, so what is special about that? The advantages of platform independence come out when building distributed applications. So there must be techniques that need to be learned to do that effectively.

In Inferno it's a new world where each ingredient might be familiar to you from another context, but we can cook things a little differently here.

Extension Points

For any system we need to know how we might extend it and how we combine the pieces that are already there. Inferno provides a number of interfaces at different levels of granularity where the programmer can extend the system. Let's quickly review the most common ones, available in most systems.

1. We can write libraries and extend the systems with new functions; such things as parsing file formats or protocols. We can't live without those libraries. Most of the ones in /appl/lib are of this form.

2. We can create new commands and add to our software tool set. Commands can be developed that do one thing well, and grow the system organically.

3. We can use the shell to combine commands using pipelines or to customize command interfaces or extend the system further with commands implemented in the shell.

These are the contributions from our Unix legacy. And they are great! Plan 9 followed and gave us everything-is-a-file, the 9P protocol, and private namespaces. Inferno, being a direct descendant of Plan 9, inherited all that.

4. We can build filesystems to provide new services. We can also architect a distributed system by basing applications around the Styx protocol.

We don't have to use Limbo to extend Inferno in this way; we can use any language, and there are already several implementations of the Styx library. The Styx protocol is itself an incredibly powerful and flexible way of extending the system. It permits re-use of existing tools that operate on files.

5. We can extend the system as clients to file services. Two big examples from Inferno and Plan 9 are Acme and the Plumber. Because of the loose coupling between client and server we can extend the system while it is running. We can write new Acme plugins while we are working within Acme.

This is a great pattern for developing services that you want extended. Don't just be a client of the system's extension points; create your own extension points.

You may think this is where I'd stop. This would be enough, right? What a great system. Plan 9 is a lovely system for all the above reasons. But we haven't yet got to what is unique to Inferno. And there is something more lovely. Bear with me, this will take a few paragraphs.

6. Limbo supports dynamically loadable modules. We can define an extension point as a module interface that the client is expected to implement. Then we can load the client, one of possibly many implementations, as needed.

This is distinct from a module interface used by libraries where in practice there is really the one and final implementation. [1], [2].

What I mean here is that there are many clients that might do distinctly different things but are able to use a shared interface with other clients.

The simplest example is the Command interface with the shell controlling loading of modules.

Command: module {
 init: fn(ctxt: ref Draw->Context, argv: list of string);
};

This is the extension point. You implement a client that supports this interface and then the shell will call it. Implicit in the interface is the fact that you run as a separate process and you have file descriptors for stdin, stdout, and stderr.
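As a minimal sketch of a client of this extension point (the module name Hello and the file name hello.b are made up for illustration), a command only has to implement init with the signature above. Here it prints each of its arguments on a line of its own:

```limbo
implement Hello;

include "sys.m";
	sys: Sys;
include "draw.m";

# Hello is a hypothetical name; its declaration matches the Command interface.
Hello: module {
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(nil: ref Draw->Context, argv: list of string)
{
	sys = load Sys Sys->PATH;

	# The shell passes the command name as the first element of argv; skip it.
	if(argv != nil)
		argv = tl argv;
	for(; argv != nil; argv = tl argv)
		sys->print("%s\n", hd argv);
}
```

Compiled with the limbo compiler into hello.dis, this could then be run from the shell like any other command, for example as hello a b.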

Of course, we all know this interface because it's similar to ones used by shells on most operating systems.

Now you say, I've already covered this in point 2. But this is different. Inferno allows us to define new interfaces that can be specific to our problem.
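To make that concrete, here is a small sketch of a host that defines its own extension point and loads an implementation chosen at run time. The interface Xform, its apply function, and the command name loadxform are all hypothetical, invented for this illustration; only the load mechanism itself is Limbo's.

```limbo
implement Loadxform;

include "sys.m";
	sys: Sys;
include "draw.m";

# Hypothetical extension point: any module that implements Xform can be plugged in.
Xform: module {
	apply: fn(s: string): string;
};

Loadxform: module {
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(nil: ref Draw->Context, argv: list of string)
{
	sys = load Sys Sys->PATH;
	stderr := sys->fildes(2);
	if(len argv != 3){
		sys->fprint(stderr, "usage: loadxform module.dis text\n");
		raise "fail:usage";
	}

	# The implementation is picked by path at run time, not at compile time.
	path := hd tl argv;
	x := load Xform path;
	if(x == nil){
		sys->fprint(stderr, "loadxform: cannot load %s: %r\n", path);
		raise "fail:load";
	}
	sys->print("%s\n", x->apply(hd tl tl argv));
}
```

The load expression yields nil when the file is missing or does not match the declared interface, so the host can fail cleanly instead of calling into something that does not fit.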

For example, the Inferno shell defines a more complicated interface for shell loadable modules that extend the shell and use the shell's syntax for structured arguments.

But here's the kicker. Inferno's shell implements the Lispish pattern of using a standard syntax to represent data and code. Instead of parens it uses braces but other than that we have lists of nested lists.

 {f x {g y z } }

7. We can extend the system by re-using shell syntax to structure the input to our own applications.

The shell parses a block, does syntax checking but doesn't eval it. It leaves it up to the command to evaluate, either builtin or external. So we can implement the shell pattern, but not use the shell builtin infrastructure. We define our own evaluator and pass it a shell block. We define, in a sense, our own shells specific to a problem domain.

The other big piece to this is that the module interfaces can include typed channels and that each module runs in a separate process, so the modules can communicate with each other, working together and forming the equivalent of pipelines, but with types.
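A rough sketch of such a typed pipeline follows. Everything in it is invented for illustration (the names Typedpipe, produce and bigger, the sample data, and the convention that an empty name marks the end of the stream); it only shows processes connected by channels that carry tuples rather than bytes.

```limbo
implement Typedpipe;

include "sys.m";
	sys: Sys;
include "draw.m";

Typedpipe: module {
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

# Producer: sends (name, size) tuples, then an empty name as an end marker.
produce(dst: chan of (string, int))
{
	dst <-= ("a.b", 120);
	dst <-= ("b.dis", 64);
	dst <-= ("", 0);
}

# Filter stage: passes on entries larger than min, and always the end marker.
bigger(min: int, src, dst: chan of (string, int))
{
	for(;;){
		(name, size) := <-src;
		if(name == "" || size > min)
			dst <-= (name, size);
		if(name == "")
			return;
	}
}

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;
	a := chan of (string, int);
	b := chan of (string, int);

	# Each stage runs in its own process; the channels are the pipes.
	spawn produce(a);
	spawn bigger(100, a, b);

	for(;;){
		(name, size) := <-b;
		if(name == "")
			break;
		sys->print("%s %d\n", name, size);
	}
}
```

Because the element type is part of the channel's type, a stage that expects (string, int) cannot be wired to a channel carrying anything else; the compiler rejects the mismatch.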

I'll give a few examples to really push this point. This is not a trivial feature and it's easy to overlook.

The prime example in Inferno is the fs(1) command.

Fs uses shell syntax but doesn't use the shell to evaluate it. Fs is a tree walker that allows arbitrary mixing of components to operate on a tree. It's a remarkable command because it combines sh syntax reuse with the module extension point for clients that use channels to communicate between processes. It's a lovely thing. The extension point looks like this.

Fsmodule: module {
 types: fn(): string;
 init: fn();
 run: fn(ctxt: ref Draw->Context, 
  r: ref Fslib->Report,
  opts: list of Fslib->Option, 
  args: list of ref Fslib->Value): ref Fslib->Value;
};

Here's an example of how I use fs to build the acme-sac distribution. I filter out .svn files, filter out .sbl and .dis files below /appl and object files below /sys, then copy the whole tree using a proto file to define the components to copy.

   fs write /n/d/acme  {filter  
   {and {not {match .svn}} 
   {not {match -ar '(/appl/.*\.(dis|sbl))|(/sys.*\.*(obj|a|pdb))$'}}} 
   {proto /lib/proto/full} } 

It's very lispish in that we've built a domain language, in this case for tree walking, which we can add to, a command at a time, re-using existing commands in possibly novel ways, and using shell syntax to represent the whole expression.

By combining loadable modules, sh syntax, a uniform interface, channels, processes and files, we have a unique programming environment. This is a powerful pattern that should be repeated.

This kind of programming is quite unlike any other. It establishes a pattern for structuring applications. It leverages some of the quality ingredients inside Inferno.

Another example is the sound synthesizer I've been working on. I'm consciously imitating fs here. I use a simple module extension point with processes communicating on channels, and a sh syntax that combines the modules.

Clients implement an interface such as this,

 Source: type array of ref Inst;
 Sample: type chan of (array of real, chan of array of real);
 Control: type chan of (int, array of real);
 
IInstrument: module {
 init: fn(ctxt: Instrument->Context);
 synth: fn(s: Instrument->Source, 
  c: Instrument->Sample,
  ctl: Instrument->Control);
};

I defined a module called Expr that operates very similarly to sexprs(2) to handle the shell syntax. Then I can combine the modules in all sorts of ways on the shell command line:

   synth/sequencer {master  
   {fm  {waveloop sinewave} {adsr 0.01 0.21 0.3 0.08} } 
   {delay 0.085 0.4} {delay 0.185 0.2} 
   {delay 0.485 0.1} {delay 0.685 0.08}  
   {proxy {onezero} {lfo 0.18 1.0 0.0} }  } 

We're still not done. We haven't used all the ingredients. Let's throw in Styx.

Let's combine all the above, plus the fact that Dis is completely platform independent, so we can compile once and run anywhere the Inferno emulator runs.

The example that would illustrate this is VN's grid. But I'm guessing since I haven't used it. You can also see all these ingredients combined in sh-alphabet's grid. Sh-alphabet is quintessential Inferno.

From these ideas I'm trying to implement a mapreduce framework. I will use sh syntax to form the expression dynamically, use module extensions to write new map and reduce functions, use Styx to create the mapreduce infrastructure, use Dis to distribute code, and use channels and files for communicating between processes.

The module extension points are as follows,

Mapper : module {
 map: fn(key, value: string, emit: chan of (string, string));
};

Reducer : module {
 reduce: fn(key: string, input: chan of string, emit: chan of string);
};
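As an illustration only, a word-splitting mapper written against the Mapper interface above might look like the sketch below. The choice of emitting the string "1" per word is made up for the example, and how the framework learns that a map call has finished is not specified here, so the sketch simply emits its pairs and returns.

```limbo
implement Mapper;

include "sys.m";
	sys: Sys;

# Same interface as the extension point above, repeated so the file stands alone.
Mapper: module {
	map: fn(key, value: string, emit: chan of (string, string));
};

# Emits (word, "1") for every whitespace-separated word in value.
# The key (for instance a file name) is ignored by this particular mapper.
map(key, value: string, emit: chan of (string, string))
{
	if(sys == nil)
		sys = load Sys Sys->PATH;
	(nil, words) := sys->tokenize(value, " \t\n");
	for(; words != nil; words = tl words)
		emit <-= (hd words, "1");
}
```

A matching reducer would read the values for one key from its input channel and emit their sum, using the same style of channel communication.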

The mapreduce command is itself a file system that is exported to remote hosts. Clients read from it to get instructions on what to do next and write back their status, so the master can keep track of what's going on. Command usage might be as follows,

mapreduce {reduce {map {reduce {map path mapfn} reducefn} mapfn}  reducefn}

Or using pipeline notation, similar to how sh-alphabet uses it,

mapreduce {map path mapfn | reduce reducefn | map mapfn | reduce reducefn}

Now we're talking. This is Inferno's sweet spot.

This recipe should be pushed HARD in Inferno.

I'll stop here, even though I still haven't played the polymorphism card. Sh-alphabet does, but it's a complicated example to swallow.

I recommend programmers start by using fs(1). Then look at how your own programs can be re-worked to factor out modules using a module extension point that uses channels, and then use sh syntax for combining those modules.

The next step may then be to distribute those modules by building a Styx service into them.

FOOTNOTES

[1] Actually, there is a slight variation in the library idea in that a library can have multiple implementations, a kind of polymorphism. The Imagefile interface is of this form. This is a slightly more sophisticated library where we load a particular implementation depending on the kind of operation. But a user of the system is not really expected to extend the system at these points, even though they might. (Other examples are Filter and Encoding.)

[2] Another variation on the command polymorphism is that we can bind alternative implementations over our namespace. But in practice this seems to be done rarely. Binding /acme/dis/cd.dis over /dis/cd.dis is one example. Maybe this could be exploited further.
