Monday, October 25, 2004

lab 15 - cluster file


lab 15 - cluster file


I'm still curious about layering of filesystems. Ds(3) was an example I looked at before writing signalfs. Another example was cryptfs (lab 2). Ds currently resides in /os/port/devds.c. A while ago I moved it into /emu/port/devds.c but hadn't made any use of it. I have wanted to use it for a rudimentary cluster file system, so I ported ds to limbo to play with distributed files and further explore files that are built from layering of other file systems.

I translated the C code and used styxservers to manage the simple, one level namespace. Here's some testing. This is really as much as I've tested it so far.

% for (i in `{seq 0 9}) {zeros -v $i 1024 8 > /tmp/chunk ^$i}
% echo cat c1 /tmp/chunk2 /tmp/chunk3 >ds/ctl
% echo cat c0 /tmp/chunk0 /tmp/chunk1 >ds/ctl
% echo mirror m0 ds/c0 ds/c1 > ds/ctl
% cat ds/ctl
cat c1 /tmp/chunk2 /tmp/chunk3
cat c0 /tmp/chunk0 /tmp/chunk1
mirror m0 ds/c0 ds/c1
% ls -l ds
--rw-rw-rw- M 36 caerwyn caerwyn 16384 Dec 31  1969 ds/c0
--rw-rw-rw- M 36 caerwyn caerwyn 16384 Dec 31  1969 ds/c1
--rw-rw-rw- M 36 caerwyn caerwyn     0 Dec 31  1969 ds/ctl
--rw-rw-rw- M 36 caerwyn caerwyn 16384 Dec 31  1969 ds/m0
% cat ds/c0 > /tmp/t1
% cat /tmp/t1 > ds/m0
% cmp ds/c0 ds/c1

I read the googlefs paper again today. With that in mind a cluster file system could be pieced together using some inferno components.

A kfs(4), or any standard fs, represents the master namespace. All files contain only a ds configuration--the chunk IDs and partioning info. All the inferno nodes write to the master registry (4) the chunks they will serve. Ds grows files by reading chunk id's from a master process that uses the registry to allocate new chunks. A client must navigate the master namespace to the file containing the ds configuration and mount ds in it's namespace. Then it has a distributed file which communicates directly with the nodes storing the disks in the network.


Not much thought given to the deeper problems. How would multiple writers on different clients append to the same file? Here is the source for dsfs.b

Thursday, October 21, 2004

lab 14 - map reduce


lab 14 - map reduce functional grid programming


I read about Google's MapReduce from Rob Pike's interview at slashdot. At the same time I've been studying Inferno's alphabet-grid(1) and am wondering if I can implement map reduce using alphabet.

Here's an imaginary example

          % - {rng , | remote |
           mapreduce "{tq -1rm /n/tick/} "{tock } |
           /create /tmp/result

Suppose that is a log of time spent on tasks where each record is the timestamp, task and number of seconds spent on the task that instance. Rng produces 1 or more date ranges. Remote converts type /fd to an endpoint. Mapreduce will then split a date range, such as one year, into M smaller date ranges. For each subrange it calls rexec passing it the address of an available node, the subrange and map function as parameters.

The output from all the map functions is directed to R endpoints. The R parition function could be hash(key) mod R as suggested in the paper. Then mapreduce rexec's a reduce worker, which reads in all the data from the endpoint, sorts it, and for each key calls the reduce function with the key and list of values (or /fd) as parameter. In this example tock, the reduce function, sums all the time values for a task and outputs the total.

I've made the example specific to tickfs and the use of triads merely because I already have these tools and makes it easier for me to grasp. The google paper uses key, value pairs. I'm ignoring all the other factors they consider, such as fault tolerance, locality, and much else.

Here's another example. In the distribution on my homepage I include a command nsearch for searching a tickfs index. The command is given a list of keywords. Given the first keyword, which might be a date range, it builds an initial set of keys. It then partitions this set among a fixed number of threads. Each thread test the record coming in on a channel against the index and the search term given as parameter to the thread. The reduce function would be an identity function, simply passing through it's input. This is a map, filter, reduce pipeline. Alphabet seems to provide the tools to express this whole query and more on the command line, including distributing the processing among nodes.

The implementation needs somewhere to lookup the available list of nodes. Administering all the nodes would need some fancy fs that managed the status of all executing workers. I'd keep this to an absolute minimum for now.


This all sounds very promising but I don't know how to implement it yet. Here are some more notes while I think this through.

The revelation for me is the importance of functional programming to distributed computing. It wasn't long ago (lab 1) that I discovered limbo shell supported functional programming. Alphabet takes this to the next level by defining types. Alphabet-grid provides the connection between processing modules on distributed nodes. Altogether it provides a framework for distributed computing I'm still coming to grips with. It is a different way of thinking about computing than I am used to.


MapReduce Interview

Monday, October 18, 2004

lab 13 - flute


lab 13 - implement the flute instrument from STK.


I implemented more of the STK library but his time as a straight forward translation to a limbo module. Much of the protected classes and filters are in dsp.b as ADTs. They all share a similar interface that includes functions mk for building the object and tick for processing the next sample.

The instruments are generally larger to implement but follow the same interface. They can be plugged into a signal module and then read and controlled from within signalfs.

I've included a few simple modules that can be used to start a new instrument. I also tried to implement the more complicated Flute. It's close, but still doesn't sound right. It all needs a lot more debugging.

To test the flute,

% signalfs -a /mnt/dsp
% echo add flute.dis flute > /mnt/dsp/ctl
% sequencer /mnt/dsp/flute < > /dev/audio

Sequencer has some small improvements to handle more voices and open them as needed. The number of voices does not need to be limited but it gets bogged down with four.


Still more work to be done, but I'm almost at the point where I can start building a base set of hopefully interesting instruments.


The latest dsp.b dsp.m flute.b sequencer.b signal.m signalfs.b simple0.b simple1.b simple2.b

Wednesday, October 13, 2004

lab 13 - sound library

No code to post tonight because it's unfinished.

I'm converting all of STK to limbo, but not directly into signalfs. I'm creating a module that will contain all the sound sources, filters, and effects in the STK, with one ADT for each sound. This can then be used by signalfs to serve a file, which can be a combination of any of the ADTs, or by any other limbo app.

Rog has suggested an alternative application using the shell alphabet. I will try this once the library is written.

Rog pointed out how inefficient signalfs is in its current form. I agree; the performance is terrible, which makes it compleletly unusable for realtime sound support. This re-implementation will improve performance. But any hardcore DSP programmer is only likely to snicker at our attempt to implement DSP in limbo. At the end of the day I'm doing this to create a framework for ease of experimenting with DSP, not to create a sound system that will out perform all others. That is the tradeoff I make by writing this in limbo.

Another possible implementation is to create a C library module or device compiled into emu. This would perform well but be less malleable. I'd rather code in limbo.

Monday, October 11, 2004

lab 12 - oscilloscope


lab 12 - implement an oscilloscope for signals from signalfs.


I implemented an oscilloscope called scope to view the signals produced by signalfs, or other PCM data such as an iaf file stripped of it's header.

% scope < bach.raw > /dev/null

It writes it's input to it's output, but it doesn't sound very good if directed to /dev/audio. It writes small blocks of samples with small delays between writes, making it sound very choppy.

Scope tries to draw 25 frames a second, getting a tick from a timer, and reads 1/25th of a second of samples from the input, then draws it on a Tk panel.

This is might be useful when recording input from a microphone

% scope < /dev/audio > out.raw

It takes as parameter the sample rate and number of channels, stereo or mono.


Not being able to listen and see the waveform at the same time makes it less useful than I hoped. How do I keep in sync the visual and audio stream?

I'd like to present a similar graph using the FFT, but I keep getting errors from the fft module. It correctly does the transform but isn't able to permute the numbers into normal order (I think, I do not know enough about this.) Commenting out the "optional code" in /appl/math/fft.b seemed to make it work, at least make it not exit.



Thursday, October 07, 2004

lab 11 - microsoft

I spent the time this evening downloading the Microsoft C/C++ toolkit and the SDK. I have only built emu for linux and plan9 so far. So it's time to start building an XP version. I also expect to be modifying the audio driver to add some more advanced features, such as 8 channels.

While waiting for the downloads I found this site about audio networking interesting. A DTMF decoder would be a nice lab to do some night.

I tried compiling the inferno distribution for XP. I'm missing LIB.exe. I used link /lib instead. Got most of it compiled, including emu.

Wednesday, October 06, 2004

lab 10 - delay


lab 10 - delay line


I am continuing to add signal modules to signalfs copying the implementations from stk. Today I'm working on the delay line, and whatever else I can implement in two hours.

The delay line does not fit the model of signals I have created so far. From the STK it looks like it is used more as a utilitly class than a standalone filter. Its used by the echo class which actually does the mix of the current input with the delayed input. I could of course do the same thing and have delay as functions with the dsp module. Trying to use the delay, or echo, brings up a number of issues.

How am I going to stack multiple filters ontop one another and still be able to ctrl each one independently? To access to each ctl file I'd need to know the conversation number. This might be tricky to find out if I have multiple instruments each being built from many modules.

I want to alter the effect during playback independently of the instrument being played. But I'm not sure how to fit it in with a simple instrument. Where in the stack should it go? And how will I control it if it's placed under the instrument?

This goes back to the problem of needing some kind of patch bay. Given a particular instrument we need to now all the effects tied in to it. Then we want to write to the ctl file of any of them, not via the instrument but directly, and alter the effect. We need to remove the exclusive access condition on the ctl, although we could place it on data instead.

If I didn't do this I'd need a naming convention within the ctl file that was at the end of the filter pipeline. But that is ridiculous because what else am I using a fs for.

Therefore, If I put the, say, echo filter in front of the instrument, I still send note events to the instrument, but read sound back from the echo data file.

Is the sequencer going to be able to manage all this? The skini language may have to include naming of instruments using filenames. That is, events are sent directed to specified ctl files (filters, instruments) but audio data is read from only one, the one at the end of the pipeline (is pipeline the right term here? filter chain, patch? sink?).

I need to specify the sequencer language and a means for retrieving all the conversation directories for a pipeline before going further.


delay.b delayl.b

Tuesday, October 05, 2004

lab 9 - tmap draw image


lab 9 - create tmap image directly instead of using Tk


I modified tmap to use draw-image(2) because using tk(2) commands to create so many rectangles was slow and used a lot of main memory.

The changes were straight forward. The Tk cmd

TkCmd(t, sprint(".fc.c create rectangle %.1f %.1f %.1f %.1f "
 + " -fill #%s -outline black -width %.2f",
 r.x, r.y, r.x+r.w, r.y+r.h, dc[depth],  0.5));


rect := Rect((int r.x, int r.y), (int(r.x+r.w), int(r.y+r.h)));
img.draw(rect, t.display.color(dc[depth]), nil, (0,0));
img.border(rect, 1,, (0,0));

where img is the global Image destination. I then update the image on the canvas

tk->putimage(t, "tmap", img, nil);
tk->cmd(t, ".fc.c coords tmap 0 0; update");

tmap is the name of the image previously created with image(9) and associated with canvas(9).


It works much faster and uses much less main memory. It still uses a lot of heap memory. Tk is nice but has it's limits. It works well for creating and removing the highlight rectangles. But it isn't appropriate for creating the main image, because I don't want to keep track of every rectangle image, and there can be so many entries. The largest I tried was about 300,000 entries. I'd like to try creating a large image with 1 million entries.

I'm still improving this because I intend to use it as some kind of radar for monitoring file system activity.


tmap.b treemap.b treemap.m The screenshot is of strip view of the Inferno distribution.

music spree


Music band spree client and engine


I have been reading through spree code and wondering what engines I could write to learn more about this type of filesystem. Here is one idea that incorporates signalfs.

The aim is to create a distributed electronic music band. Each player is a client to spree and can create instrument objects and play notes on these instruments. The other players will have the instruments and sounds replicated locally by their spree client; and similarly instruments and notes they play are heard by the other musicians. Each spree client runs a separate signalfs but with a common set of signal modules.

The spree action commands will be SKINI with extra commands for creating and configuring the instruments. The spree engine manages the shared configuration of instruments. Note events are broadcast to all players. The spree client is therefore a type of sequencer, using timing and note events from the engine to trigger reading sound from the signalfs and writing to /dev/audio

Using OpenAL, or implementing it in signalfs, each instrument and player is placed in 3D space. The engine manages the positioning of objects, which now includes listeners, and each player hears the sound relative to their position in the virtual room. This requires a surround sound speaker system.

If this is implemented successfully it not be too difficult to generalize. Surely if we get the synchronization working for audio we could do the same for visual object simulations.

These ideas are influenced by my recent reading about Croquet. I'd like to see the implementation of the TeaTime protocol. The time synchronization seems the most challenging aspect of this idea.

Friday, October 01, 2004

corrections: lab 8

Here is a new tmap.b with the memory leak removed. I just needed to call Tk cmd delete all on the canvas to remove all the old rectangles.

And finally a screenshot of tmap using output from du -a / on the root of the Inferno distribution. You can see a file and it's path highlighted in red.

I've played about with the coloring a bit, but haven't found anything I really liked. The current scheme is for colors to represent depth. It goes from dark blue to light blue, through greens and yellows and ends on red for the deepest nodes.