Saturday, June 18, 2005

lab 32 - lockfs


lab 32 - lockfs


I've struggled for a while trying to get multi-reader-single-writer locks (rwlock) working the way I wanted for a database I've been working on (more in a later lab). I recently realized the answer was in inferno all along.

The problem was not in implementing a rwlock, which i had done for tickfs using channels and it worked fine, the problem came when I needed to kill a process that might be holding a lock.

A lock based on using channels usually means that a spawned process is waiting for an event on a channel either to obtain or free the lock. However, if the process that holds a lock is killed it doesn't get to send the unlock event to the lock manager process.

You might think, just don't kill the processes that might hold locks. But killing processes is sometimes hard to avoid. For example, when a Flush message arrives, or to cancel a query that launched multiple processes. Programs that depend on locking based merely on channels are quite fragile, for example, tickfs.

While implementing the wikifs (lab 30) I noted the lovely property of file locks. When the process died, the file descriptor was garbage collected and the file Clunked, the lock automatically freed. The file lock in wikifs was a simple exclusive lock, using the file system properties DMEXCL to manage the exclusion (using kfs in this case).

What I wanted was a file system that served a file representing the rwlock. It already existed in lockfs. I think what had confused me originally about this fs is that I believed it had to be layered on top of the actual data file I was going to work with. That wouldn't have worked for me so I didn't experiment with lockfs. What I realized this week was that I could have lockfs control access to an empty file, and then I'd get all the benefits of file locks, and more. The rlock function became simply

lock := Sys->open(LOCKFILE, Sys->OREAD);

and wlock became

lock := Sys->open(LOCKFILE, Sys->ORDWR);

and freeing a lock was nothing more than,

lock = nil;

which would happen anyway when the function or process exited. Brilliant!

The extra benefit is that lockfs can serve it's namespace and act as a lock for a cluster of machines.

Switching to using lockfs has considerably improved my database application. The example of tickfs shows that Styx's Flush message isn't handled properly because of the problems of killing processes that might hold locks. Lockfs makes the application more robust. Also, more than one application can now access the datafile as long as each respects the locks served by lockfs. These applications do not even need to be on the same machine. By combining using lockfs with the kind of lockless cache I discuss in lab 30 I can completely avoid locking based on channels that caused me headaches when I needed to kill groups of processes that held locks.

No comments: