Friday, July 01, 2005

lab 34 - lexis: semantic binary model implementation


lab 34 - lexis: semantic binary model implementation


The code linked to below is a database implementation based on the paper, A File Structure for Semantic Databases by N. Rishe. The paper is brief and required reading for the rest of this lab.

The application is in three modules: the Btree, the SBM abstraction called lexis, and the query module.

The Btree implementation stores key-value pairs. This Btree is a large improvement over my previous attempt, tickfs, in that a prefix length is included for each entry which improves storage efficiency. Binary search is used when searching within a block making it faster , especially considering the increase in the number of entries per block. The caching and locking mechanism have also changed, and are described in earlier labs.

Lexis is the SBM abstraction layer that sits on top of the Btree. It does not use the value field of the Btree as all information is stored in the key. The abstractions exported are Facts, which include Categories, Relations, and Attributes. The last is a small difference to the Rishe model. Attributes are relations of abstract objects to concrete objects (array of byte, which is uninterpreted by lexis). Relations are between abstract objects only. Each fact is stored in normal and inverted form as described in the paper.

Finally, there is the query module. All the query abstractions from the paper are implemented. However, the approach used for combining operators to build a complex query expression is based on data streams and communicating processes. The idea is described in D. McIlroy's, Squinting the Power Series. See also lab 31. Here's an example of inserting data to lexis and querying it.

% cd lab/34
% mkdir /dis/folkfs
% mk install
% mkdir /lib/lexis
% > /lib/lexis/       # the lockfile
% >
% folkfs/tok -i *.[bm]      # index all the source files

Now query for the symbol schema that is on the right side of the relation Hit,

% folkfs/get -q 'Hit schema ?Ra'

The query string is tokenized. Each token is either an operator (a?, aRy, aR?, ?Ra, a??, aC), an abstract object, or a concrete object. Each operator has the function prototype:

op(U: chan of list of ref Dat): chan of list of ref Dat;

The operator will create a new channel and spawn another process that receives from U and sends a result down the created channel. So in functional form it looks like:

?Ra(push('fact', push('Hit', startchan)))

In effect, for each token we get a spawned process that moves in lock step with the other processes receiving from an input channel and sending to an output channel.

An operator such as ?Ra will pop the Relation and abstract object symbols from the stack, search the Btree for all matching Facts and push the abstract or concrete object back onto the stack. This happens for each input from the channel so the channel is a stream of stacks.

This is a starting point for further experimentation with the data streams style of programming. Also, lexis will now serve as a database backend for some future file systems for web services.




demerzal said...

What is the fundamental difference between a semantic data model and a relational data model?

caerwyn said...

For me, the key difference between the semantic binary model and relational model has always been about the binary, and in that the difference is obvious.

Up till now I've referred to the Rishe book on SBM. You might find meaning there (get it :).

caerwyn said...

From the book
A data model is a convention for the specification of the logical structure of real-world information. The cornerstone of the contemporary theory and technology of databases was the development of the Relational Data Model. The recent development of the new generation of data models - the semantic models - offers a simple, natural, implementation-independent, flexible, and nonredundant specification of information. The word semantic means that this convention closely captures the meaning of user's information and provides a concise, high-level description of that information.