September 29, 2005

Managing the Meaning

I wholeheartedly agree with David Heinemeier Hansson's opinion that a DBMS should hold only application data, not business logic:
I consider stored procedures and constraints vile and reckless destroyers of coherence. No, Mr. Database, you can not have my business logic. Your procedural ambitions will bear no fruit and you'll have to pry that logic from my dead, cold object-oriented hands.
In other words, I want a single layer of cleverness: My domain model. Object-orientation is all about encapsulating clever. Letting it sieve half ways through to the database is a terrible violation of those fine intentions. And I want no part of it.
That said, there is still a lot of impedance between the object-oriented and relational models, and the current generation of O/R mapping tools only partially fills the gap. On the other hand, as Chris Date maintains, even SQL fails to implement the relational model in a satisfactory way.

However, as for managing the meaning, I agree with the knowledge representation/knowledge management perspective: an ontological layer should provide the domain model and give the data its correct semantics, thus enabling automated interpretation and reasoning without (paraphrasing DHH) "letting it sieve half ways through the application".
This is a noticeably more difficult goal, since both the tools and the methods for ontological modelling are still far from mainstream.

September 25, 2005

Semantically augmented File Systems, and ReiserFS

Although File Systems are the most widely used form of Content Management System, their mainstream implementations (on Windows and UNIX systems) were until recently left untouched by the technological and functional innovations happening in more specialized contexts. For literally decades, hierarchical folders have been the only widely accepted way to organize files on personal workstations.
BeOS introduced metadata attributes for files, backed by a relational db, but is now defunct. Google Desktop provides an effective way to search personal files with the same user experience as web search: not a new idea, but I have always found previous implementations (Windows' search feature, available from 1995) rather unusable.
The next generation, Apple Spotlight and WinFS, will make heavy use of context-dependent metadata for files, based on a plugin architecture.

But I think that the greatest innovator in this arena is Hans Reiser. Ten-year-old Name Systems Ventures has not only just released ReiserFS4, but has kept asking the right questions (and providing opinionated answers) since 2001, in their Future Vision whitepaper, which challenges established practices and suggests new approaches. From a recent interview with Hans:
[...] metafiles are files that are about other files. pseudo files are files that are implemented not by storing and retrieving the data in a regular file but by the plugin calculating what it should construct for read, or performing some operation other than just writing the data somewhere in response to a write. For example, someday cat /home/reiser/mp3s/..../childcat > /dev/dsp will concatenate every file that is a child of my mp3s directory and send it to the speakers.

Someday longer away, you'll be able to use queries in the FS, and send all the blues mp3s that your dad emailed you, or all the mp3s related to "britney spears" and "spoof" to your speakers. Using cat, or other dumb programs absent of querying intelligence. There will be a very very sophisticated naming system, and all the programs in the OS will not need any complexity of their own to tap into the power of sophisticated naming.
Another thing that did not make sense was that in V3, performance for files randomly generated with a uniform distribution in the 0-10k size range was worse if tail packing was turned on. It "should" have been better. In V4 it IS better, for the reasons described at quite some length in the part about why BLOBs are a bad design idea. This actually has implications that go far beyond Reiser4 as BLOBs are the dominant paradigm in the database world...
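
To make the "pseudo file" idea from the interview a bit more concrete, here is a minimal user-space sketch in Python of what reading a childcat-style pseudo file might return. It is only an illustration: it does not use any Reiser4 plugin API, the function names `childcat` and `read_pseudo_file` are hypothetical, and I am assuming that "child" means a regular file directly inside the directory, concatenated in name order.

```python
import os


def childcat(directory):
    """Yield the bytes of each regular file directly inside `directory`.

    Assumption: children are visited in sorted name order and
    subdirectories are skipped; the interview does not specify either.
    """
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, "rb") as handle:
                yield handle.read()


def read_pseudo_file(directory):
    """Return what a read of the hypothetical pseudo file would produce."""
    return b"".join(childcat(directory))
```

The point of the design Hans describes is that a dumb program such as cat would get this behaviour for free from the file system; in the sketch the caller has to ask for it explicitly.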

September 18, 2005

KM and (Software) Framework Design

The Pit of Success: in stark contrast to a summit, a peak, or a journey across a desert to find victory through many trials and surprises, we want our customers to simply fall into winning practices by using our platform and frameworks. To the extent that we make it easy to get into trouble, we fail. (Rico Mariani)
I have spent a lot of time in the last few years designing and developing software frameworks (mainly for text mining, text classification and statistical natural language processing). I just read a presentation from PDC 2005 (the main annual Microsoft event for developers), "The Art of Building a Reusable Class Library" by Brad Abrams and Krzysztof Cwalina (it is a set of PowerPoint slides, but you can view it with OpenOffice if you are not a MS Office user). It details a set of principles and best practices for developing reusable frameworks, which ring very true in light of my personal experience.

The framework designer holds a huge responsibility: if a part of the framework (such as an abstraction, or even a simple name) is not naturally understandable, the users, no matter how experienced, will keep failing to use it correctly. Following this premise, the authors provide a set of recommendations that seem more related to knowledge management (and "cognitive") solutions than to practices strictly related to software development:
  • exploit sameness and consistency
  • prefer scenario-driven design
  • communicate via leaving artifacts, rather than through documentation
  • follow a common vocabulary

September 14, 2005

Generators

I just discovered the Generator Blog, and, after that, the Uzeful list of online generators.

Generators are small applications (usually web-apps) that automatically produce semi-random artifacts of various kinds, using more or less sophisticated heuristics and/or pattern combinators. Well-known examples of generators are the random scientific paper generator, and Chris Coyne's tool for drawing pictures using context-free grammars [see my previous post on GFDG].
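
As a rough illustration of how little machinery such a generator needs, here is a minimal Python sketch that produces random sentences from a context-free grammar. The grammar, its vocabulary and the names `GRAMMAR`, `expand` and `generate_sentence` are all made up for this example; it is not how any of the generators mentioned above is actually implemented.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to a list of
# alternative productions; any symbol not listed here is a terminal.
GRAMMAR = {
    "SENTENCE": [["NP", "VP"]],
    "NP": [["the", "ADJ", "NOUN"], ["the", "NOUN"]],
    "VP": [["VERB", "NP"], ["VERB"]],
    "ADJ": [["silent"], ["random"], ["plausible"]],
    "NOUN": [["map"], ["logo"], ["ribbon"]],
    "VERB": [["evokes"], ["draws"], ["sleeps"]],
}


def expand(symbol, rng):
    """Recursively expand `symbol` into a flat list of terminal words."""
    if symbol not in GRAMMAR:
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(expand(part, rng))
    return words


def generate_sentence(seed=None):
    """Generate one sentence; the same seed yields the same sentence."""
    rng = random.Random(seed)
    return " ".join(expand("SENTENCE", rng))


if __name__ == "__main__":
    for _ in range(3):
        print(generate_sentence())
```

The toy grammar has no recursive rules, so expansion always terminates; a grammar with recursion would need a depth limit or weighted choices.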

What impresses me, however, is the effectiveness of very simple generators. As an example, GFDG has a generator of random underground maps (inspired by the graphical style of the London Tube map), and the resulting images look very plausible; the company logo generator produces very simple logos composed of only two or three ribbons and/or points, which nevertheless look very meaningful, and evoke human-like shapes and somewhat inspiring symmetrical symbols. I guess our very biased cognitive system plays a fundamental role (and does a big part of the job) in the interpretation of these pictures. As usual, I'm interested in using these tools as an aid for (graphical) design practices.

Not a new idea, but still amusing: the random poem generator.

September 12, 2005

The Mozilla Platform

A month ago, Kottke was wondering whether the Mozilla Foundation has the vision to make Firefox the most important piece of software of this decade (as an open application platform). I offered my opinion on Paolo's blog: XUL+Javascript is getting more and more interesting; however, Javascript is not (yet) well suited for writing large applications. I would love to see a Python or a Ruby engine integrated within Firefox or XUL Runner: that would really be a viable foundation for open, cross-platform network-based applications. Being able to exploit the client (as opposed to relying heavily on the server side) while still being network-based is still an unresolved issue for me.

Today, Mozillazine mentions Brendan Eich's work. Brendan is developing the infrastructure which will allow Python to be used for XUL scripting:
Support for Python in XUL will land in the Mozilla 1.9 timeframe and is expected to be used primarily by developers of extensions and standalone XULRunner applications.

Work is currently ongoing to allow languages other than JavaScript to be used for DOM scripting, which is a necessary step to enable Python support to be implemented. In theory, this will also allow support for other scripting languages to be added to the Mozilla framework. However, there are no plans to support any languages other than JavaScript in webpages.

Although future releases of Mozilla applications such as Mozilla Firefox and Mozilla Thunderbird will ship with support for Python XUL scripting, a Python interpreter will not be included. This is expected to mostly be an issue on Windows (Mac OS X and most Linux distributions already include a Python environment) but Brendan expects this problem to be solved soon.
The Mozilla Foundation has an (old) document on this topic: Roadmap for Language-Agnostic Scripting Support.