9/7/2006 4:12:30 PM


Last April, I wrote about a profound design change I was making into my document-based applications, which was to make all my document data structures immutable--standard practice in some functional programming languages.

My original motivation was to make each of my model objects an expression that I could work with in a functional way and apply transformation rules to. (My first attempt a few years ago failed because I earlier assumed that mutability was required for editable documents.) The result is substantially less code and tight integration between my document and my AI framework. From the user's point of view, the application would appear more intelligent, permitting more complex document analysis and manipulations than seen in the past.

The document essentially becomes a value. The ease of changing a document in this fashion is comparable to using a TextBox control, which is simply accessed and modified by getting and setting the Text property of the TextBox control.

There is an additional cost of path copying from the root of the document tree to the changed node, but this is more than offset by a number of other direct benefits such as trivial multithreading, elimination of undo work, and other simplified algorithms (eg, easier document comparison/merging). Maintaining an undo log is actually expensive, when the document changes are high during an operation. My algorithms tend to be simpler because of the immutability variant and the access to the before and after state of the operation. My data structures became simpler, because they only need to be descriptive; anything extra is typically caching computed information. I no longer needed to store parent or other information, which enables extensive sharing.

There are some complications. I have lost object identity and each node has to be the document tree now needs to be referred to by its path, but the path can be encapsulated inside its own object and it turns out to be rarely need. Object identity presents problems when persisting, deleting, copying or pasting data (especially multiple times). If identity is really important, the node should contain an ID.

I also don't store an observers collection; I don't need to. After an entire operation is completed, I present the root view with the document before and after an operation. The view proceeds recursively downward to update its own nested views.

Intuitively, this approach makes sense. The model should be pure data, just as a string is in a TextBox ; it shouldn't containing any references to any view. The view should be able to rebuild itself based on the new and old states of the data. If a recently document node contains many children, there is a logarithmic time approach to compare before and after state of document node and find the range of children that were changed that I mentioned in a previous post.

I recently also came to the conclusion that my view data structures should also be immutable. In the course of researching the impact of changing the view, I noticed that a number of special purpose classes would no longer be needed resulting in a substantial decrease in code. I also discovered that I could dispense with a parent pointer and a reference to the model object in the view; these two could be externalized and provided by the parent. This also enables flyweights as well as virtual views, where a rendered model node is not actually backed by its own view node.

The changes are likely to improve performance because the modest costs of path copying are offset by the elimination of additional other processing (and accompanying allocations) and the impact of heavy sharing. The other processing includes layout, notifications between the model and view, and initialization/disposal code. Perhaps immutable objects could make Avalon less complex; however, Avalon might need to forgo the naturalness of property assignment.

This reminds me a blogger post that design-patterns aren't needed in functional programming, because they exist to make up for deficiencies in procedural languages.






Net Undocumented is a blog about the internals of .NET including Xamarin implementations. Other topics include managed and web languages (C#, C++, Javascript), computer science theory, software engineering and software entrepreneurship.

Social Media