Programmer Myopia

9/2/2005 7:46:04 PM

Programmer Myopia

Programmer myopia is what happens when a programmers become so fascinated with the interesting special properties and optimization opportunities of a data structure or algorithm that they have discovered that they fail to recognize its suboptimality in general.

Anyone studying MS Word’s DOC file formats will be familiar with the concept of piece tables, required for the successful opening and saving of such documents. There also seems to be some old university lecture notes on piece tables as a suitable data structure for storing text.

I agree with breaking up document text into separate pieces. One should probably not store text in one long array, because of the linear-time insert and delete operations; likewise, one should not store each character individually in separate object because of the sizeable overhead and poor locality.

However, I disagree with elevating pieces to a highly visible data structure so that they present the primary interface with which all text operations in a wordprocessor interacts with. This results in a low-level of abstraction and a cognitive impedance mismatch—unnecessary friction in the transformation from the concepts in a developer’s head to those in the final machine implementation because the two aren’t the same. Pieces should be an implementation detail hidden behind a virtualized list object that map closely to how people think.

A developer of popular set of Office conversion tools wrote about moving to a piece table implementation for representing documents to implement a Word-like object model, because that’s what Word appears to use based on an available file format spec. He assumed how Word saves documents out into a file is the same as its internal representation. That is not necessarily true.

Abiword also appears to make the same assumption. Abiword implements documents using piece tables stored in doubly-linked lists. In Improving the AbiWord’s Piece Table and other articles, an Abiword developer touts the exceptional virtues of piece tables:

One of the most critical parts of any word processor is the backend used to store its text. It should be fast to lookup, fast to insert and to erase new text at a random location, undo friendly, etc.

The AbiWord backend has all these virtues and some more. It's (IMHO) the most impressive piece of code of the whole AbiWord project, and one that has been exceptionally stable over the years. In short, Jeff's code rocksTM.

I believe that this developer was suffering from myopia. A few interesting properties and optimizations does not make the code more suitable, especially since undo code is not particularly difficult to write, and all of the other advantages mentioned are available with other document representations and at significantly less cost. Abiword’s piece tables have scalability problems, limiting its suitability to small documents. For instance, accessing parts of a document takes linear-time as does updating the document after each edit. I also bet that, because, of the low-level of abstraction, their code is more complicated than it needs to be, which is probably why a simple doubly-linked lists were used to join the piece tables in the first place. Why complicate a complicated data-structure further? (By “complicated”, I mean “doesn’t map well to human concepts.”)

Furthermore, future wordprocessors will likely perform more advanced transformations of the document. Abiword’s current implementation would make such transformations (a composition of a linear-time operation over another linear-time operation) a quadratic-time operation, which is an unacceptably slow and a limiting factor to its future success.

Another Abiword developer, Martin Sevior, asked in Rick Shaut’s blog, a developer in Macintosh Word, for some advice on improving the performance of Abiword.

OK here is is a technical question.

I am very impressed with the Scalability of MS Word. Your application generally degrades it's performance gracefully with document size.

At present the fundamental limit of how well AbiWord scales with document size is that our PieceTable (Document Model) is constructed as a linear, doubly linked-list and we fundamentally need to know the position of every fragment of the list.

So when an insertion or deletion happens in the document we need to update the position of every fragment in the document. This means of course that insertions or deletions will degrade linearly in speed with the document size.

Does MS Word also need to do this?
Of course, Rick wasn’t about to spill any secrets… but it does illustrates my point that Abiword developers fell blindly in love with an interesting, but ultimately limiting, data structure.
Programmer’s myopia is something that I watch out for, and, in a coming post, I will question whether I myself am being myopic.

Comments

 

Navigation

Categories

About

Net Undocumented is a blog about the internals of .NET including Xamarin implementations. Other topics include managed and web languages (C#, C++, Javascript), computer science theory, software engineering and software entrepreneurship.

Social Media