Larry O'Brien

3/1/2007 7:08:25 AM

Larry O'Brien

I have not been blogging frequently because I am attempting to deliver a product, but I decide to respond to one of Larry O’Brien’s two posts critiquing my own blog posts. In the other post, he puts my name on the title and links me to a strawman argument that I didn’t actually make: “Wesner Moise Claims IDEs Disprove ‘No Silver Bullets’: I Say ‘Are You Kidding?’”

In Bad Comparison: 14 Line Python RegEx evaluator versus Microsoft’s 14K lines, Larry writes.

Wesner Moise points to "generalized regular expression matching" as a moderately hard problem that might serve as the basis for comparing programming languages and approaches. He says "Microsoft's implementation of regular expression matching over strings is spread across 24 files and 14,455 lines of code including comments and whitespace." (I'm not sure how he'd know that -- I assume he's talking about Rotor source code.)

He wonders if a functional approach could be more efficient and points to a 14-line Python program.

No, no, no: they are two incredibly different capabilities.

The Python program implements something like the original definition of a regular expression --restricted to that which can be expressed in a single line of Extended Backus-Naur Form without recursion. "Regular expression support" for today's languages means something very, very different, starting with compatibility with Perl5 and going from there. Backslashes, named groups, etc. are complex features that require, in any language, something in excess of 14 lines.

Larry seems to miss the fact that the Python implementation could easily incorporate all of the features he described easily. The implementation is based on functions not finite state machines, so we aren’t limited by the expressiveness of regular languages. The Microsoft implementation does include compilation and parsing, so it’s not directly comparable, but if one wanted to create a reasonably performing regular expression engine for list structures besides strings with the same sets of capabilities, it could involve dramatically less code implemented in a functional style. I’ll eventually post source code for such an implementation, but right now I can’t justify the time when I still need to deliver a product and generate income.

Implementing a regular expression engine is not really a hard problem. My first implementation was difficult when I tried in college to extend the DFAs and NDFAs to accommodate features introduced in Perl, while at the same retaining the minimal processing that FAs offer. When I look at Spencer’s open source UNIX implementation of regular expressions, I noticed that he eschewed all the academic models for what I considered at the time “a quick and dirty” implementation, which while must faster in practice had theoretically inferior performance. 

Larry continues about functional programming and silver bullets:

Having said that, regular expressions are a good fit for functional approaches. But, just to point out the lack of silver bullets, attempting to parse a left-recursive grammar (a grammar with a production of the form A->AB) will hang a simplistic recursive-descent parser.  

Larry, whose job is to be an expert on programming languages, seems to lack any imagination. Larry takes a false premise about recursive-descent parsing and makes a false inference about the lack of silver bullets. He’s making a statement about a programming style that, being Turing-equivalent, is maximally expressive. It is possible to implement a highly readable, simple, table-free, recursive-descent parser for left-recursive grammars using continuation-passing style that performs as well as traditional table-driven approaches. I myself use continuation-passing style for implementing backtracking algorithms. One good paper on this is Derivation of a Typed Functional LR Parser. There are other approaches I have seen such as using data representations of functions in parser combinators.

Larry’s heavily wedded to the idea that there is no silver bullet. It has a Godelian ring that makes him look wise. I don’t buy into Brooks’ argument that there is no silver bullet to put down the software beast. It rests on assumptions such as that existing program languages have become high-level enough. He wrote this back in 1986, when the dominant languages of the time, assembly, C, Basic, Fortran pre-90, were quite primitive compared to now. Program languages can still become far more conceptual and declarative. I personally see the incorporation of functional techniques and symbolic manipulation as necessary steps towards that goals. My own belief in silver bullets, probably reflects my own push for fundamental advances in software that are happening too slowly in the big companies.

Lastly, Larry O’Brien claims in one post that he is nice during the day, but becomes a werewolf when he writes his blog post.

I think I'm quite nice. You might not know that if you just know me from my blog and articles. I try to always be fair when I write, but that's not the same as being nice, which I think I am in person.

Hopefully, someone like Charles Simonyi comes along and shoots a silver bullet through the beast, whose collapsing hairy body morphs back to the naked, pale-skinned body of a Hawaiian geek.






Net Undocumented is a blog about the internals of .NET including Xamarin implementations. Other topics include managed and web languages (C#, C++, Javascript), computer science theory, software engineering and software entrepreneurship.

Social Media