11/14/2007 3:29:03 AM


I noticed work by SuperCompilers, LLC on supercompilation that takes on the myth of the sufficiently smart compiler (which I previously wrote about in my post on Humans vs. Computers). Their technology seems very similar to my own work in its reliance on partial evaluation.
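Partial evaluation can be illustrated with a classic toy example: specializing a general power function against a statically known exponent, so the runtime loop disappears from the residual program. This is only a sketch of the idea; the function names are mine, not from either SuperCompilers' system or my own.

```python
def power(x, n):
    """General program: loops at runtime over n."""
    result = 1
    for _ in range(n):
        result *= x
    return result

def specialize_power(n):
    """Partial evaluator for power: n is static, x is dynamic.
    Emits source for a residual program with the loop unrolled."""
    body = " * ".join(["x"] * n) if n > 0 else "1"
    src = f"def power_{n}(x):\n    return {body}\n"
    namespace = {}
    exec(src, namespace)
    return namespace[f"power_{n}"]

power_3 = specialize_power(3)   # residual program: return x * x * x
assert power_3(5) == power(5, 3) == 125
```

A supercompiler goes further than this kind of specialization, driving the program through all its possible computation paths, but specialization against known inputs is the common core.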

The company addresses the same sorts of criticisms about NP-completeness and undecidability in its white paper, "Supercompiling Java Programs":

The problem of fully optimizing an arbitrary computer program, when mathematically formalized, turns out to be an incomputable problem. There is no possibility of ever creating an algorithm that can fully optimize a general computer program. Fortunately, though, the practical problem of program optimization, though still very difficult, is a lot easier than the general mathematical problem to which it corresponds. Because, in reality, we do not need to optimize arbitrary mathematically given computer programs; we only need to optimize programs written by humans. And humanly-written programs tend to have a lot of special structure to them, including both obvious and subtle redundancies that lead to inefficiencies (this is related to algorithmic information theory; see Chaitin, 1987 and Bennett, 1986).

Supercompilation does not try to solve the impossibly hard problem of fully optimizing an arbitrary computer program. Rather, it is an extremely valuable heuristic technique, which seeks to find ways of significantly optimizing a wide variety of computer programs. Most real-world computer programs are chock full of inefficiencies, which can be removed via the supercompilation process. These inefficiencies are often hard to see in the source code, but they are highly transparent when the code is translated into a more mathematical form, as is done inside the supercompiler. Some of these inefficiencies are introduced by sloppy programming; others are placed there intentionally, by programmers interested in goals besides efficiency, such as maintainability and comprehensibility of code, or speed of software development. In many cases the most efficient way to implement a given piece of code is a truly humanly infeasible way, involving literally hundreds of times more lines of code than the ways that occur naturally to humans.
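One concrete kind of inefficiency the white paper alludes to is intermediate structure that makes source code readable but costs extra passes and allocations. The sketch below shows a before/after of the sort of fusion a supercompiler can perform; I wrote both versions by hand here purely to illustrate the shape of the transformation.

```python
def squares_of_evens(xs):
    """Readable source: two passes and one intermediate list."""
    evens = [x for x in xs if x % 2 == 0]
    return [x * x for x in evens]

def squares_of_evens_fused(xs):
    """What the transformed program looks like: a single pass,
    no intermediate list allocated."""
    out = []
    for x in xs:
        if x % 2 == 0:
            out.append(x * x)
    return out

data = list(range(10))
assert squares_of_evens(data) == squares_of_evens_fused(data) == [0, 4, 16, 36, 64]
```

The redundancy is invisible if you read each list comprehension in isolation, but obvious once the two passes are viewed as one composed computation, which is exactly the "more mathematical form" the quote describes.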

Another post, "Illogical Arguments in the Name of Alan Turing" on O'Reilly's website, criticizes bad arguments such as the claim that static code analyzers are not useful at all because they are limited by the halting problem, or that logic flaws can never be found because of the halting problem.

Humans are subject to the same theoretical constraints as computers, yet they do not perform the same laborious, brute-force search. Instead, humans employ a strategy of recursive pattern matching and replacement when thinking about and checking their programs. The kinds of programs that people write are those they can easily reason about and review later; being maintainable, these programs belong to a small subset of possible programs and are inherently tractable for analysis.
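This pattern-matching strategy is exactly what practical static analyzers do: rather than reasoning about all possible executions (which is undecidable), they match local syntactic patterns that experienced programmers also learn to spot. A minimal sketch, using Python's standard `ast` module and a rule of my own choosing (comparing to None with `==` instead of `is`):

```python
import ast

def find_eq_none(source):
    """Return line numbers where `== None` or `!= None` appears,
    a common lint rule matched purely on the syntax tree."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Compare):
            for op, comparator in zip(node.ops, node.comparators):
                if isinstance(op, (ast.Eq, ast.NotEq)) and \
                        isinstance(comparator, ast.Constant) and \
                        comparator.value is None:
                    hits.append(node.lineno)
    return hits

code = "def f(x):\n    if x == None:\n        return 0\n    return x\n"
assert find_eq_none(code) == [2]
```

No halting-problem reasoning is needed: the check terminates on any input because it inspects the program's structure, not its behavior.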

One paper, "Solving difficult SAT instances in the presence of symmetry," suggests that the structure of a problem instance can affect how hard it is to solve in practice. Even though SAT is NP-complete, SAT solving in practice often doesn't break down the way the worst-case theory would suggest. The authors point to the presence of a large number of symmetries in an instance as a key ingredient of explosive behavior.

Research in algorithms for Boolean satisfiability and their implementations [23, 6] has recently outpaced benchmarking efforts. Most of the classic DIMACS benchmarks [10] can be solved in seconds on commodity PCs. More recent benchmarks take longer to solve because of their large size, but are still solved in minutes [25]. Yet, small and difficult SAT instances must exist because Boolean satisfiability is NP-complete.
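The observation above is easy to make concrete: even a toy backtracking solver, exponential in the worst case, dispatches structured instances almost immediately once unit propagation kicks in. Below is a minimal DPLL sketch of my own (not the solvers benchmarked in the paper), using DIMACS-style signed-integer literals:

```python
def dpll(clauses, assignment=None):
    """clauses: list of lists of nonzero ints (positive = variable true,
    negative = variable false). Returns a satisfying assignment dict
    mapping variable -> bool, or None if unsatisfiable."""
    if assignment is None:
        assignment = {}
    # Simplify the formula under the current partial assignment.
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue  # clause already satisfied
        rest = [l for l in clause if abs(l) not in assignment]
        if not rest:
            return None  # clause falsified: backtrack
        simplified.append(rest)
    if not simplified:
        return assignment  # every clause satisfied
    # Unit propagation if possible, else branch on the first literal.
    unit = next((c[0] for c in simplified if len(c) == 1), None)
    lit = unit if unit is not None else simplified[0][0]
    for value in ([lit > 0] if unit is not None else [True, False]):
        result = dpll(simplified, {**assignment, abs(lit): value})
        if result is not None:
            return result
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3) and (x1)
model = dpll([[1, 2], [-1, 3], [-2, -3], [1]])
assert model is not None and model[1] is True
```

The unit clause `[1]` forces the rest of the assignment with no search at all, which is the small-scale analogue of why industrial instances with strong structure fall quickly while symmetric, unstructured ones explode.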

A 1994 InfoWorld article quoted a researcher saying that deep static analysis requires years of analysis time. Upon reading that quote, I immediately suspected the researcher's approach was very inefficient, because it implied that, beyond some depth, a human could outperform a computer at the same job. Perhaps, if we mimicked actual human reasoning, we could produce more effective tools.






Net Undocumented is a blog about the internals of .NET including Xamarin implementations. Other topics include managed and web languages (C#, C++, Javascript), computer science theory, software engineering and software entrepreneurship.
