Intelligence vs Intellisense

1/8/2006 5:14:39 AM

My main gripe with Microsoft is that the company doesn’t know how to write “smart” software. (Much of the industry doesn’t either, but, since Microsoft is the leader…)

The current tendency among software developers is to determine the minimum amount of work needed to help users. This is a sensible strategy driven by short-term business concerns. A more ambitious developer would ask what more could be done: what humans can still do (or do better) that computer programs cannot yet.

Microsoft invented “Intellisense,” in which an application attempts to behave intelligently by observing a user’s actions. Intellisense uses a set of heuristics that more or less work without a real understanding of the document. However, heuristics offer no guarantee of correctness, so Intellisense often behaves unpredictably or produces errors, sometimes forcing the user to waste time undoing them. These errors appear in all of Microsoft’s products, but they are more common in Office than in Visual Studio: Office doesn’t recognize structure within documents, while Visual Studio actually parses code in the background and therefore has richer knowledge of the user’s documents.

In contrast to Intellisense, “Intelligence” requires a genuine understanding of the document. This often consumes more time and memory. However, the benefits are significant: since Intelligence performs deeper analysis and guarantees correctness, it can be relied on for major transformations of the document such as refactoring, whereas Intellisense is typically limited to auto-correction, limited auto-formatting, and auto-completion. To offset the additional processing overhead, Intelligence offloads “higher-level” work from the user and takes advantage of a computer’s ability to outperform humans in tedious, repetitive, or brute-force activities.

General Principles versus Specific Rules

Intellisense typically requires hundreds of ad hoc rules to find errors, mostly of a trivial nature. With Intelligence, a single general-purpose algorithm with some genuine understanding of the document can perform the function of those hundreds of rules. Whereas Intellisense relies on specific rules, Intelligence focuses on general principles.

In my tool, NStatic, I strive for Intelligence over Intellisense. I continually ask myself how far I can take this, attempting to close the gap between the errors that only a human being can find and those that a computer can find. These are some of the principles that I code to (a short sketch after the list illustrates a few violations):

  • Exceptions. Any code that inevitably causes an exception is an error, unless it is caught and handled by a proximate catch block.
  • Comparison. Comparisons and conditions should never evaluate to always true or always false, unless a named constant or a literal is present.
  • Infinite Loops. All code should be able to terminate. (Yes, I know that the Halting Problem is undecidable, but termination checking is still worth attempting; besides, that reasoning never stopped the CLR from verifying code.)
  • Redundancy. Any operation which is redundant adds no value.
    • Non-operation. A complex expression should not always produce a constant result, except in a few situations. An assignment should change the value of the assignee. The result of an expression should be used.
    • Dead code. All code should be executable.
    • Dead store. A variable assignment or initialization should be used before it is reassigned or goes out of scope.
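
To make these principles concrete, here is a small, hypothetical C# fragment (the method and variable names are mine, purely for illustration) containing the kinds of violations several of these principles target:

using System;

class PrincipleExamples
{
    static void Demonstrate(int[] data)
    {
        int[] empty = new int[0];
        int first = empty[0];        // Exceptions: inevitably throws IndexOutOfRangeException

        int n = data.Length;
        if (n < 0 && n > 0)          // Comparison: always false, for every possible n
            Console.WriteLine(first);

        int scratch = n * 2;         // Dead store: overwritten before it is ever read
        scratch = 0;
        Console.WriteLine(scratch);

        return;
        Console.WriteLine(n);        // Dead code: unreachable statement
    }
}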

Most analysis tools contain rules that perform some matching at the syntax level. Instead of constructing rules to catch specific instances in which the principles are violated, my tool simulates code execution symbolically and uses a constraint solver to find errors in a more general way.

A typical code analysis tool using heuristics may catch the following redundant assignment by checking whether the left-hand expression textually matches the right-hand expression.

x = x

This heuristic would produce a false positive if the assignment were x[i++] = x[i++], which has side effects and is therefore not actually redundant.

An intelligent tool would determine whether the right-hand side of an assignment evaluates to the prior value of the left-hand side, such as in the following case, where the final statement assigns a the value it already holds.

x = a;
y = x + 1;
a = y - 1;
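
As a minimal sketch of that idea (my own simplified illustration, not NStatic’s implementation, and restricted to symbolic values of the form variable-plus-constant), a tool can track each variable’s symbolic value and flag an assignment whose right-hand side reduces to the target’s current value:

using System;
using System.Collections.Generic;

class SymbolicDemo
{
    // A symbolic value: a base variable plus a constant offset, e.g. a + 1.
    record Term(string Base, int Offset);

    static readonly Dictionary<string, Term> Env = new();

    // A variable with no recorded value stands for itself.
    static Term Value(string name) =>
        Env.TryGetValue(name, out var t) ? t : new Term(name, 0);

    // Models "target = source + delta" and reports redundant assignments.
    static void Assign(string target, string source, int delta = 0)
    {
        Term rhs = new(Value(source).Base, Value(source).Offset + delta);
        if (Value(target) == rhs)            // records compare by value
            Console.WriteLine($"Redundant assignment to {target}");
        Env[target] = rhs;
    }

    static void Main()
    {
        Assign("x", "a");        // x = a      -> x is symbolically a
        Assign("y", "x", +1);    // y = x + 1  -> y is symbolically a + 1
        Assign("a", "y", -1);    // a = y - 1  -> reduces to a: flagged redundant
    }
}

A real analyzer would handle arbitrary expressions and use a representation such as SSA form, so that reassigning a variable invalidates stale symbolic values; the constant-offset trick above only covers this particular shape of redundancy.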

A heuristic tool might look for the specific case of the same variable being tested against different values within a conjunction, as in the following case:

a == 4 && a == 5

but miss other cases that follow from the same general principles, such as the following:

x*x + a*a == 2*a*x && x == a
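
In fact, x*x + a*a == 2*a*x rearranges to (x - a)*(x - a) == 0, which forces x == a, making the second conjunct redundant. A constraint solver derives this algebraically; as a crude stand-in, the checker below (my own toy illustration, not NStatic’s solver) merely enumerates sample values to detect a condition that is always false or a conjunct that is already implied:

using System;
using System.Linq;

class ConditionChecker
{
    static readonly int[] Samples = Enumerable.Range(-50, 101).ToArray();  // -50 .. 50

    // Reports a condition that no sampled assignment can satisfy.
    static void CheckAlwaysFalse(string text, Func<int, int, bool> cond)
    {
        if (!Samples.Any(x => Samples.Any(a => cond(x, a))))
            Console.WriteLine($"Always false: {text}");
    }

    // Reports a conjunction p && q in which p already implies q.
    static void CheckRedundantConjunct(string text, Func<int, int, bool> p, Func<int, int, bool> q)
    {
        if (Samples.All(x => Samples.All(a => !p(x, a) || q(x, a))))
            Console.WriteLine($"Redundant conjunct: {text}");
    }

    static void Main()
    {
        CheckAlwaysFalse("a == 4 && a == 5", (x, a) => a == 4 && a == 5);
        CheckRedundantConjunct("x*x + a*a == 2*a*x && x == a",
            (x, a) => x * x + a * a == 2 * a * x,
            (x, a) => x == a);
    }
}

Enumeration only demonstrates the principle over sampled values; a symbolic solver establishes such facts for all values.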

Approach

In trying to bring intelligence to software, I follow three steps:

  1. I identify the actual rules and steps that people use to solve a problem in human terms. This often involves researching the actual steps that humans follow in a human setting.
  2. I accurately model the human concepts involved in code to avoid any difference between the computer representation and reality. (A couple of good examples in my code include word senses and symbolic expressions.) This often trades performance for full fidelity.
  3. The final code is usually a straightforward implementation of the rules and steps from point 1, such that it often reads like an instruction manual for humans. Sometimes I need to apply humanistic techniques like searches, normalizations, pattern matching, and permutations.

This is how I approached, for example, the problem of determining transitional relationships between sentences in my natural language product. I consulted textbooks on English composition and identified the four general ways that sentences are linked, from most explicit to least:

  1. Transitional keywords and phrases, which fall into several categories: contrast and qualification (however), continuity (in addition), cause/effect (therefore), explanation (indeed), exemplification (for instance), and summation (finally)
  2. Pronoun references & determiners
  3. Repetition of keywords (or their synonyms)
  4. Repetition of sentence patterns

Typically, natural language software focuses only on the first point, the use of transitional expressions. All the other points require additional work, such as tracking pronoun references, parsing and analyzing sentence structure, or utilizing an ontology. Some researchers have remarked that it is not possible to accurately determine transitions between clauses and sentences. I disagreed, believing that if a human can deduce a relationship, a computer should also be able to. Those researchers also developed their software with limited available knowledge (i.e., without a dictionary backend).
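
For a flavor of what the first three linkage types involve, here is a minimal, hypothetical sketch (my own toy illustration, not the product’s code; the real work also resolves pronoun antecedents, consults a dictionary for synonyms, and compares sentence patterns, all omitted here):

using System;
using System.Collections.Generic;
using System.Linq;

class TransitionDetector
{
    static readonly Dictionary<string, string> Transitions = new()
    {
        ["however"] = "contrast",             ["in addition"] = "continuity",
        ["therefore"] = "cause/effect",       ["indeed"] = "explanation",
        ["for instance"] = "exemplification", ["finally"] = "summation",
    };

    static readonly string[] Pronouns = { "it", "this", "that", "these", "those", "they" };

    static string Classify(string previous, string current)
    {
        string lower = current.ToLowerInvariant();

        // 1. Transitional keywords and phrases (the most explicit link).
        foreach (var pair in Transitions)
            if (lower.StartsWith(pair.Key))
                return "transition: " + pair.Value;

        // 2. Pronoun references and determiners.
        string firstWord = Tokenize(lower).FirstOrDefault() ?? "";
        if (Pronouns.Contains(firstWord))
            return "pronoun reference";

        // 3. Repetition of keywords (synonym lookup omitted in this sketch).
        var repeated = Tokenize(previous).Intersect(Tokenize(current)).Where(w => w.Length > 4);
        if (repeated.Any())
            return "keyword repetition: " + string.Join(", ", repeated);

        return "no explicit link found";
    }

    static IEnumerable<string> Tokenize(string s) =>
        s.ToLowerInvariant().Split(new[] { ' ', ',', '.', ';', ':' },
                                   StringSplitOptions.RemoveEmptyEntries);

    static void Main()
    {
        Console.WriteLine(Classify("Heuristics are fast.", "However, they offer no guarantees."));
        Console.WriteLine(Classify("The parser builds a tree.", "This tree drives the analysis."));
        Console.WriteLine(Classify("Intelligence requires genuine understanding.",
                                   "Such understanding consumes memory."));
    }
}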

There are still possible avenues of improvement in my upcoming static analysis tool. One area in which humans still best computers is recognizing natural language within code. Since I have a library of natural language routines, I have asked myself whether I could incorporate natural language understanding to locate another class of bugs that has previously eluded analysis tools.
