LINQ

9/14/2005 4:22:14 PM

Robert Scoble warned about “shock and awe” from the new PDC announcements. I must admit that I was shocked.

I have been looking at the C# and LINQ docs available at MSDN. Language Integrated Query (LINQ), for the most part, appears to have been designed well. Integration across multiple data sources is based on patterns (Select(), Where(), OrderBy(), GroupBy(), etc.) rather than on explicit interfaces.

The use of patterns allows a choice of implementation:

  • Extension methods versus instance methods. Object-based collections obtain querying capability by importing extension methods from a namespace, while other data sources provide their own methods.
  • Lambda expressions versus expression trees. Queries can be performed locally through lambda expressions or through another engine (SQL or XQuery) by passing the abstract syntax tree of the expression, possibly preconverting the expression to a string in the destination query language.
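To make the second choice concrete, here is a minimal sketch of one lambda compiled two ways: as a delegate that runs locally, and as an expression tree that a provider can inspect. The LambdaForms class and its ToPredicate translator are hypothetical names of my own, and the translator handles only the single "parameter > constant" shape; a real provider would cover far more.

```csharp
using System;
using System.Linq.Expressions;

public static class LambdaForms
{
    // The lambda compiled to executable code (a delegate).
    public static readonly Func<int, bool> AsDelegate = age => age > 20;

    // The same lambda text compiled to data (an expression tree).
    public static readonly Expression<Func<int, bool>> AsTree = age => age > 20;

    // Hypothetical translator: walks the tree and emits a SQL-like predicate.
    // This sketch only understands "parameter > constant".
    public static string ToPredicate(Expression<Func<int, bool>> predicate)
    {
        var body = predicate.Body as BinaryExpression;
        if (body == null || body.NodeType != ExpressionType.GreaterThan)
            throw new NotSupportedException("Only '>' comparisons are handled in this sketch.");

        var left = body.Left as ParameterExpression;
        var right = body.Right as ConstantExpression;
        if (left == null || right == null)
            throw new NotSupportedException("Expected 'parameter > constant'.");

        return left.Name + " > " + right.Value;
    }
}
```

Calling LambdaForms.AsDelegate(21) evaluates the predicate in process, while LambdaForms.ToPredicate(LambdaForms.AsTree) produces the text "age > 20" without ever running the lambda.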

Using patterns makes it simple for developers to integrate querying capabilities for their own objects and data sources. For instance, an object can optimize the behavior of any Where or OrderBy clause by defining its own instance method; an object can also build its own querying engine that operates on an expression tree rather than on an arbitrary lambda expression. Patterns provide great compile-time integration but expose issues at runtime: reflection and late-bound calls break down for extension methods and expression trees.
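As a rough illustration of the instance-method case, the sketch below defines a collection with its own Where method. AgeList, CustomWhereCalls, and PatternDemo are hypothetical names, and the "optimization" is just a call counter standing in for whatever smarter filtering a real type might do. Because the compiler prefers an applicable instance method over an extension method, the query binds to AgeList.Where rather than to the standard extension method.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection that supplies its own Where instead of relying
// on the extension method imported from System.Linq.
public class AgeList : IEnumerable<int>
{
    private readonly List<int> ages;

    // Counts calls so the binding can be observed.
    public int CustomWhereCalls { get; private set; }

    public AgeList(IEnumerable<int> source) { ages = new List<int>(source); }

    // Matches the Where pattern; a real type could use an index here.
    public IEnumerable<int> Where(Func<int, bool> predicate)
    {
        CustomWhereCalls++;
        return Filter(predicate);
    }

    private IEnumerable<int> Filter(Func<int, bool> predicate)
    {
        foreach (int age in ages)
            if (predicate(age)) yield return age;
    }

    public IEnumerator<int> GetEnumerator() { return ages.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class PatternDemo
{
    public static int[] Run(AgeList list)
    {
        // The where clause is rewritten to list.Where(a => a > 20),
        // which resolves to the instance method above.
        var query = from a in list where a > 20 select a;
        return query.ToArray();
    }
}
```

Nothing in AgeList mentions a query interface; matching the method name and shape is all the compiler asks for.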

LINQ provides a standard functional convention for writing queries, but languages are free to provide special syntax.

collection.Where(x => x.Age > 20).OrderBy(x => x.Age).Select(x => x.Age)

C#’s own special SQL-like query expression syntax is simply a strict macro transformation to the functional form without any other processing. Interestingly, this sort of macro transformation, which Lisp and other languages have had for decades, is the future, but the language designers haven’t figured that out yet.
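A small sketch of that transformation, assuming a hypothetical Person class with an Age property: the two methods below are meant to be equivalent, since the compiler rewrites the first form into the second.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type for the example.
public class Person
{
    public int Age { get; set; }
}

public static class QueryForms
{
    // SQL-like query expression syntax.
    public static List<int> WithQuerySyntax(IEnumerable<Person> people)
    {
        return (from p in people
                where p.Age > 20
                orderby p.Age
                select p.Age).ToList();
    }

    // The functional form that the query expression is rewritten into.
    public static List<int> WithMethodSyntax(IEnumerable<Person> people)
    {
        return people.Where(p => p.Age > 20)
                     .OrderBy(p => p.Age)
                     .Select(p => p.Age)
                     .ToList();
    }
}
```

Each clause maps to one method call, and the range variable p becomes the parameter of each lambda.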

For instance, I have been noticing an increasing reliance on symbolism through names, macros, and patterns as mainstream languages move to a higher level of abstraction. In C++, any method name is usable in templates prior to instantiation. In C# 3.0, the field names of an anonymous class are lifted from the names used within its initializer expressions. VB 9.0 uses names in dynamic interfaces to achieve duck typing.

Compatibility with .NET v2.0

Contrary to what I originally thought, the language extensions do not require a new runtime, so applications using LINQ should run directly on Windows Vista. The new Visual Basic and C# compilers currently work with Visual Studio Beta 2. However, this compatibility may also have limited the extent of the innovations.

It doesn’t seem that the Whidbey runtime has undisclosed support for the new features, which makes sense because of the potential testing impact. Anonymous types are indeed anonymous compiler-generated types and not a new class of types, such as tuples, available in the runtime. Other features, like extension methods and dynamic interfaces, are incorporated through standard extensibility mechanisms like attributes.

Comparison with COmega

Having examined COmega and friends (Spec#, Polyphonic C#, Xen), I was somewhat disappointed by the extent of the change. I was hoping for more extensive query capabilities for objects. I was looking for something closer to XQuery/XPath than to SQL. In one way, though, the query expressions do resemble XQuery’s “FLWOR” notation more than SQL.

I guess that I wanted queries over object trees rather than object collections, which COmega enabled through sequences and generalized member access. I have repeatedly seen people wrap XPathNavigators over their objects, as well as over other data stores such as the file system and registry, so that they could perform XPath queries over them, either for convenience or for consistency. However, I soon realized that the querying APIs could provide a slightly less compact version of the generalized member access features in COmega.

  • Filters.
    • Instead of sequence[it > 1], one writes
          sequence.Where(it => it > 1)
    • Instead of sequence[i], one writes
          sequence.ElementAt(i)
  • Type Lifting.
    • Instead of sequence.Property, one writes
          sequence.Select(it => it.Property)
    • Instead of sequence.Collection, one writes
          sequence.SelectMany(it => it.Collection)
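The four rewrites above can be sketched against ordinary collections as follows. The Order class and the MemberAccessDemo wrapper are hypothetical names used only for illustration; the comments show the COmega form each method stands in for.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type with a scalar property and a nested collection.
public class Order
{
    public int Id { get; set; }
    public List<string> Items { get; set; }
}

public static class MemberAccessDemo
{
    // COmega: sequence[it > 1]
    public static List<int> Filter(IEnumerable<int> sequence)
    {
        return sequence.Where(it => it > 1).ToList();
    }

    // COmega: sequence[i]
    public static int Index(IEnumerable<int> sequence, int i)
    {
        return sequence.ElementAt(i);
    }

    // COmega: sequence.Property (one value per element)
    public static List<int> LiftProperty(IEnumerable<Order> orders)
    {
        return orders.Select(it => it.Id).ToList();
    }

    // COmega: sequence.Collection (nested collections flattened into one)
    public static List<string> LiftCollection(IEnumerable<Order> orders)
    {
        return orders.SelectMany(it => it.Items).ToList();
    }
}
```

The difference between the last two is the flattening: Select over Items would yield a sequence of lists, while SelectMany yields a single sequence of strings.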

By adding properties such as Parent and Descendants to one’s objects and using them in conjunction with Select and the other methods, one can create queries over trees. C# iterators, on which most of these query methods are based, are quite powerful alternatives to XPathNavigator.
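Here is a minimal sketch of that idea, assuming a hypothetical Node class of my own: Descendants is written as a C# iterator, and the query combines it with Where and Select much as an XPath expression would combine a descendant axis with a predicate.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical tree node exposing Parent and Descendants for querying.
public class Node
{
    public string Name { get; private set; }
    public Node Parent { get; private set; }
    public List<Node> Children { get; private set; }

    public Node(string name, params Node[] children)
    {
        Name = name;
        Children = new List<Node>(children);
        foreach (Node child in children) child.Parent = this;
    }

    // Depth-first walk in document order, written as an iterator.
    public IEnumerable<Node> Descendants
    {
        get
        {
            foreach (Node child in Children)
            {
                yield return child;
                foreach (Node descendant in child.Descendants)
                    yield return descendant;
            }
        }
    }
}

public static class TreeQueries
{
    // Every leaf below root, reported as "parentName/leafName".
    public static List<string> LeafPaths(Node root)
    {
        return root.Descendants
                   .Where(n => n.Children.Count == 0)
                   .Select(n => n.Parent.Name + "/" + n.Name)
                   .ToList();
    }
}
```

For a tree built as new Node("root", new Node("a", new Node("a1"), new Node("a2")), new Node("b")), LeafPaths yields a/a1, a/a2, and root/b. The nested iterator is simple but re-yields each node once per ancestor level, so a deep tree might warrant an explicit stack instead.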

Good Database Integration

The database team (DLinq) did a good job with the integration by providing an object-relational mapping mechanism that works well with the query expression syntax. I initially wondered how updates were performed, as I didn’t see any of COmega’s keywords for insert, update, and delete, but it turns out that updates are as simple as setting an object property or inserting an object into a collection.

One can easily perform joins between objects and relational data. One point of integration that I am not clear on is how DLinq handles expressions that contain functions not supported by SQL; I believe that the operation probably fails and that ToSequence() needs to be called to invoke local processing.

There was some concern in blog posts that the attribute-based mapping mechanism would not easily extend to other database providers. However, the pattern-based approach of LINQ makes it easy for non-Microsoft providers to participate.

Unfortunately, transaction support is still suboptimal.

Weak XML Integration

XML integration is overly verbose and weak. I wasn’t expecting the C# team to incorporate XML literals, but, at the very least, it should have been possible to create XML expressions succinctly through the new object initializer support.

In addition, the XLinq team missed an opportunity by not offering a typed mechanism for creating XML trees, so even the deep XML integration in VB occurs through a dynamically typed mechanism, which can only be checked at runtime.

From a validation perspective, XML literals are only better than string-based XML in that the raw angle-bracket XML syntax is checked. This is also a problem with the C# usage of XElements. Any typos in the element and attribute names within XML literals simply won’t be found unless some additional schema validation is also performed, but, since XML literals are XML fragments, that’s unlikely to occur. Any typos in the names of VB XPath-like accessors that refer to XML elements and attributes aren’t going to be caught either. While I see some potential for integration between VB’s duck typing and XML literals to alleviate this problem, currently, declaring a variable as a dynamic interface ends up hiding the underlying XML type, turning off the special XML handling in VB.

Visual Basic

Visual Basic seems to have introduced more extensive changes to the language than C#. Many of the COmega-inspired features seem to have migrated into Visual Basic instead of C#; one wonders whether this was due in no small part to Erik Meijer’s close relationship with the VB team.

Both Visual Basic and C# provide support for implicitly typed local variables, query comprehensions/expressions, expression trees, object initializers, anonymous types, and lambda expressions. Visual Basic now also offers enhanced nullable type support and relaxed delegates, both previously available in C#.

Visual Basic offers deep XML support with XML literals and XPath-like accessors. As I mentioned above, XLinq’s XML types are completely untyped, so this feature amounts to highly convenient syntactic sugar.

The VB team is betting that the pervasiveness of XML entitles it to first-class language support, support that is even better than what some of the primitive data types receive. Most of the need for XML literals, I think, is satisfied by object initializers; for pure in-memory operations, object initializers are typed and more efficient. On the other hand, XML literals are easier to query and simpler to construct for writing out to files or over the network, and lead to greater programmer productivity.

Visual Basic enhances syntax support for reflection through duck typing (with dynamic interfaces) and dynamic identifiers. Duck typing enables IntelliSense and automatic casting for late-bound calls and guards against the inadvertent typo. Dynamic identifiers allow one to call o.(propertyName) in place of o.GetType().GetProperty(propertyName).GetValue(o, null).
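For comparison, this is roughly the reflection code that the dynamic identifier shorthand stands in for, sketched in C#. Customer and LateBound are hypothetical names; the null check is my own addition, since a misspelled property name is only discovered at runtime.

```csharp
using System;
using System.Reflection;

// Hypothetical type used as the late-bound target.
public class Customer
{
    public string City { get; set; }
}

public static class LateBound
{
    // Looks up a public instance property by name and reads its value.
    public static object GetPropertyValue(object target, string propertyName)
    {
        PropertyInfo property = target.GetType().GetProperty(propertyName);
        if (property == null)
            throw new ArgumentException("No public property named '" + propertyName + "'.");

        // The target instance must be passed in; null is only valid for static properties.
        return property.GetValue(target, null);
    }
}
```

With a Customer whose City is "Seattle", LateBound.GetPropertyValue(customer, "City") returns that string, while a typo such as "Ctiy" compiles fine and throws only when the call executes.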

About

Net Undocumented is a blog about the internals of .NET including Xamarin implementations. Other topics include managed and web languages (C#, C++, Javascript), computer science theory, software engineering and software entrepreneurship.
