Microsoft's Mouthpiece

6/5/2005 1:32:14 AM


Some readers who are skeptical of Microsoft’s Office XML file format claims believe that I am not critical enough of Microsoft’s announcement. However, I did read through the white papers, and I did examine the license agreement for using those formats.

Some of the criticisms leveled at Microsoft are that the file format is…

  • … not really open, because it is not standards-based.
  • … actually based on ZIP compression rather than XML.
  • … nothing new. Microsoft had XML support in 2003, and the XML in 2006 doesn’t offer significant advantages.
  • … not complete across all Microsoft Office applications.
  • … not important for most users.

My concern, as an ISV, is that I can read and write Microsoft Office document formats, which are what most people use. Microsoft is replacing inaccessible binary formats with far more accessible ones that work with common tools, will become the default file formats, and are available royalty-free. That is good news for me, and it’s hard to spin negatively. OK, so Microsoft did not go far enough, the detractor says, but it satisfied all my primary concerns.

The issue of openness is a matter of semantics for most vendors. The important point is that the specification is published and available for use, which is the definition that Microsoft uses; whether it is standards-based is not particularly meaningful for me, though it may be for some. By contrast, the more recent versions of the current binary file formats are not public, and access has to be obtained through Microsoft’s legal department. I suspect that Microsoft used that as a lever to gain access to competing formats in exchange, for use in its own conversion tools, at least in the days of WordPerfect and Lotus.

Even then, the present file formats are too difficult to work with and test; as a result, most of the corrupted files that Microsoft encounters are the result of third parties attempting to write out Microsoft formats. Microsoft did previously provide roundtrippable XML and HTML formats, but those weren’t the default formats and weren’t exported by earlier versions of Office.

Because the new file formats are packaged in ZIP, they are not directly accessible to XML tools. ZIP APIs are available in .NET v2.0 and Java, and libraries abound on other platforms. Although there is an additional step involved, it doesn’t really detract much from the increased ease of reading and writing Office formats. I say this because a pure XML format with embedded objects encoded in base64 incurs significant tradeoffs of its own. Anyway, the underlying data is still XML, which has substantial API and tool support.
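To give a sense of how small that additional step is, here is a minimal sketch in Python, using only the standard library. It opens a ZIP package and hands one XML part to an ordinary XML parser. The part name content/document.xml and the helper names are made up for illustration; they are not taken from Microsoft’s specification, and the sample package is a stand-in rather than a real Office file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_xml_part(package, part_name):
    """Open a ZIP-packaged document and parse one XML part from it."""
    with zipfile.ZipFile(package) as zf:
        with zf.open(part_name) as part:
            return ET.parse(part).getroot()


def build_sample_package():
    """Build a tiny in-memory ZIP standing in for an Office document.

    The part name and XML content here are hypothetical.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content/document.xml",
                    "<document><para>Hello, Office</para></document>")
    buffer.seek(0)
    return buffer


root = read_xml_part(build_sample_package(), "content/document.xml")
for para in root.iter("para"):
    print(para.text)
```

Once the part is out of the archive, everything downstream is plain XML processing, which is the point.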

Users might not care about the new formats; it could even be annoying to have to transition to a new format. On the other hand, the new format is actually more accessible to users, who can crack open a file by changing its extension to .zip and then pull out or replace an image directly, without launching the actual application. Secondly, the new formats will encourage wider support from third-party applications, simply because they are easier to use and are documented. Lastly, Microsoft claims additional benefits in terms of macro security, privacy, reliability, compression, and third-party extensibility. We’ll see.
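The same image swap can be scripted. The sketch below, again in Python with only the standard library, copies a package to a new archive while substituting new bytes for one part, since Python’s zipfile module does not rewrite an entry in place. The function name replace_part and the part name media/image1.png are hypothetical, and I am assuming the package is an ordinary ZIP with no extra constraints on entry order.

```python
import io
import zipfile


def replace_part(source, target, part_name, new_data):
    """Copy a ZIP package to target, swapping in new bytes for one part."""
    with zipfile.ZipFile(source) as src:
        if part_name not in src.namelist():
            raise KeyError(part_name)
        with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as dst:
            for item in src.infolist():
                # Every part is copied unchanged except the one being replaced.
                if item.filename == part_name:
                    data = new_data
                else:
                    data = src.read(item)
                dst.writestr(item.filename, data)


# Usage with an in-memory stand-in for a document (hypothetical part names).
original = io.BytesIO()
with zipfile.ZipFile(original, "w") as zf:
    zf.writestr("content/document.xml", "<document/>")
    zf.writestr("media/image1.png", b"old image bytes")
original.seek(0)

updated = io.BytesIO()
replace_part(original, updated, "media/image1.png", b"new image bytes")
```

With real files, source and target would simply be two file paths.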






