Office XML, II

6/2/2005 11:06:31 AM

I read through some of the Office XML documents, and the design, I must say, is utterly brilliant. They made it remarkably easy to read and write files, including binary data such as images. They addressed every major issue: file size, privacy, macro viruses, corruption, data recovery, third-party extensibility and so on. ISVs can add their own proprietary data to the file format and then access that data through the object model.

Brian Jones has more information about this format, including links to The Microsoft Office Open XML Formats: New File Formats for "Office 12" and The Microsoft Office Open XML Formats: Preview for Developers. He also has a great interview with Robert Scoble. (You know, Channel 9 is really like Microsoft’s own public TV channel. It’s the first place everyone goes to after a major announcement.)

The ZIP idea was really smart. Using a ZIP container gives Office documents hierarchical data storage, a well-documented format, compression, and CRC checking. Because the format dates from the 1980s, any patents that might apply to it should have long expired. Sections of the document are stored as separate files within the ZIP file, which makes it easy to swap an individual part by deleting and replacing it. Binary objects (pictures, for example) are stored as separate binary files rather than embedded in the XML.
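To make the "separate parts in a ZIP" idea concrete, here is a minimal sketch using Python's standard zipfile module. The part names (content/document.xml, media/image1.png) are made up for illustration and are not taken from the actual Office layout. It only shows the general technique: each part is its own archive entry with its own CRC, and swapping a part means rewriting the archive with one entry replaced.

```python
import io
import zipfile


def build_package(parts):
    """Write a dict of {part name: bytes} into an in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in parts.items():
            zf.writestr(name, data)
    return buf.getvalue()


def list_parts(package):
    """Return {part name: CRC-32} after verifying every entry's checksum."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        bad = zf.testzip()  # name of the first corrupt entry, or None
        if bad is not None:
            raise ValueError("corrupt part: " + bad)
        return {info.filename: info.CRC for info in zf.infolist()}


def replace_part(package, name, new_data):
    """Return a new archive with one part swapped; the rest are copied over."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = new_data if info.filename == name else src.read(info.filename)
            dst.writestr(info.filename, data)
    return out.getvalue()


# Hypothetical package: one XML part and one binary part.
package = build_package({
    "content/document.xml": b"<document><p>Hello</p></document>",
    "media/image1.png": b"old image bytes",
})
swapped = replace_part(package, "media/image1.png", b"new image bytes")

for part, crc in list_parts(swapped).items():
    print(part, format(crc, "08x"))
```

The zipfile module cannot delete an entry in place, so the sketch copies every untouched part into a fresh archive. A tool that edits the archive directly could avoid that copy.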

Also, all versions of Office from Office 2000 onward will support reading and writing the new formats. My job of converting to and from Office documents in my application has just become unbelievably easier.


I also looked into the licensing agreement. The goals of the much-maligned Microsoft Office XML file format patents become very clear. With the earlier file formats (DOC and RTF), Microsoft could not prevent another company from extending a format or claiming it as its own. Copyright and trade secret laws might not be strong enough. A patent, however, applies very narrowly to Microsoft’s own formats, which makes it hard to challenge, and it gives Microsoft complete control for the next 20 years.

I once wondered what would happen if someone appropriated the Rich Text Format as his document format and added his own extensions to it. He would get instant compatibility with every major application. Furthermore, he could name his product to match the RTF acronym, although that could potentially confuse customers.

RTF is both forward and backward compatible, so it is quite possible, and easy, to extend it unilaterally without breaking other RTF readers. RTF is, in fact, a standard (or at least one created jointly by Microsoft and Adobe) that has been appropriated and mangled by Microsoft Word through its various releases. It actually had high fidelity as an alternative text-based Word format until round-trippable HTML became the favorite child in Word 2000. RTF then became that “other child” that still needed to be supported, and the lessened status showed in its reduced fidelity.
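The mechanism that makes unilateral extension safe is the RTF convention that a group beginning with \* marks a destination a reader may skip if it does not recognize the control word that follows. Below is a rough Python sketch of how a reader might honor that rule. The control word acmenote is invented for illustration, and the function handles only brace nesting and backslash escapes, so it is nowhere near a full RTF parser.

```python
def strip_ignorable_destinations(rtf, known=()):
    """Drop {\\*\\word ...} groups whose control word is not in `known`."""
    out = []
    i = 0
    n = len(rtf)
    while i < n:
        # Copy escaped characters and control-word prefixes untouched.
        if rtf[i] == "\\" and i + 1 < n:
            out.append(rtf[i:i + 2])
            i += 2
            continue
        if rtf.startswith("{\\*\\", i):
            # Read the control word that names the destination.
            j = i + 4
            while j < n and rtf[j].isalpha():
                j += 1
            if rtf[i + 4:j] not in known:
                # Unknown destination: skip to the matching close brace.
                depth = 0
                k = i
                while k < n:
                    c = rtf[k]
                    if c == "\\" and k + 1 < n:
                        k += 2  # escaped brace or control symbol
                        continue
                    if c == "{":
                        depth += 1
                    elif c == "}":
                        depth -= 1
                        if depth == 0:
                            break
                    k += 1
                i = k + 1
                continue
        out.append(rtf[i])
        i += 1
    return "".join(out)


# A document carrying a made-up vendor extension.
doc = r"{\rtf1\ansi Hello{\*\acmenote secret {nested} data} world}"
print(strip_ignorable_destinations(doc))
print(strip_ignorable_destinations(doc, known=("acmenote",)))
```

A reader that knows the extension passes its control word in known and sees the group intact, while every other reader silently drops it.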





