Creative Commons License
Unless otherwise stated, all text in this blog is licensed under a Creative Commons Attribution-NonCommercial 2.5 License.


Showing posts with label xml. Show all posts

Tuesday, October 02, 2007

OOXML from the open source perspective

So then Microsoft decided they were going to do an open standard too, and guess what: it is a .zip file and it's got XML streams inside it. But, having said that, it has been difficult in the past to do binary file format interoperability. You can make many good arguments that it is not a benefit to have one company totally dominating the market. You need some sort of file format interoperability.

and

Isn't one file format (such as ODF) better than two? Surely the weakness of having many is the confusion it creates?
Well, yes, and it should be ODF. In an ideal world... yes, a single file format that was a superset of features and so on would be ideal, but it is very difficult to even conceive of that happening. There is just such a lot of vested business interest in this sphere. It is just very difficult to do anything technical. I just can't see anything like that happening.

Michael Meeks (part of the OpenOffice.org team at Novell)

in conversation with ZDNet.co.uk

Thursday, September 27, 2007

ODT and DOCX - are they human readable?

[Note: a revised version of this post can now be found on techwhimsy.com]

One of the supposed benefits of XML is that documents produced in this format can be opened as a text file and read by ordinary people, allowing the content to be recovered even if the formatting is unavailable. After discussing the various merits of the Open Document Format (ODF) and Office Open XML (OOXML) formats (click here to read the earlier post), I was left wondering just how human readable either format was.

I created a simple document in both OpenOffice.org as .odt (Open Document Text) and in MS Office 2007 as .docx that had a heading, some paragraphs, an unordered list and an ordered list. I used the Lorem Ipsum generator that can be found at Lipsum.

(click on images for larger versions)











.odt is on the left and .docx on the right

To start off with, I opened both documents in WordPad to see what they looked like. Not at all human readable.









.odt on the left and .docx on the right



A quick trawl through a Google search revealed that .odt is a container format that compresses all the relevant file parts into one file. I changed the file extension from .odt to .zip and opened it up to have a look.
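For anyone who would rather not rename files by hand, the same peek inside the container can be sketched in a few lines of Python. This is only a minimal illustration using the standard library's zipfile module; the function name list_container_parts is made up for this example, and it assumes the file really is a plain ZIP archive.

```python
import zipfile


def list_container_parts(path):
    """Return the names of the parts stored inside a ZIP-based document.

    `path` may be a file name or an open binary file object.
    The function name is hypothetical; it is not part of any office suite.
    """
    with zipfile.ZipFile(path) as archive:
        # namelist() returns the stored part names in archive order
        return archive.namelist()
```

Calling list_container_parts("sample.odt") on a file of your own (the file name here is just a placeholder) should return a list of part names, which is the same information the renamed .zip shows in a file browser.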









What worked for one format might work for the other. I took a punt, changed the file extension from .docx to .zip, held my breath, crossed my fingers, closed my eyes and double-clicked...








...and discovered that in .docx, the goodies are there, albeit buried a little deeper.

Both .odt and .docx are human readable, after a fashion. If for some reason in the distant (or not-so-distant) future either format is unreadable in its container form, with some effort the data could be extracted. It may even be possible to extract large parts of the formatting, but that's beyond my ability to assess.
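To give a rough idea of what "with some effort" might look like, here is a minimal Python sketch that pulls the paragraph text out of either container. It assumes the main text lives in content.xml for .odt and in word/document.xml for .docx, and it throws away all formatting. The function name extract_paragraphs is hypothetical, and the approach is a simplification: nested paragraphs (for example inside text frames) may show up more than once.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the two formats for text content
ODF_TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
WORD_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_paragraphs(path):
    """Return the plain text of each heading/paragraph in an .odt or .docx.

    Hypothetical helper for illustration; formatting is discarded.
    """
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        if "content.xml" in names:
            # ODF: headings are text:h, paragraphs are text:p
            root = ET.fromstring(archive.read("content.xml"))
            wanted = {f"{{{ODF_TEXT_NS}}}p", f"{{{ODF_TEXT_NS}}}h"}
            return ["".join(el.itertext())
                    for el in root.iter() if el.tag in wanted]
        if "word/document.xml" in names:
            # OOXML: paragraphs are w:p, with the text held in w:t runs
            root = ET.fromstring(archive.read("word/document.xml"))
            para_tag = f"{{{WORD_NS}}}p"
            text_tag = f"{{{WORD_NS}}}t"
            return ["".join(t.text or "" for t in p.iter(text_tag))
                    for p in root.iter(para_tag)]
    raise ValueError("no recognised document part found")
```

The extra step needed for .docx (looking inside the word folder and then inside individual runs) mirrors the "buried a little deeper" impression from browsing the archive by hand.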

In my assessment, .odt comes out ahead slightly in the human readable stakes: it isn't buried quite so deep and comes with less additional XML-related formatting and overhead. As to which is the better format overall, I'll leave that as an exercise for the reader (although I wish I could create .odt inside of Office 2007 - I do love the new Office user interface).

Tuesday, September 25, 2007

Open Warfare: OOXML stumbles while ODF continues to rise

Google Trends comparison between ODF and OOXML for 2007


While Microsoft Office is the undisputed king of the office suite marketplace, the open source OpenOffice.org (OOo) has been worrying at its heels for some time. In recent months controversy has arisen over the accessibility of the XML-based file formats of the competing products - OOo's Open Document Format (ODF) and Microsoft's Office Open XML (OOXML).

ODF was accepted as an international standard (click here for my earlier post on the issue) by ISO back in late 2006, giving it much-needed credibility as a leading open format for documents. OOXML has also sought ISO approval but was unsuccessful in its attempt earlier this month, amidst suspicion of questionable activities by Microsoft representatives.

Why are open formats necessary?

Open formats perform an important function in the preservation of the information in documents, particularly for archival purposes. An archive is useless if it is stored in a file format that nothing can read in 100, 50 or even 20 years' time. Readability is especially important for the storage of public records, where there is a need for government activities to remain publicly accessible to future generations.

Although Microsoft's .doc Word format is nearly ubiquitous, it is far from a perfect solution. It is not uncommon for files in the format to break, become unreadable or lose backwards compatibility between major releases of Office.

Is this the end for Office?

Defeating Microsoft should not be the main focus for OOo and ODF, although clawing back some market share is an admirable goal and a worthy one to strive for. The user interface for Office is still a long way ahead of its open source alternative, and in my opinion the gulf between the two has become wider with the revamped interface used in Office 2007. The differences between the two interfaces reflect the benefits that the support of a large corporation backed by massive reserves of cash and talent can bring.

ODF vs OOXML should not be an ideological battle between free and libre open source software and Microsoft. The best outcome for users is for ODF to be accepted by Microsoft as the international standard that it is and be introduced as a file export option within Office itself. Such an outcome would enable users to enjoy the best of both worlds - an excellent and time-tested user interface that also enables them to produce documents in an open and future-proof file format.

It would be a win-win situation for all consumers and ultimately, isn't that what this should be all about?