Friday, 23 December 2011

About Compatibility of CAT tools

A troubled marriage

Since the widespread adoption of CAT tools, their incompatibility has caused translators and other service providers many headaches. In this article we look at interoperability and open standards to assess the current state of affairs. We compare a selection of CAT tools, both Free/Open Source (FOSS) and proprietary, to provide an at-a-glance view of the potential compatibility of a range of tools. We take this opportunity to invite you to share your experiences.

Interoperability – why do we need it?

According to the American psychologist George Levinger, what counts in making a happy marriage is not so much how compatible you are, but how you deal with incompatibility. This apt observation seems very relevant to the discussion about CAT tools and their current ‘marital’ status.

If you are a translator, there is a 90% chance you use CAT tools. And if you do, there is a 100% certainty that you have experienced moments of frustration over the incompatibility of translation tools available on the market.

While SDL Trados is the most commonly used tool, for how many of you is it the tool of choice? How many of you would prefer the freedom to select the tool best suited to your needs and wallet, without the worry of losing work because it does not ‘talk’ to the tool at the other end of the chain?

The idea of various CAT tools exchanging vows might seem a little eccentric, but if we think about the legions of people who would benefit from it, this matchmaking makes perfect sense. The interoperability of translation tools would benefit every single person within the translation workflow, and if we imagine full integration with, say, desktop publishing, authoring and content management tools, then we are into a truly magnificent marriage ceremony of integrated technology. This would allow us to work in a smarter way, improving processes, saving time and making translators’ work easier, more productive and less stressful.

So here is a vision of an ecosystem, one shared by many in the translation industry, such as TAUS, and increasingly being pursued at a higher level. The EU’s Interoperability Framework, W3C’s Mission, OASIS, LISA’s OSCAR and the industry’s new Interoperability Manifesto are just some examples of efforts towards establishing a truly open, collaborative and interoperable ecosystem that addresses the practical needs of the globalization and translation industries. Yes, it all looks a bit like software application marriage counselling.

So what is needed to successfully deal with incompatibility?

A common language of connectivity and exchange is needed to make full use of global resources. The tools we develop need to ‘talk’ to each other, and just as we need translation to overcome language barriers, so we also need a kind of ‘translation’ to overcome the barriers between the different software applications and tools we use.

Think of nature’s ecosystem of services and what it produces: water, air, timber, nutrients, even electricity. These are all systems so diverse and yet so productive. Why? Because they are interdependent and ultimately share a common architecture. If we think of an atom as a byte of data, then we can see how technology is now allowing this data to change its ‘status’: to be moulded into different shapes and travel through the IT ecosystem, just as atoms do in nature.

Currently, the common ‘language’ of the Web is based on XML (Extensible Markup Language), which is used for the presentation, communication and storage of data. It therefore makes perfect sense to use XML as a basis for developing standards, adherence to which allows various software applications to ‘talk’ to each other and exchange data. These standards are called open standards and are constantly being developed for various industries, including translation.

Open standards in the translation industry

The main goal of open standards is to facilitate widespread interoperability and data exchange among different products or services over the Internet. The idea is that content should be accessible and consistently displayed across different browsers, regardless of the device used. The same goes for translation: the translated text should be able to travel smoothly across different tools, regardless of the format and the nature of the tool used.

For the translation industry, such smooth movement of data through the translation workflow is supported by the following open standards, most of them XML-based:

TMX (Translation Memory Exchange): Allows the sharing of translation memory data between tools

XLIFF (XML Localization Interchange File Format): Stores extracted text and carries the data from one step to another in the localization process

TBX (TermBase Exchange): Allows the sharing of term bases and glossaries between tools

SRX (Segmentation Rules Exchange): Supports TMX and provides a standard method to describe segmentation rules that are being exchanged between tools

GMX (Global Information Management Metrics Exchange): A three-part standard allowing quantitative measurement of various aspects of a document, e.g. volume (GMX/V), complexity (GMX/C) and quality (GMX/Q)

xml:tm (XML-based text memory): Comprises author memory and text memory and allows tracking of all changes in the document, recognizing that documents have a life cycle and that at various points in time they need translating. This format ‘remembers’ everything that happened during the life cycle of a document.

PO (Gettext Portable Object format): Though often not regarded as a translation memory format, Gettext PO files are bilingual files that can be used in translation memory processes in the same way as translation memories.
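
To make the idea of exchanging a translation memory more concrete, here is a minimal sketch in Python of how a tool might read the translation units out of a TMX file. It uses only the standard library. The sample document, the header values and the function name `read_tmx` are made up for illustration and are not taken from any particular CAT tool; the fallback to a plain `lang` attribute is an assumption about files written by older tools.

```python
import xml.etree.ElementTree as ET

# xml:lang belongs to the reserved XML namespace; ElementTree exposes it under this key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample data shaped like a small TMX 1.4 file.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example-tool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="es"><seg>Guarde el archivo.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Close the window.</seg></tuv>
      <tuv xml:lang="es"><seg>Cierre la ventana.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_tmx(text):
    """Return one {language: segment text} dict per <tu> element."""
    root = ET.fromstring(text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            # Assumption: fall back to a plain "lang" attribute if xml:lang is absent.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() keeps the text but drops any inline formatting tags.
                unit[lang.lower()] = "".join(seg.itertext())
        units.append(unit)
    return units


units = read_tmx(SAMPLE_TMX)
for unit in units:
    print(unit["en"], "->", unit["es"])
```

A real importer would also have to decide what to do with inline formatting tags, which this sketch simply discards; choices like that are one place where tools can differ.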

Adherence to these standards by CAT tool developers would help ensure that all tools are conversant with one another, giving you the freedom to choose from over 60 available Free/Open Source (FOSS) or proprietary systems or to use more than one system for different purposes. Yes – it sounds like a fairy tale – highly unlikely, just like perfect marriages.

Later in this article we look at a range of tools available on the market today with a view to assessing their potential for interoperability, but first we need to sort out some terminology issues.

The many flavours of ‘free’ and ‘open’

‘Free’ and ‘open’ are words which are increasingly entering our vocabulary in various configurations and sometimes being interpreted as meaning the same thing. They are far from that.

First there is the distinction between FOSS and proprietary systems: in day-to-day language the former is often referred to as ‘free’ (i.e. free of charge) and the latter as ‘paid’. However, some proprietary tools are also available free of charge (e.g. Google Translator Toolkit or Wordfast Anywhere).

We also have many different references to ‘open’. We have ‘open’ as in open source, open standards, open interface, etc. The best way to approach this is to think of ‘open’ in relation to what is actually being opened.

So, in open source, the source code is opened, i.e. made available in a human-readable form rather than in an unreadable binary form. This access to the code allows further development, bug fixing, etc. by anyone who wishes to get involved.

In open standards, a specification for building or developing particular objects or software applications in a way which fosters interoperability is opened, and standards are achieved in a democratic process, by the people involved.

What is opened in an open interface is a software ‘boundary’, which then becomes common with other software components and enables connection, interaction, and the exchange of data and instructions. APIs (Application Programming Interfaces) are a means of enabling such open interfacing and seamless connectivity between tools.

Add to these three elements (open source, open standards, open interface) openness to community or collaborative translation, and you have the Open Translation Platforms agenda that TAUS has been pursuing for some time now. The whole idea behind this agenda is to help move the industry to a new level of capability: true interoperability.

By way of comparison, it seems almost unthinkable that the client could force us to use a particular make of mobile phone in order to maintain communication. We choose our phone solely on the basis of personal preference, without worrying whether it will ‘talk’ to other phones or whether it will be able to send and receive data. We quite rightly assume it will. And it does.

So now let’s have a look at the current state of some of the baseline CAT tools to see where they stand in terms of commitment to interoperability.

Comparing CAT tools

We looked at 7 FOSS tools, 3 free-proprietary tools, and 7 paid-proprietary tools, focusing on their self-reported compatibility with open standards, openness to connecting through APIs, and capability to share resources such as TMs and term databases. The list is by no means exhaustive, but it is fairly representative. The chart below shows where each of these 17 tools sits on the interoperability path. It is quite striking that free-proprietary tools fare worst in terms of stated compliance with open standards; notably, all of them lack XLIFF support. The industry’s most used tool, SDL Trados, is one of the worst performers among paid systems. Pure-play XML-based tools, such as XTM, come out on top. We have purposely omitted from the chart and table the tools that have integrated with the TAUS Data Association supercloud, namely GlobalSight, MultiTrans and the Translator Workspace, so as not to be accused of bias in their favour. Clearly, these tools vary in their richness of features; that is not addressed in this article.

Image #1: Openness of CAT tools

A Wikipedia page details the file formats many CAT tools can work with.

A few key findings:
  1. All the tools support TMX, which is great news and means that, potentially, translation memories could flow freely across all the tools. The problem is that not all the tools have collaborative features, so while it is possible to leverage a memory created in a different tool, it is not always possible to share memories in real time.
  2. Over two-thirds of the tools in our sample reportedly support XLIFF. If this is done with a view to true compliance, it means that a great number of tools can now capture all the information needed for localization, such as localizable objects in source and target languages (e.g. word strings), supplementary information (e.g. glossaries) and administrative information (e.g. workflow data), in one single file and exchange it with other tools without any loss.
  3. SRX and TBX are supported by about 20% of the FOSS tools and 70% of the proprietary tools.
  4. Most have APIs (Application Programming Interfaces) available, which means they can be integrated into other software applications.

There is clear evidence that many CAT tools are on the road to interoperability. The XML-driven approach seems to be becoming the current standard. This is all great news for translation professionals.
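
As an illustration of how source text, target text and language information can travel together in one XLIFF file, the following Python sketch reads the translation units of a small XLIFF 1.2 document and lists the ones that still lack a translation. The sample file and the function name `read_xliff` are hypothetical, and the sketch assumes a plain XLIFF 1.2 file without tool-specific extensions.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 documents, mapped to a short prefix for lookups.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical sample data shaped like a small XLIFF 1.2 file.
SAMPLE_XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="manual.html" source-language="en" target-language="es" datatype="html">
    <body>
      <trans-unit id="1">
        <source>Open the file.</source>
        <target>Abra el archivo.</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Close the window.</source>
      </trans-unit>
    </body>
  </file>
</xliff>
"""


def read_xliff(text):
    """Return one dict per <trans-unit>, with a target of None if untranslated."""
    root = ET.fromstring(text)
    rows = []
    for file_el in root.findall("x:file", NS):
        for unit in file_el.findall(".//x:trans-unit", NS):
            source = unit.find("x:source", NS)
            target = unit.find("x:target", NS)
            rows.append({
                "id": unit.get("id"),
                "source_lang": file_el.get("source-language"),
                "target_lang": file_el.get("target-language"),
                # itertext() keeps the text but drops inline markup elements.
                "source": "".join(source.itertext()) if source is not None else None,
                "target": "".join(target.itertext()) if target is not None else None,
            })
    return rows


rows = read_xliff(SAMPLE_XLIFF)
untranslated = [row["id"] for row in rows if row["target"] is None]
print("Units still to translate:", untranslated)
```

Real files often carry extra, tool-specific elements and attributes alongside these basics, which is one reason why stated support for a format does not guarantee a loss-free round trip.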

However, it has to be said that we are indeed only at the beginning of ‘the journey towards interoperability’ and some parts of it still have to be done ‘on foot’ and perhaps some areas are still inaccessible.

A word of caution: stated compliance is not the same as compliance. There will always be issues surrounding the standards themselves. The reasons for non-compliance are many; to voice just some opinions from the industry:
  • open standards are too permissive, and the way they are designed allows for different ‘flavours’ of a standard, each with its own idiosyncratic quirks, which makes integration less than straightforward.
  • poor-quality engineering is sometimes to blame for the lack of interoperability, especially where compliance is declared only ‘on paper’.
  • the process of achieving compliance can be very lengthy and complicated for some systems, requiring major re-engineering of the source code. This can considerably slow down the traffic on the road to interoperability.

What can be done to foster interoperability?

There are a few things translators can do to speed up the process in order to secure a better working environment.

  • Start requesting documents in open formats that are based on underlying open standards. Just imagine that from now on you’d be getting your source text in ONE format: XLIFF. No more doc, html, pdf, InDesign..., just one format which can be opened in a tool of your choice, free or paid, open source or proprietary, whatever suits your palate.
  • Give feedback to your tool provider. The tenets of open innovation have brought about a shift in the way products are developed. ‘Crowdsourcing’, ‘co-creation’ and ‘user-centred approach’ are just a few of the buzzwords pointing to the fact that the ‘wisdom’ of users is increasingly being used. If things cannot be remedied in the current version, they might be addressed in the next release of the system. Some developers have even indicated that certain features can be added depending on demand.

We also invite you to use the comments feature below to share your experiences of CAT tool interoperability. Are you a researcher who has done experiments on the efficiency of data exchange between CAT tools? If so, what did you learn? Have you recently started using a new CAT tool? If so, how efficiently did your new tool make use of existing resources that had been created or stored in the old tool?

Thanks to everyone who contributed to this article. A special thanks to Marco Cevoli at Qabiria for his time and dedication in putting the comparative table together.

Source: TAUS
