Since the widespread adoption of CAT tools, their incompatibility has
been causing translators and other service providers many headaches. In this
article we look at the subjects of interoperability and open standards
to assess the current state of affairs. We compare a selection of CAT
tools, both Free/Open Source (FOSS) and proprietary, in order to provide
an at-a-glance view of the potential compatibility of a range of tools.
We take this opportunity to invite you to share your experiences.
Interoperability – why do we need it?
According to the American psychologist George Levinger, what counts in
making a happy marriage is not so much how compatible you are, but how
you deal with incompatibility. This apt observation seems very relevant
to the discussion about CAT tools and their current ‘marital’ status.
If you are a translator, there is a 90% chance you use CAT tools. And
if you do, there is a 100% certainty that you have experienced moments
of frustration over the incompatibility of translation tools available
on the market.
While SDL Trados is the most commonly used tool, for how many of you is it the tool of choice? How many of you would prefer the freedom to select the tool best suited to your needs and wallet, without the worry of losing work because it does not ‘talk’ to the tool at the other end of the chain?
The idea of various CAT tools exchanging vows might seem a little
eccentric, but if we think about the legions of people that would
benefit from it, it would seem that indeed, this marriage matchmaking
makes perfect sense. The interoperability of translation tools would
benefit every single person within the translation workflow, and if we
imagine full integration with, say, desktop publishing, authoring and
content management tools, then we are into a truly magnificent marriage
ceremony of integrated technology. This would allow us to do things in a
smarter and more intelligent way, improving processes, saving time and
making translators’ work easier, more productive and stress-free.
So here is a vision of an ecosystem – one shared by many in the
translation industry, such as TAUS, and a vision increasingly being
pursued at a higher level. The EU’s Interoperability Framework, W3C’s
Mission, OASIS, LISA’s OSCAR or the industry’s new Interoperability
Manifesto are just some examples of efforts towards establishing a truly
open, collaborative and interoperable ecosystem, addressing practical
needs of the globalization and translation industries. Yes – it all
looks a bit like software application marriage counselling.
So what is needed to successfully deal with incompatibility?
A common language of connectivity and exchange is needed to make full
use of global resources. The tools we develop need to ‘talk’ to each
other and just as we need translation to overcome language barriers, so
we also need a kind of ‘translation’ to overcome the barriers between
the different software applications and tools we use.
Think of nature’s ecosystem of services and what it produces:
water, air, timber, nutrients, even electricity. These are all systems
so diverse and yet so productive. Why? Because they are interdependent
and ultimately share a common architecture. If we think of an atom as a
byte of data, then we can see how technology is now allowing this data
to change its ‘status’ – to be moulded into different shapes and travel
through the IT ecosystem, just as atoms do in nature.
Currently, the common ‘language’ of the Web is based on XML
(Extensible Markup Language) which is used for the presentation,
communication and storage of data. It therefore makes perfect sense to
use XML as a basis for developing standards, adherence
to which allows various software applications to ‘talk’ to each other and
exchange data. These standards are called open standards and are constantly being developed to serve various industries, including translation.
Open standards in the translation industry
The main goal of open standards is to facilitate widespread
interoperability and data exchange among different products or services
over the Internet. The idea is that the content should be accessible and
consistently displayed across different browsers, regardless of the
device used. The same goes for translation – the translated text should
be able to travel smoothly across different tools regardless of the
format and the nature of the tool used.
For the translation industry such smooth movement of data in the
translation workflow is supported by the following open, largely XML-based
standards:
TMX (Translation Memory Exchange): Allows the sharing of translation memory data between tools
XLIFF (XML Localisation Interchange File Format): Stores extracted text and carries the data from one step to another in the localization process
TBX (TermBase Exchange): Allows the sharing of term bases and glossaries between tools
SRX (Segmentation Rules Exchange): Supports TMX and provides a standard method to describe segmentation rules that are being exchanged between tools
GMX (Global information Management Metrics Exchange): a three-part standard allowing quantitative measuring of various aspects of a document, e.g. volume - GMX/V, complexity - GMX/C and quality - GMX/Q
xml:tm (XML-based text memory): Comprises author memory and text memory and allows tracking of all changes in the document, recognizing that documents have a life cycle and that at various points in time they need translating. This format ‘remembers’ everything that happened during the life cycle of a document.
PO (Gettext Portable Object format): Though often not regarded as a translation memory format, Gettext PO files are bilingual files that are used in translation processes in the same way as translation memories. Unlike the other formats listed here, PO is plain text rather than XML.
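To make the idea of exchange concrete, here is a minimal sketch of how any tool could read translation memory data from a TMX document using nothing more than a standard XML parser. The sample content, the tool name in the header and the function name read_tmx are illustrative assumptions, not taken from any particular CAT tool; real TMX files carry more metadata and inline markup than shown here.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny, hand-written TMX document (illustrative content only).
TMX_SAMPLE = """<tmx version="1.4">
  <header creationtool="example-tool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello, world.</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour, le monde.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="fr"><seg>Enregistrez le fichier.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_tmx(text, source_lang, target_lang):
    """Return (source, target) segment pairs for the requested languages."""
    root = ET.fromstring(text)
    pairs = []
    for tu in root.iter("tu"):  # one translation unit per <tu>
        segments = {}
        for tuv in tu.findall("tuv"):  # one language variant per <tuv>
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags
                segments[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs


if __name__ == "__main__":
    for source, target in read_tmx(TMX_SAMPLE, "en", "fr"):
        print(source, "->", target)
```

Because the structure is open and documented, a memory exported by one tool can in principle be read this way by any other, whatever the tool's licence or price.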
Adherence to these standards by CAT tool developers would help ensure
that all tools are conversant with one another, giving you the freedom
to choose from over 60 available Free/Open Source (FOSS) or proprietary
systems or to use more than one system for different purposes. Yes – it
sounds like a fairy tale – highly unlikely, just like perfect marriages.
Later in this paper we look at a range of tools available on the
market today with a view to assessing their potential for
interoperability, but first we need to sort out some terminology issues.
The many flavours of ‘free’ and ‘open’
‘Free’ and ‘open’ are words which are increasingly entering our
vocabulary in various configurations and sometimes being interpreted as
meaning the same thing. They are far from that.
First, there is the distinction between FOSS and proprietary systems: in day-to-day language, FOSS is often referred to as ‘free’ (i.e. free of charge) and proprietary software as ‘paid’. However, some proprietary tools are available ‘for free’ (e.g. Google Translator Toolkit or Wordfast Anywhere).
We also have many different references to ‘open’. We have ‘open’ as
in open source, open standards, open interface, etc. The best way to
approach this is to think of ‘open’ in relation to what is actually
being opened.
So, in open source, the source code is opened, i.e.
made available in a human-readable form, rather than in
human-unreadable, binary form. This access to the code allows further
development, bug fixing etc. by anyone who wishes to get involved.
In open standards, a specification for building or
developing particular objects or software applications in a way which
fosters interoperability is opened, and standards are achieved in a
democratic process, by the people involved.
What is opened in open interface is a software
‘boundary’, which then becomes common with other software components and
enables connection, interaction, exchanging data and instructions. APIs
(Application Programming Interfaces) are a means to enable such open
interfacing and seamless connectivity between tools.
Add a fourth element to these three (open source, open standards, open interface), namely openness to community or collaborative translation, and you have the Open Translation Platforms agenda
that TAUS has been pursuing for some time now. The whole idea behind
this agenda is to help move the industry to a new level of capability;
to true interoperability.
By way of comparison, it seems almost unthinkable that the client
could force us to use a particular make of mobile phone in order to
maintain communication. We choose our phone solely on the basis of
personal preference, without worrying whether it will ‘talk’ to other
phones or whether it will be able to send and receive data. We quite
rightly assume it will. And it does.
So now let’s have a look at the current state of some of the baseline
CAT tools to see where they are in terms of commitment to
interoperability.
Comparing CAT tools
We looked at 7 FOSS tools, 3 free-proprietary tools, and 7
paid-proprietary tools, focusing on their self-reported compatibility
with open standards, openness to connecting with APIs, and capability to
share resources, such as TM and term databases. The list is by no means
exhaustive; however, it is fairly representative. The chart below
demonstrates where each of these 17 tools sits on the interoperability
path. It is quite striking that free-proprietary tools fare worst in
terms of stated compliance with open standards; notably, all of them lack XLIFF support.
The industry’s most used tool, SDL Trados, is one of the worst performers
among paid systems. Pure-play XML-based tools, such as XTM, come out on
top. We have purposefully omitted tools that have integrated with the
TAUS Data Association supercloud, namely GlobalSight, MultiTrans and the
Translator Workspace, from the chart and table so as not to be accused
of favourable bias towards these tools. Clearly, these tools vary in
their richness of features. This is not addressed by this article.
A Wikipedia page details the file formats that many CAT tools can work with.
A few key findings:
- All the tools support TMX, which is great news and means that, potentially, translation memories could flow freely across all the tools. The problem is that not all the tools have collaborative features, so while it is possible to leverage a memory created in a different tool, it is not always possible to share memories in real time.
- Over two-thirds of the tools in our sample reportedly support XLIFF. If this is done with a view to true compliance, it means that a great number of tools can now capture all the information needed for localization, such as localizable objects in source and target languages (e.g. word strings), supplementary information (e.g. glossaries), administrative information (e.g. workflow data) etc., in one single file and exchange it with other tools without any loss.
- SRX and TBX are supported by about 20% of the FOSS tools and 70% of proprietary tools.
- Most have APIs (Application Programming Interfaces) available, which means they can be integrated into other software applications.
There is clear evidence that many CAT tools are on the road to
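As an illustration of what carrying localization data ‘in one single file’ can look like, the following is a minimal sketch that reads an XLIFF 1.2 document and lists each translation unit with its source text, target text and note. The sample file content and the function name read_xliff are assumptions made for this example; real XLIFF files exported by CAT tools are usually richer and may contain tool-specific extensions.

```python
import xml.etree.ElementTree as ET

# Namespace used by XLIFF 1.2 documents.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# A tiny, hand-written XLIFF 1.2 document (illustrative content only).
XLIFF_SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="menu.txt" source-language="en" target-language="fr"
        datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Open</source>
        <target>Ouvrir</target>
        <note>Menu item</note>
      </trans-unit>
      <trans-unit id="2">
        <source>Close</source>
      </trans-unit>
    </body>
  </file>
</xliff>
"""


def read_xliff(text):
    """Return one dict per translation unit found in the document."""
    root = ET.fromstring(text)
    units = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        target = unit.find("x:target", NS)
        note = unit.find("x:note", NS)
        units.append({
            "id": unit.get("id"),
            "source": "".join(source.itertext()) if source is not None else None,
            # A missing <target> means the unit has not been translated yet.
            "target": "".join(target.itertext()) if target is not None else None,
            "note": note.text if note is not None else None,
        })
    return units


if __name__ == "__main__":
    for unit in read_xliff(XLIFF_SAMPLE):
        status = "translated" if unit["target"] else "untranslated"
        print(unit["id"], unit["source"], "->", unit["target"], f"({status})")
```

A tool that writes its translations back into the target elements of the same file, leaving everything else untouched, is what lossless exchange would amount to in practice.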
interoperability. The XML-driven approach seems to be becoming the
current standard. This is all great news for translation professionals.
However, it has to be said that we are indeed only at the beginning of ‘the journey towards interoperability’ and some parts of it still have to be done ‘on foot’ and perhaps some areas are still inaccessible.
A word of caution: stated compliance is not the same as compliance. There will always be issues surrounding the standards themselves. The reasons for non-compliance are many; just to voice some opinions from the industry:
- open standards are too permissive, and the way they are designed allows for different ‘flavours’ of standards with their own idiosyncratic quirks, which makes integration less than straightforward.
- poor quality engineering is sometimes to blame for the lack of interoperability, especially where compliance is declared ‘on paper’.
- the process of achieving compliance can be very lengthy and complicated for some systems, requiring major re-engineering work on the source code. This can considerably slow down the traffic on the road to interoperability.
What can be done to foster interoperability?
There are a few things translators can do to speed up the process in order to secure a better working environment.
Start requesting documents in open formats that are
based on underlying open standards. Just imagine that from now on you’d
be getting your source text in ONE format – XLIFF. No more doc, html,
pdf, InDesign... – just one format which can be opened in a tool of your
choice – free or paid, open source or proprietary, whatever suits your
palate.
Give feedback to your tools provider. The tenets of
open innovation have brought about a shift in the way products are being
developed. ‘Crowdsourcing’, ‘co-creation’, ‘user-centred approach’ are
just a few buzzwords pointing to the fact that the ‘wisdom’ of users is
being increasingly used. If things cannot be remedied in a current
version, they might be addressed in the next release of the system. Some
developers have even indicated that certain features can be added
depending on demand.
We also invite you to use the comments feature below to share your experiences of CAT tool interoperability. Are
you a researcher who has done experiments on the efficiency of data
exchange between CAT tools? If so, what did you learn? Have you recently
started using a new CAT tool? If so, how efficiently did your new tool
make use of existing resources, which had been created or stored in the
old tool?
Thanks to everyone who contributed to this article. A special thanks to Marco Cevoli at Qabiria for his time and dedication in putting the comparative table together.
Source: TAUS