Friday, 23 December 2011

About Compatibility of CAT tools

A troubled marriage

Since the widespread adoption of CAT tools, their incompatibility has caused translators and other service providers considerable headaches. In this article we look at interoperability and open standards to assess the current state of affairs. We compare a selection of CAT tools, both Free/Open Source (FOSS) and proprietary, in order to provide an at-a-glance view of the potential compatibility of a range of tools. We take this opportunity to invite you to share your experiences.

Interoperability – why do we need it?

According to the American psychologist George Levinger, what counts in making a happy marriage is not so much how compatible you are, but how you deal with incompatibility. This apt observation seems very relevant to the discussion about CAT tools and their current ‘marital’ status.

If you are a translator, there is a 90% chance you use CAT tools. And if you do, there is a 100% certainty that you have experienced moments of frustration over the incompatibility of translation tools available on the market.

While SDL Trados is the most commonly used tool, for how many of you is it the tool of choice? How many of you would prefer the freedom to select the tool best suited to your needs and wallet, without worrying about losing work because it does not ‘talk’ to the tool at the other end of the chain?

The idea of various CAT tools exchanging vows might seem a little eccentric, but if we think about the legions of people who would benefit from it, this matchmaking makes perfect sense. The interoperability of translation tools would benefit every single person within the translation workflow, and if we imagine full integration with, say, desktop publishing, authoring and content management tools, then we are into a truly magnificent marriage ceremony of integrated technology. This would allow us to work in a smarter way, improving processes, saving time and making translators’ work easier, more productive and less stressful.

So here is a vision of an ecosystem – one shared by many in the translation industry, such as TAUS, and a vision increasingly being pursued at a higher level. The EU’s Interoperability Framework, W3C’s Mission, OASIS, LISA’s OSCAR or the industry’s new Interoperability Manifesto are just some examples of efforts towards establishing a truly open, collaborative and interoperable ecosystem, addressing the practical needs of the globalization and translation industries. Yes – it all looks a bit like marriage counselling for software applications.

So what is needed to successfully deal with incompatibility?

A common language of connectivity and exchange is needed to make full use of global resources. The tools we develop need to ‘talk’ to each other, and just as we need translation to overcome the language barrier, so we also need a kind of ‘translation’ to overcome the barriers between the different software applications and tools we use.

Think of nature’s ecosystem of services and what it produces: water, air, timber, nutrients, even electricity. These are all systems so diverse and yet so productive. Why? Because they are interdependent and ultimately share a common architecture. If we think of an atom as a byte of data, then we can see how technology is now allowing this data to change its ‘status’ – to be moulded into different shapes and travel through the IT ecosystem, just as atoms do in nature.

Currently, the common ‘language’ of the Web is based on XML (Extensible Markup Language), which is used for the presentation, communication and storage of data. It therefore makes perfect sense to use XML as a basis for developing standards, adherence to which allows various software applications to ‘talk’ to each other and exchange data. These standards are called open standards and are constantly being developed for various industries, including translation.

Open standards in the translation industry

The main goal of open standards is to facilitate the widespread interoperability and data exchange among different products or services over the Internet. The idea is that the content should be accessible and consistently displayed across different browsers, regardless of the device used. The same goes for translation – the translated text should be able to travel smoothly across different tools regardless of the format and the nature of the tool used.

For the translation industry, such smooth movement of data in the translation workflow is supported by the following open, XML-based standards:

TMX (Translation Memory Exchange): Allows the sharing of translation memory data between tools

XLIFF (Localization Interchange File Format): Stores extracted text and carries the data from one step to another in the localization process

TBX (TermBase Exchange): Allows the sharing of term bases and glossaries between tools

SRX (Segmentation Rules Exchange): Supports TMX and provides a standard method to describe segmentation rules that are being exchanged between tools

GMX (Global information Management Metrics Exchange): a three-part standard allowing quantitative measuring of various aspects of a document, e.g. volume - GMX/V, complexity - GMX/C and quality - GMX/Q

xml:tm (XML-based text memory): Comprises author memory and text memory and allows tracking of all changes in the document, recognizing that documents have a life cycle and that at various points in time they need translating. This format ‘remembers’ everything that happened during the life cycle of a document.

PO (Gettext Portable Object format): Though often not regarded as a translation memory format, Gettext PO files are bilingual files that are also used in translation memory processes in the same way translation memories are used.
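
To make the idea of exchange concrete, here is a minimal Python sketch of how any tool might read translation units out of a TMX file using only the standard library. The sample document, the language codes and the function name `read_tmx_pairs` are illustrative assumptions, not taken from any particular CAT tool; real TMX files also carry inline markup and metadata that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# xml:lang belongs to the reserved XML namespace; ElementTree exposes it under this key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny, hand-written TMX document (illustrative content only).
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example-tool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="es"><seg>Guarde el archivo.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Open the file.</seg></tuv>
      <tuv xml:lang="es"><seg>Abra el archivo.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_tmx_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) segment pairs for the two requested languages."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG, "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline elements.
                segments[lang] = "".join(seg.itertext())
        # Keep only units that contain both languages.
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs


if __name__ == "__main__":
    for source, target in read_tmx_pairs(SAMPLE_TMX, "en", "es"):
        print(source, "->", target)
```

Because the format is plain XML, nothing in this sketch depends on which tool produced the memory, which is the whole point of an exchange standard.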

Adherence to these standards by CAT tool developers would help ensure that all tools are conversant with one another, giving you the freedom to choose from over 60 available Free/Open Source (FOSS) or proprietary systems or to use more than one system for different purposes. Yes – it sounds like a fairy tale – highly unlikely, just like perfect marriages.

Later in this paper we look at a range of tools available on the market today with a view to assessing their potential for interoperability, but first we need to sort out some terminology issues.

The many flavours of ‘free’ and ‘open’

‘Free’ and ‘open’ are words which are increasingly entering our vocabulary in various configurations and sometimes being interpreted as meaning the same thing. They are far from that.

We have FOSS vs. proprietary systems, whereby in day-to-day language the former is often referred to as ‘free’ (i.e. free of charge) and the latter as ‘paid’. However, some proprietary tools are available ‘for free’ (e.g. Google Translator Toolkit or Wordfast Anywhere).

We also have many different references to ‘open’. We have ‘open’ as in open source, open standards, open interface, etc. The best way to approach this is to think of ‘open’ in relation to what is actually being opened.

So, in open source, the source code is opened, i.e. made available in a human-readable form rather than in human-unreadable, binary form. This access to the code allows further development, bug fixing, etc. by anyone who wishes to get involved.

In open standards, a specification for building or developing particular objects or software applications in a way which fosters interoperability is opened, and standards are achieved in a democratic process, by the people involved.

What is opened in an open interface is a software ‘boundary’, which then becomes common with other software components and enables connection, interaction, and the exchange of data and instructions. APIs (Application Programming Interfaces) are a means to enable such open interfacing and seamless connectivity between tools.

Add to these three elements (open source, open standards, open interface) a fourth, openness to community or collaborative translation, and you have the Open Translation Platforms agenda that TAUS has been pursuing for some time now. The whole idea behind this agenda is to help move the industry to a new level of capability: true interoperability.

By way of comparison, it seems almost unthinkable that the client could force us to use a particular make of mobile phone in order to maintain communication. We choose our phone solely on the basis of personal preference, without worrying whether it will ‘talk’ to other phones or whether it will be able to send and receive data. We quite rightly assume it will. And it does.

So now let’s have a look at the current state of some of the baseline CAT tools to see where they are in terms of commitment to interoperability.

Comparing CAT tools

We looked at 7 FOSS tools, 3 free-proprietary tools and 7 paid-proprietary tools, focusing on their self-reported compatibility with open standards, openness to connecting with APIs, and capability to share resources such as TM and term databases. The list is by no means exhaustive, but it is fairly representative. The chart below shows where each of these 17 tools sits on the interoperability path. It is quite striking that free-proprietary tools fare worst in terms of stated compliance with open standards; notably, all of them lack XLIFF support. The industry’s most used tool, SDL Trados, is one of the worst performers among paid systems. Pure-play XML-based tools, such as XTM, come out on top. We have purposely omitted tools that have integrated with the TAUS Data Association supercloud, namely GlobalSight, MultiTrans and the Translator Workspace, from the chart and table so as not to be accused of bias in favour of these tools. Clearly, these tools vary in their richness of features; this article does not address that.

Image #1: Openness of CAT tools

Click here to see a Wikipedia page which details the file formats many CAT tools can work with.

A few key findings:
  1. All the tools support TMX, which is great news and means that potentially, translation memories could be allowed to flow freely across all the tools. The problem is that not all the tools have collaborative features so while it is possible to leverage a memory created on a different tool, it is not always possible to share memories in real time.
  2. Over two-thirds of the tools in our sample reportedly support XLIFF. If this is done with a view to true compliance, it means that a great number of tools can now capture all the information needed for localization, such as localizable objects in source and target languages (e.g. word strings), supplementary information (e.g. glossaries) and administrative information (e.g. workflow data), in one single file and exchange it with other tools without any loss.
  3. SRX and TBX are supported by about 20% of the FOSS tools and 70% of proprietary tools.
  4. Most have APIs (Application Programming Interfaces) available, which means they can be integrated into other software applications.
There is clear evidence that many CAT Tools are on the road to interoperability. The XML-driven approach seems to be becoming the current standard. This is all great news for translation professionals.
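
As an illustration of what XLIFF support means in practice, the following Python sketch lists the translation units in an XLIFF file that still lack a target. It assumes XLIFF 1.2 and its namespace; the sample file and the function name `untranslated_units` are hypothetical and only meant to show that the source text, target text and unit identifiers all travel together in one XML file that any compliant tool can read.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 documents (assumed version for this sketch).
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# A tiny, hand-written XLIFF 1.2 document (illustrative content only).
SAMPLE_XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="hello.txt" source-language="en" target-language="es"
        datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Save the file.</source>
      </trans-unit>
      <trans-unit id="2">
        <source>Open the file.</source>
        <target>Abra el archivo.</target>
      </trans-unit>
    </body>
  </file>
</xliff>
"""


def untranslated_units(xliff_text):
    """Return (id, source text) for every trans-unit with no usable target."""
    root = ET.fromstring(xliff_text)
    missing = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        target = unit.find("x:target", NS)
        # Treat a missing or whitespace-only target as untranslated.
        if target is None or not "".join(target.itertext()).strip():
            source = unit.find("x:source", NS)
            source_text = "".join(source.itertext()) if source is not None else ""
            missing.append((unit.get("id"), source_text))
    return missing


if __name__ == "__main__":
    for unit_id, text in untranslated_units(SAMPLE_XLIFF):
        print(unit_id, text)
```

A tool that only claims XLIFF support ‘on paper’ typically fails on details this sketch skips, such as inline tags and tool-specific extensions, which is why stated compliance deserves scrutiny.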

However, it has to be said that we are indeed only at the beginning of ‘the journey towards interoperability’ and some parts of it still have to be done ‘on foot’ and perhaps some areas are still inaccessible.

A word of caution: stated compliance is not the same as compliance. There will always be issues surrounding the standards themselves. The reasons for non-compliance are many; to voice just some opinions from the industry:
  • open standards are too permissive and the way they are designed allows for different ‘flavours’ of standards with their own idiosyncratic quirks which makes integration not so straightforward.
  • poor quality engineering is sometimes to blame for the lack of interoperability, especially where compliance is declared ‘on paper’.
  • the process of achieving compliance can be very lengthy and complicated for some systems, requiring major re-engineering of the source code. This can considerably slow down the traffic on the road to interoperability.

What can be done to foster interoperability?

There are a few things translators can do to speed up the process in order to secure a better working environment.

Start requesting documents in open formats that are based on underlying open standards. Just imagine that from now on you’d be getting your source text in ONE format – XLIFF. No more doc, html, pdf, InDesign... – just one format which can be opened in a tool of your choice – free or paid, open source or proprietary, whatever suits your palate.
Give feedback to your tools provider. The tenets of open innovation have brought about a shift in the way products are being developed. ‘Crowdsourcing’, ‘co-creation’ and ‘user-centred approach’ are just a few buzzwords pointing to the fact that the ‘wisdom’ of users is increasingly being used. If things cannot be remedied in the current version, they might be addressed in the next release of the system. Some developers have even indicated that certain features can be added depending on demand.

We also invite you to use the comments feature below to share your experiences of CAT tool interoperability. Are you a researcher who has done experiments on the efficiency of data exchange between CAT tools? If so, what did you learn? Have you recently started using a new CAT tool? If so, how efficiently did your new tool make use of existing resources that had been created or stored in the old tool?

Thanks to everyone who contributed to this article. A special thanks to Marco Cevoli at Qabiria for his time and dedication in putting the comparative table together.

Source: TAUS

Thursday, 22 December 2011

The future for translators looks bright, but they will have to reinvent the profession first

Seven predictions and a survey presented at the 19th FIT Conference, San Francisco, August 2011.

Translators in the 21st century find themselves in a difficult position. On the one hand, there is a steadily growing demand for translation as a result of increasing global trade and communication generally. On the other hand, it is becoming harder and harder for the professional translator to meet this demand. Delivery times grow shorter and prices go down.

Technology is often thought of as an answer to this kind of pressure. But along with the technology come many new challenges. It is simply impossible for a translator who is trained in the language arts to keep up with the technology. And if she tries, frustration grows when she finds out that translation tools do not really work together very well. (See report Individual translators and data exchange standards.)

Then there are the economics. As owners of small businesses, translators must weigh the return on investment of time and money very carefully. Tools do not come for free and every new tool takes time to master. What if these same tools – or machine translation – one day take over the job of human translators, as many of our colleagues fear? You might prefer to live on another planet, or at least work in another profession.

For the 19th FIT Conference held in San Francisco, 1-4 August 2011, TAUS ran a survey among the translators attending the conference. This article references a summary of the survey, and then makes seven predictions as a follow up to the keynote I gave to close the FIT event. The conclusion: the future for translators looks bright, but they will have to reinvent the profession first.

Crisis. What crisis?

In the aftermath of the 2008 financial crisis, sixty-four (37%) of the survey respondents reported that translation rates continue to be under pressure. There seems to be a slight decline in translation volume, while the palette of languages seems to be broadening slightly. Thirty-seven respondents (21%) see business continuing as usual, while respectively 12% and 10% of them see opportunities for automation and innovation in the currently unstable market.

Which of the following technologies and/or innovations will translators apply in the coming two years? Sixty percent of the respondents say ‘no’ to machine translation, while 19% are already using it, and 21% expect they will use MT within the next two years. The main concerns about MT are the poor quality of MT output (76%) and the poor quality of source documents (54%). Those who look at MT on the bright side see cost reduction as the greatest benefit (39%) and the possibility of real-time delivery of translation as a secondary benefit (35%).

A majority of the respondents are interested in sharing translation memories and terminology: 35% already do so and 39% expect to be sharing language data within two years. However, another, much larger poll of 1,000 translators indicates that 49% would not consider sharing their translation memories. Translators are concerned about ownership of TMs and their relevance to the job at hand. But they do see the benefits of terminology searches of massive TM resources and the productivity gains these bring.

The future looks bright, but …

… change is the name of the game. And reinventing the profession is extremely hard if your days are spent just getting the jobs done and trying to make a modest living. Yet, for the first time in the history of the planet, translation is a really strategic activity. Thanks to Google Translate, Yahoo! Babelfish and Microsoft Bing, every soul on our planet now knows what translation means.

Hundreds of millions of people press the translate button every day, which makes them realize how difficult it is to get a good, accurate translation. As professionals we must realize that our community is far too small (just 250,000 or so professional translators in a world of 6,000 languages?) to serve the needs of seven billion citizens.

We are only scratching the surface. As professional translators – and as a global translation industry – our mission is to help the world communicate better. (That sounds better than being a lawyer or a banker, right?) For we now have the means to deliver on that mission. We simply need to find a way to do it properly. Here is how TAUS sees the future in seven predictions.

1. MT is here to stay
Let’s face it: machine translation will never be perfect. Every speaker of a language has the right to introduce new words, give existing words new meanings and change the spelling and grammar of his language. The point is: that’s what people do every day – witness Twitter or online chat, popular songs or political revolutions.
Computers just cannot keep up with these evolving nuances and associations in hundreds of domains and linguaspheres created by speakers of just one language. Yet, MT for all its mechanical faults is here to stay. Why? For the simple reason that we humans just cannot deliver enough translations in real-time.

Two other factors will also influence the rapid growth of MT. First, MT is getting better and better as we keep feeding the engines with human-translated sentences to improve their domain knowledge and keep tweaking the rules to improve word order and forms. Second, a new generation of users is growing up that is more forgiving and open to self-service. Users may even step in and offer better terminology and forms of expression as a way to help others and themselves.

MT is here to stay and will be called “translation”. It will be embedded on every website, mobile and car app. Translation will become a utility, just like electricity, water and Internet: a basic resource and a basic human right.

2. High-quality translation will gain recognition
As machine translation becomes so universally available, it is clear that there isn’t just one single translation of a text that fits all. To differentiate their product offerings and appeal to specific customer groups, buyers will recognize the need for high-quality translation - call it personalization, transcreation or hyper-localization. This means that machines will not replace human translators.

On the contrary, non-perfect MT output will stimulate the need for high-quality translation in a broad range of communication situations. The challenge we face as an industry is to agree on the criteria and the measurements for the level of quality that is needed for each situation. Sometimes MT is simply not an option. Sometimes MT is the only option.

3. Post-editing will come and go
Information travels fast and loses its value quickly. This is especially true for news, entertainment, online shopping and customer support content, but increasingly also for business-to-business and government information.

There is a fundamental shift from static “cast in stone” content to dynamic “on the fly” content. Instead of one or two releases per year, companies are shipping product updates on a weekly if not daily basis. And consumers, citizens and patients are increasingly sharing their reviews, tips and tricks in user blogs and social media in almost real time. Any chunk of information may be relevant and interesting to someone somewhere.

The key attraction of MT in this new information age is that it can deliver real-time translation to meet these changes. Potential cost reduction is only a secondary benefit. And the widespread fear that all human translators will soon be downgraded to mere post-editors of MT output is ungrounded.

Why? Well, in the next few years post-editing will grow quickly, but then we will see it diminish. If there is no time for translation, then there is no time for post-editing either. Real-time is real-time, right? In any case, MT technology will get better, using machine intelligence to learn from its mistakes and not make them again.

Translators who choose to work with computers will customize and personalize MT engines to specific tasks, customers and domains, rather than do stupid, repetitive error fixing. They will be promoted to ‘language quality advisors’ if you like.

4. Translators win when supply chains get shorter
More so than most other industries, the translation industry consists of a complex cascade of suppliers. There may be three or four levels between the translator and the end-user: translation agency, global multi-language vendor, corporate translation department and often an external quality reviewer or subject matter expert.

All these functions add a cost to translation but are they adding any real value in proportion to that cost? Tasks are often replicated and functions overlap. Disintermediation (i.e., ‘cutting out the middleman’) hasn’t really bitten into the translation industry yet as it has in the travel and banking industries, for example. But change is on the way, under pressure from the overarching need to translate more words into more languages.

Corporate and government buyers will analyze their supply chains to reduce their costs, and functions such as project management, quality assurance, vendor selection and translation memory management, will probably be streamlined, simplified or shared. Yet there will be no question about the critical role of the translator at the end of the chain.
Even though MT will be used to translate content streams requiring real-time translation, there will always be a need for a professional translator to tell good from bad language in the communication process.

5. The list of languages keeps growing
As global business is shifting from an export mentality to a world of open trading on a flat playing field, the nature of publishing and communications is also changing fundamentally.

In the old 20th century model the global manufacturer and publisher used to push information out to the world. They would select their markets, pick their most important language communities and translate their own instructions for use, brochures and web pages.

They would probably start with four to six languages and gradually add more if the markets proved to be worthwhile. In the new 21st century model, companies are realizing that their customers are not sitting there waiting for the information to be pushed out by manufacturers and publishers.

They are browsing the Internet and pulling down information wherever they find it. And if they can’t find it, they write their own reviews and comments that yet others may then translate to help their local peers. In the old world, content was owned by publishers; in the new world content is shared and earned.

In this radically changing environment, the range of languages for content is constantly growing. Successful global companies need to facilitate communications in a hundred or more languages instead of the old standard set of seven or, at the most, twenty.

Translators in many more countries will benefit from this “democratization” of globalization.

6. Sharing data becomes the norm
Our concept of a ‘translation memory’ is about to change. Translation memories and translation memory tools have long been cultivated as our proprietary productivity weapon, perhaps offering a competitive edge in an environment where one fifth of professional translators (according to a recent poll) still don’t even use translation memories.

Yet, we have now reached the limits of potential productivity gains, and, let’s face it, translation memory technology itself – in its current and mostly used form – is no longer state-of-the-art. Most translation memory tools are stuck in a technology time warp and cannot leverage the power of corpus linguistics (see article The Future is Corpus Linguistics). A new generation of translation productivity tools will emerge that allow us to leverage any length of strings of text from very large corpora of translations.

These new tools will in many respects be using features and components that emerged from statistical MT technology, except for the fact that they leave the professional translator in full control of the processes. They will unleash the translational power hidden inside very large corpora of text. They will allow us to do semantic searches and clustering, synonym identification, automatic cleaning and correction of language data, sentiment analyses and predictive translations.

In anticipation of this next generation translation technology, many translators and companies have already started consolidating their translation memory data into large, searchable repositories. Some (more than you think) are even harvesting these language data from the Internet, meaning that they have computers crawling translated web sites, aligning the sentences from these web sites, and reconstructing translation memory files.
Call them pirates if you like. But as we have seen in other industries, they are the drivers of innovation. We at TAUS truly believe that it is this kind of innovation that is needed to unleash the power of the translation industry and enable it to prosper.

The TAUS Data Association was established in 2008 as a legal, not-for-profit member-driven organization aimed at hosting and sharing translation memories for all stakeholders in the global translation industry. The publicly accessible and searchable database already contains four billion words of high-quality translation data in 350-plus language pairs.

7. Translation becomes a business of choices
The future of translation either looks bright or gloomy: it depends on whether you want to change, reinvent yourself and adapt. Admittedly, this is not an easy choice. Nor is there a lot of time to consider all the options, but at least translators now have the luxury of choosing. In the past, you became a translator and you were in it for life. Unless of course you became a literary translator, in which case none of the above applies.

Today, you can choose to be a ‘boutique’ translator, specializing in a domain and providing hyper-localization or transcreation services. In this case, you will drift away from the original concept of a translator once you start specializing in your domain. You may be asked to create local content instead of translating text written for a different culture.

You may be asked to do brand checking for new product names. Your job title may change to ‘language consultant’ or ‘communications adviser’. If what you like is linguistics and computers, you may choose to become a specialist in training domain- and customer-specific MT engines, or in translation optimization, or in new functions such as language data cleaning, data selection on the basis of semantic search, search engine optimization, or sentiment and cultural analysis using customer feedback data.

The availability of language data in so many languages will open a much larger range of choices for specialization and innovation. And yes, you can also opt for post-editing machine translation output. Not so much fun if it is not your first choice, but in many ways this option is similar to the first wave of automation our profession experienced in the 1980s with the arrival of translation memory tools.

The good news now is that the MT engines will soon learn from the corrections made by post-editors, so you will not have to make the same corrections again and again. And translators (or whatever their new title might be) will become much less solitary and grow closer to their colleagues and end customers.

Collaborative networks will bring language workers together. And buyers of translation and language-related services will eliminate one or two handovers in the supply chain and be able to connect directly with you.

Translation may, in many ways, become a commodity and a utility but that does not spell the end of the profession. On the contrary, it will stimulate the need for differentiation, specialization and value added services. It is up to the world’s translators to rise to the challenge, and open up to these changes, and reinvent their future.

Source: TAUS