Time To Say Goodbye To Translation Memory

by Andrej Zito
on November 19, 2019

Translation memory has been with us for almost 30 years. It saves time and money and improves consistency. However, the quality of adaptive neural machine translation keeps getting better. Soon, adaptive NMT may offer better output than translation memory matches below 95%. Is this the end of translation memory as we know it?

This episode was recorded on a live stream. I started my Twitch channel with the goal of showing Localization Live. 100% transparency, 0% bullshit. Watch me LIVE at http://bit.ly/AndrejZitoLIVE

This is episode #19 of my speaking practice, also known as the Localization Podcast 🙂 #localization and #translation news across social media delivered to you by the power of my voice. #TM

Andrej Zito

This first article, which is actually also the last one, is from Forbes, but it’s not really from Forbes: it was written by Gabriel Fairman, the founder and CEO of Bureau Works. I think we’ve already had a couple of articles from this company, and they provide pretty good quality content, so they’re not new to the podcast. This is a very interesting one. I think it was the very first article on my shortlist, and it’s about the death of translation memory in localization: three key takeaways and key advice. As hopefully all of you know, translation memory is the standard thing we use to leverage previous translations to speed up translation and save money. So this is definitely a title that caught my attention. Let’s see what they talk about.

Andrej Zito

He starts with the history of translation memory: “Translation memory has been a cornerstone of the translation industry’s tech framework ever since its widespread adoption in the 1990s. At that time, working with translation memory was revolutionary. Sentences now only needed to be translated once, and they would be stored forever. New content could leverage past similar content, savings were colossal, and consistency soon became a feasible concept, even when working with numerous translators on a given project.” Yeah, that’s definitely one of the advantages of translation memory, especially if you work with multiple teams that have real-time access to an always-updated translation memory, so they can see the translations of other translators working on the same project.

Andrej Zito

The article continues: “Though we may look back on it as simple and rudimentary, this was a game changer. For nearly 30 years now, translation memory has been driving the core concepts of the translation industry. It determines payables and receivables and governs deadlines and productivity. It feeds translators with options, so they can either confirm, edit and confirm, or translate from scratch.” I recently came across listeners whose level of expertise varies, and some of you are just starting out, so let me explain how translation memory determines payables and receivables. The source text you get to translate is analyzed against the translation memory. If a string in the source text is already translated in the translation memory and the match is 100%, you have to spend very little time on it, because the translation is pulled from the translation memory automatically.

Andrej Zito

Many companies don’t even pay for reviewing 100% matches. On the other hand, if a string is a no match, meaning it has no previous translation in the translation memory, you have to spend the full effort of translating it from scratch. And then you have fuzzy-match categories like 50 to 74% and 75% to, I think, 85 or 90%; it varies from client to client. For those you get paid less, because the translation that already exists in the translation memory is very similar to the new string you have to translate. The expectation is that you won’t be doing it from scratch, so you should spend less time, and the translation should cost less. The same logic applies to deadlines and productivity.
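To make the match analysis concrete, here is a minimal Python sketch, assuming a toy translation memory stored as a dict from source to target segments. It scores each new segment with `difflib.SequenceMatcher.ratio()` as a stand-in for a CAT tool’s fuzzy-match algorithm (real tools use their own, usually word-based, scoring) and counts source words per match band. The band boundaries and the names `MATCH_BANDS`, `best_match_score`, `band_for` and `analyze` are hypothetical, since, as mentioned, bands vary from client to client.

```python
from difflib import SequenceMatcher

# Hypothetical match bands as (lower bound in %, label); real bands vary by client.
MATCH_BANDS = [
    (100, "100%"),
    (85, "85-99%"),
    (75, "75-84%"),
    (50, "50-74%"),
    (0, "No match"),
]


def best_match_score(segment: str, tm: dict[str, str]) -> int:
    """Best similarity (0-100) between a new segment and any TM source segment."""
    best = max((SequenceMatcher(None, segment, source).ratio() for source in tm), default=0.0)
    return int(best * 100)  # truncate, so only an identical string scores 100


def band_for(score: int) -> str:
    """Map a match score to the first band whose lower bound it reaches."""
    for lower_bound, label in MATCH_BANDS:
        if score >= lower_bound:
            return label
    return "No match"


def analyze(segments: list[str], tm: dict[str, str]) -> dict[str, int]:
    """Count source words per match band, like a simplified CAT analysis report."""
    counts = {label: 0 for _, label in MATCH_BANDS}
    for segment in segments:
        counts[band_for(best_match_score(segment, tm))] += len(segment.split())
    return counts


tm = {"Click the Save button.": "Klicken Sie auf die Schaltfläche Speichern."}
report = analyze(["Click the Save button.", "Restart the application to apply changes."], tm)
print(report)
```

In this toy run, the first segment lands in the 100% band and the second, which shares little with the TM entry, counts as a no match.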

Andrej Zito

I recently discovered that this concept is maybe not universally known in the industry. For those of you who have never heard my past stories about my localization journey: I started working at Moravia a long, long time ago, about 15 years ago, when I was 19. At Moravia, back then, we were using a concept called localization units. Localization units are basically a formula that gives the translator an indication of how many weighted words there are to translate, and this is related to the translation memory matches I just mentioned. For example, say you have 100 words to translate and they are all no matches. That means you basically have to translate 100 new words.

Andrej Zito

But if those 100 words are all, let’s say, 100% matches, you only have to review them. There’s a kind of effort weight, or penalty, for 100% matches, and it’s usually 10%. So reviewing 100 words of 100% matches counts the same as translating 10 new words: the penalty cuts the 100 words of 100% matches down to 10 equivalent new words, and you get paid accordingly. Of course, you can also calculate how much time you will need. That brings us back to productivity, which is typically around 2,000 words per day for a translator. So imagine you get a source text with no translation memory, meaning there are no translation matches, and all the strings are unique with no repetitions, because repetitions are another important category.

Andrej Zito

Say you get a project with 2,000 words. Technically, if you can work on it for the whole day, you should be able to deliver it by the end of the day. The same goes if you have, say, 20,000 words of 100% matches to review. We established that these get the 10% weight, so 20,000 words of 100% matches count the same as translating 2,000 new words, and you know you can also do them within one day, because that’s your productivity. So that’s the concept of localization units: calculating one weighted score for the whole word count and scope of the job, as if it were all new words to translate.
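The arithmetic above fits in a few lines. This is a minimal sketch, assuming only the two weights given in the episode (new words count in full, 100% matches count at 10%) and the 2,000 words-per-day productivity figure. The band labels and function names are hypothetical, and fuzzy-match bands or repetitions would need their own client-specific weights, which the episode doesn’t specify.

```python
import math
from typing import Optional

# Weights in percent, from the episode: new words count fully, 100% matches at 10%.
# Fuzzy-match bands and repetitions would need client-specific weights (not given here).
WEIGHT_PERCENT = {"No match": 100, "100%": 10}
WORDS_PER_DAY = 2000  # typical translator productivity mentioned in the episode


def localization_units(word_counts: dict[str, int], weights: Optional[dict[str, int]] = None) -> float:
    """Collapse per-band word counts into equivalent new words (weighted words)."""
    weights = WEIGHT_PERCENT if weights is None else weights
    # Integer arithmetic first and a single division at the end, to avoid float drift.
    # A band missing from `weights` raises KeyError instead of being silently ignored.
    return sum(count * weights[band] for band, count in word_counts.items()) / 100


def days_needed(units: float, words_per_day: int = WORDS_PER_DAY) -> int:
    """Whole working days needed at the given productivity, rounded up."""
    return math.ceil(units / words_per_day)


print(localization_units({"100%": 100}))    # 10.0
print(localization_units({"100%": 20000}))  # 2000.0
job = {"No match": 2000, "100%": 20000}
print(localization_units(job), days_needed(localization_units(job)))  # 4000.0 2
```

The last job combines both cases: 2,000 new words plus 20,000 words of 100% matches come to 4,000 localization units, or two days at 2,000 per day.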

Andrej Zito

So, for example, if you have 4,000 localization units, you know you can probably do it in two days. I’m going into a lot of detail here, but hopefully it will be helpful to someone, because, like I said, I only recently discovered that some people, even to this day, aren’t familiar with the concept of localization units, or weighted words, or whatever you want to call it. Okay, let’s continue. The article mentions the rise of trained adaptive neural machine translation, abbreviated as ATNMT, and says these engines have put a crack in the translation memory model. This is the first time I’ve ever seen ATNMT. We’re pretty familiar with MT, machine translation, and we’ve talked a lot about neural machine translation in some episodes of this podcast. But trained adaptive NMT is something new for me.

Andrej Zito

They try to explain it right here: “ATNMT engines focus on a subset of discourse, such as electrical engineering, B2C marketing, legal, finance or insurance. Translation memory databases train them, and ATNMTs adapt to how translators are reacting to the feed in real time. They bring the sophistication of smarter machine translation neural algorithms, the breadth of knowledge generated by big-data-specific translation memories, and human interaction all together.” So how do I understand this? I’m a little bit confused. ATNMTs can adapt to how translators react to the feed in real time. They bring the sophistication of smarter neural machine translation algorithms, so that’s the NMT part, plus the breadth of knowledge generated by big-data-specific translation memories and human interaction, all together.

Andrej Zito

I assume this is a fairly advanced concept that suggests or gives translations back to translators as a mix of NMT output and TM matches. The knowledge generated by big data is also interesting. I don’t know how the big data flows into these ATNMT engines; they’re probably very industry-focused, so maybe you can grab data from sources like websites, but I have no idea, I’m not an expert on big data. It definitely sounds interesting, though, because if you think about it, one source for making better translations, on top of the TM, is all the text that’s already out there on the internet. So I guess that’s what they mean by big data. Okay, let’s continue: “The first step in our transition was to allow machine translation and translation memories to operate side by side.”

Andrej Zito

This is what I think most of us know these days. “Knowledge management governance dictated that machine translation would never override the translation memory feed if a sentence had 50% or more similarity with a sentence in the translation memory.” That’s the TM match I was talking about before: the tool pulls from the TM instead of machine translation. I’m quite familiar with this. It basically means that if you already have a certain level of leverage from the TM, in this case at least a 50% match, the translation comes from the translation memory. If the match is below that, the string is machine translated, and the translator then has to post-edit it.

Andrej Zito

Okay, so 50%, right. “But as ATNMT has gotten better and better, it’s no longer a certainty that a partial translation memory match will be better than an ATNMT feed. According to data we collected from over 30,000 of our translation projects in 2019, we’re already at a point where a 50% to 74% translation memory match will require more editing time for a translator to fix than a well-oiled ATNMT engine feed for that same string. And based on our data, we are moving quickly into a world where ATNMT feeds will be better than 95% matches originating from the translation memory.” Just to recap, in case you didn’t get that long sentence with my pauses while reading:

Andrej Zito

Earlier we established that the threshold was 50%: everything with a 50% or better match came from the translation memory and wasn’t touched by machine translation. Now they’re saying they’re moving this threshold to 75%. So a 75% match or better is pulled from the translation memory, and a 74% match or less is machine translated. That means the scope of machine translation has increased: it now also covers 50 to 74% matches. And based on their data and where they’re heading, they feel that at some point MT will do a better job for all matches below 95%. In that case, only 95% to 100% matches will come from the translation memory. Okay, so that was the story of where they are now, and here’s a summary of the best practices and conclusions.
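The routing rule described here can be sketched as a single threshold check per segment. This is a hypothetical illustration, not any CAT tool’s actual API: `Feed`, `choose_feed` and the idea of passing in the best TM match score for each segment are assumptions. Raising the threshold from 50 to 75, and possibly to 95, is the shift the article describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feed:
    origin: str  # "TM" or "ATNMT"
    text: str


def choose_feed(tm_score: int, tm_text: Optional[str], mt_text: str, threshold: int = 75) -> Feed:
    """Use the TM match when it reaches the threshold, otherwise the ATNMT output."""
    if tm_text is not None and tm_score >= threshold:
        return Feed("TM", tm_text)
    return Feed("ATNMT", mt_text)


# The same 80% match under the old (50), current (75) and projected (95) thresholds.
for threshold in (50, 75, 95):
    print(threshold, choose_feed(80, "TM translation", "ATNMT translation", threshold).origin)
```

Only the threshold changes between the three scenarios; everything else in the pipeline stays the same, which is why the article frames this as a gradual shift rather than a switch.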

Andrej Zito

It’s three points. Number one, assimilate: “Translation memory is dead. We just haven’t buried it yet.” I liked that. “Tools that are already available can offer translators better feeds than translation memory. However, the industry still doesn’t know how to merge translation memories and ATNMT into a single paradigm.” Well, of course, the industry is always behind. “As an example, company X may quote using translation memory but pay vendors using ATNMT leveraging. Most traditional computer-aided translation systems out there still have translation memory as their backbone and can plug into ATNMT engines in parallel.” Yep. So company X is a shitty one, and I probably have my own experience working for a similar company X: charging clients higher rates and then trying to skim every cent, or save all the money, on the freelancer side by using different rates, lower ones of course. That’s from my experience.

Andrej Zito

Anyway, that was my experience. Back to the article: “From my experience, what we need is to entirely do away with translation memory and shift into a more ATNMT-based model. This model should be able to predict the necessary effort to complete any given string, and both payables and receivables should rely on this post-edit effort or distance, as opposed to simple translation memory matches.” Makes sense to me. “Look at the data trends, not at preconceptions based on past possibilities.” Okay, number two, discuss: “Adjusting is hard. Talking about it openly makes it easier. When translation memory began as a regular business practice, most translators argued that it took away from their writing creativity, provided them with horrible feeds that they had to go over and fix or rewrite from scratch, perpetuated human stupidity by locking 100% matches, and so on.”
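One way to read “post-edit effort or distance” is as a normalized edit distance between the machine suggestion and the translator’s final text. The sketch below assumes character-level Levenshtein distance and a hypothetical 10% minimum weight for a string accepted unchanged (borrowed from the 100%-match weight discussed earlier). The article doesn’t specify a metric, so the function names and the floor are illustrative only; real systems may measure effort at word level or in other ways.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string as the row
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def post_edit_effort(suggestion: str, final: str) -> float:
    """Normalized edit distance in [0, 1]: 0 = accepted as-is, 1 = fully rewritten."""
    longest = max(len(suggestion), len(final))
    if longest == 0:
        return 0.0
    return levenshtein(suggestion, final) / longest


def weighted_words(word_count: int, effort: float, review_floor: float = 0.10) -> float:
    """Pay for effort, but never less than a hypothetical review floor."""
    return word_count * max(review_floor, effort)


effort = post_edit_effort("Click on the Save button.", "Click the Save button.")
print(round(effort, 2), weighted_words(4, effort))
```

Under this reading, payables and receivables would follow what translators actually changed rather than a match band assigned before the work started.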

Andrej Zito

“A lot of the pushback relied on preconception, and a lot of it was legitimate. New tech is typically not all ironed out, particularly when it comes to the ramifications for the actual users. The tech is designed to solve the high-level business challenge and leaves us mere mortals, who operate it daily, to work out the details. Naturally, with ATNMT, it’s not likely to be any different. Between the concept and the execution lies reality. We need an active dialogue between all involved community members, including business owners, client stakeholders and translators. We need to see things eye to eye.” Okay. Number three, decide: “It’s not just about dollars and productivity. Language managers care about ROI, EBITDA and other alphabet-soup letters that point toward increased earnings.”

Andrej Zito

Yep, that’s true. “In this process, we overlook language, the experience of translators and the ultimate ramifications of how innovation shapes knowledge management. Around 10 years ago, people began to play more actively with the concept of machine translation post-editing. That meant more money and productivity, but they didn’t think through what it meant in terms of the experience. Imagine quickly going through a document that is all over the place: some of it beautifully written, some of it riddled with unacceptable mistakes. Your pay is a fraction of your typical fee, and the expectation is that you will remove any mistranslations.” Yep, that’s the unfortunate reality for many translators dealing with MT. “High pressure, lack of clarity in expectations and a lack of linguistic thought leadership from those who are really doing the work result in new tech being driven down the food chain without the necessary degree of checks and balances.”

Andrej Zito

“This knee-jerk reaction toward profit creates a terrible experience for those who are being crunched by the change. If we begin by assimilating and then have an open conversation in order to create common ground before deciding on rules that make sense for everyone, or at least take multiple points of view into consideration, we are better positioned to reap the benefits of this change rather than fear and frustration.” Okay, and that’s basically it: number one, assimilate; number two, discuss this change; number three, decide. I don’t think I have anything more to add. We’re a few minutes short of 30 minutes, and because I wanted to keep this short, I’m going to stop the recording right here. Thank you very much for listening. This was a short, half-hour episode of the Localization Podcast, number 19.

Andrej Zito

If you want to give me some feedback, whether you prefer shorter episodes like this one or the longer ones where we typically cover two, three or more articles, please let me know. Again, I’ll try to focus most of my time on building the website so I can get the live localization project off the ground. Thank you for listening, and thank you for watching if you’re watching this on the stream or on YouTube. Thank you, and I’ll talk to you next week. Bye.
