Google’s Massive NMT, Metal Gear Solid’s English Translation

The Localization Podcast with my own little self is back with episode #3. Rozetta from Japan triples its revenue from MT. Netflix and their approach to subtitling. Translating on mobile using Memsource. Google’s massive neural machine translation. Localization on social media – Facebook, Reddit, LinkedIn and Twitter. Polygon’s article on translating Metal Gear Solid into English.

Timestamps:
2:12 – Rozetta triples MT revenue
5:53 – How does Netflix do its subtitling?
13:40 – Memsource mobile to translate on the go
17:51 – Google’s massive multilingual NMT
24:21 – Localization on social media
36:19 – Translating Metal Gear Solid into English


Andrej Zito 

Ladies and gentlemen, welcome to The Localization Podcast, episode number three. It is Saturday for me here in Vancouver. It’s 8:30pm. And I did a little bit more preparation for this episode. Once again, before I started, before I actually pressed the record button, I was, you know, moving around and thinking, like, am I prepared to do this? and stuff like that. So let’s just get right into it. For those of you who are listening to this for the first time, or in case you missed it before, one of the main reasons why I decided to start this podcast is so that I can practice speaking without preparation, just trying to formulate my thoughts in the best possible way. And so here it is. So for this episode, I think this one will be a lot better and more valuable than episode number two, where I didn’t add a lot of extra comments.

Andrej Zito 

On top of the articles that I pretty much just read. So this time, I changed the approach a little bit: I tried to go through the articles and take notes for each one so that I can rephrase it, so that it doesn’t totally look like I’m just stealing the content from Slator. And the second thing is... was there a second thing? I’m actually not sure. Yeah, this time I also have some content that I spotted on social media. So that’s a good thing. And yeah, the second thing that I actually wanted to say before that was that for most of the articles that were presented on Slator over the past week, I actually have something to say about them.

Andrej Zito 

So let’s get right into it. The very first article will be a short one, because it’s once again about financial results, or M&A, you know, the business stuff that I usually don’t have much to say about. So just briefly, the first one is about a Japanese company called Rozetta, and they published their Q1 earnings. Their revenue is up by 50% compared to the previous year, and the company is heavily invested in machine translation, and their machine translation business tripled. So that’s three times more than the previous year, and now it makes up half of their revenue. And according to the article, they have seen significant improvements in machine translation quality. And that’s why they’re shifting their strategy towards an MT-only world.

Andrej Zito 

And also, the article mentioned that they had five business lines before, and now they have narrowed it down to three business lines, which are machine translation, human translation, and crowdsourcing. So machine translation has already come up in pretty much all the episodes that I did before. That’s actually just two, but they both talked about machine translation. I said in the last episode that this is definitely something that’s coming at us. And I’m still wondering, how practical is the application of all these machine translation services? Like, if you’re an LSP, do you just go to these companies and purchase their service? Is it a subscription-based service? Or do you just buy their model and then train it on your own with your data? Or how does it work? I have absolutely no idea what the practical aspect of machine translation is.

Andrej Zito 

And it just so happens that one person who is the founder of an MT company contacted me on Reddit. So I’m in touch with him, and he might actually be the very first guest for this podcast. I would be very happy if I could share his experience and his knowledge with the rest of you, because it is my assumption that most of you who are actually listening to this podcast have pretty much the same technical knowledge about machine translation as I do. And that is a very limited knowledge. So it would be really interesting for us to get more details on how this process actually works, and what the business model actually is for these companies that are working on these MT engines or MT models. So that was the first article. The second article is about the company.

Andrej Zito 

The one that allows us to waste a lot of time and not be productive, and that is Netflix. This article was about the subtitling process. So before I get into the details, I want to mention that at some point, I think it was before I left Thailand and went to Poland, I was interviewed by Netflix for a position in Singapore, I think it was in Singapore. But unfortunately, I did not get the position. Not sure why I even mentioned this. Anyway. So the article started by saying that in two years, Netflix added a lot more languages to their service, and now their content is available in 27 languages worldwide. And in the same period, that’s two years, their subscriber count grew by 50%. Now this can be a little bit misleading, because we don’t know if all of that growth is because of the availability of localized content.

Andrej Zito 

So it would be nice if Netflix actually published some numbers, or maybe they did and I just didn’t look into it, because the article in Slator was my only source of information. It would be interesting to see how many of these new subscribers actually consumed the content in a non-English format. So that’s about numbers. And I think this article was based on a conference that Netflix had in California. It was a panel discussion with some of the key members of the localization team at Netflix. And then the article mentioned that the subtitling process begins with a script, which serves as a guide for the translators. And Netflix is using their own tool called Originator for the whole subtitling process. And then there was this mention about, let me actually quote this, I have to find it here, about how they deal with the priorities. Where is it?

Andrej Zito 

Yeah, okay, here it is. To describe how the team is able to track priority projects within Originator: “Stuff that is flowing through will drop to the bottom, but we can easily see unassigned, critical or overdue tasks.” So that’s all they said about their process, just a few words, but from the description, to me, it sounds like their process is very streamlined and they just pay attention if something shows that it needs an action from someone on the team. Otherwise, it just flows continuously. So I assume that they just automatically publish the English script and it goes automatically to their freelancers. And if there are any issues, it gets flagged and somebody takes an action.
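The “drops to the bottom” behavior in that quote can be pictured as a simple sort. What follows is only an illustrative Python sketch, not how Netflix’s tool actually works: the `Task` fields, `needs_attention` and `triage` are all hypothetical names, and the only thing taken from the quote is the three flagged states (unassigned, critical, overdue).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Task:
    # Hypothetical task record; these fields are assumptions for illustration.
    title: str
    assignee: Optional[str]  # None means nobody has picked it up yet
    critical: bool
    due: date


def needs_attention(task: Task, today: date) -> bool:
    """True for the three states named in the quote: unassigned, critical, overdue."""
    return task.assignee is None or task.critical or task.due < today


def triage(tasks: list[Task], today: date) -> list[Task]:
    # sorted() is stable and False sorts before True, so flagged tasks come
    # first and everything that is "flowing through" keeps its order below them.
    return sorted(tasks, key=lambda t: not needs_attention(t, today))
```

With a list like this, a project manager would only ever look at the top of the queue, which matches the impression that nobody needs to interfere while the work is flowing.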

Andrej Zito 

And it’s definitely something that I like, because as long as everything is within the ideal flow or workflow, then you as a project manager don’t actually need to interfere with anything. Then there was an interesting part about licensing. This was about the content that is not owned by Netflix and how it works. In the answer from one of the members of the localization team, they mentioned that the licensing also clearly defines in which countries they can show the content and whether they have the right to the subtitles or not. And the important thing regarding subtitles was that they actually have to use the ones from the content owner. So for example, if they are streaming content created by Disney, they also have to use Disney’s subtitles.

Andrej Zito 

The workaround that they use is that they can propose changes to the subtitles so that they fit the Netflix style. But it has to be approved by the content owner and then re-posted back to Netflix before they can use it. The final part of the article was related to tech. It said that they’re using speech-to-text to generate the English template, or the English script. And they’re investigating the usage of machine translation for this purpose, but they said that so far they haven’t found a solution that would be better than their current solution. And this is something that kind of triggered me. I don’t mean triggered me like pissed me off, but it reminded me of what I talked about in the first episode, which was: what will be the role of translators once machine translation gets on par with human translation quality?

Andrej Zito 

And this is actually one of the areas where maybe translators would still be needed. Because translating something like a movie or a TV series requires a lot of creative translation and creative output, which I’m not sure MT will get to. I mean, let’s say eventually it will get there, but it will be the second or third stage. It’s not like translating documentation, like “click here, select this”, and blah, blah, blah. So yeah, that was the second article, about Netflix subtitling. Let’s get into the third article, which was actually sponsored content from Memsource. It was about the mobile application that they’re releasing. And they said that CAT and TMS tools on the market have not been optimized for mobile. And I’m actually not sure if this is true.

Andrej Zito 

But it is true that I have actually never heard that any of the freelancers or translators that I have worked with before would actually be translating on their mobile phone. So I’m not sure what kind of solutions SDL has for translating on mobile. But this is kind of similar to SDL Language Cloud, which was, I think, part of episode one. They were saying it’s the first all-in-one solution, you know, for project management, for machine translation, for the translation management system, and it’s a CAT tool, all in one. And they said there has never been an all-in-one solution before. So I’m wondering if this is again a similar case, that it’s just marketing and so it’s not true. Or the second question would be, why is a mobile solution coming?

Andrej Zito 

Only now? It’s 2019. It’s July 2019. So the dominance of mobile has been here for quite some time. And especially with localization moving towards a more agile approach, where you get a lot of small requests here and there, translating on mobile definitely fits that shift. So I’m actually wondering what translators would think about this solution, and whether it’s something that’s feasible for them. Because, for example, if I think about what I do, I still prefer to have a regular computer. I can see myself doing some work on the go, like when I’m on my commute, or whenever.

Andrej Zito 

But the article mentioned that the main focus of the mobile translation would be shorter translation requests. And you can also do post-editing and review in the app. So I guess for some smaller pieces, let’s say that you have a service which helps translate, I don’t know, posts on social media, or customer inquiries. So that’s small content. You could just run it through machine translation and send it out to your crowd of translators. And using the mobile app, wherever they are, they just approve it, or they suggest a change. This is a very simple thing. So that’s why I was wondering earlier why such a solution wasn’t available on the market before. And actually, even global nice kind of doing something in that space when it comes to crowdsourcing.

Andrej Zito 

That was the third article, and let me get some water. Article number four, and this is the last one that I picked from slator.com, is about Google’s, how is it called, Google’s massive multilingual neural machine translation. So when I was first reading this, because it’s, like, shortened to MM, you know, massive multilingual, so I thought it was like animal. So this is MM NMT. So we have even more acronyms to use. And what does multilingual NMT mean? It was explained in the article. What it means is that you’re training just one model for all languages, versus having one model trained for just one language pair. And the benefit, according to the article, is that it reduces the training time and cost and it simplifies deployment in production systems.

Andrej Zito 

Okay. So what is Google’s goal with this initiative? I think the article said that they have been working on it for five years. And this whole article was based on a white paper or something like that, which they recently published. So back to their goal: their goal was to build a universal NMT capable of translating between any language pair. Now, the reason why it’s called massive is because this model supports 103 languages, and it was trained on over 25 billion examples. I’m not sure if one example equals basically one string. I think it means one example equals one sentence, like in an article. Now, the next thing that I learned from the article is that, because you’re basically mixing data from different languages, you kind of have a conflict between what they call low-resource versus high-resource languages.
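To get a feel for why one model for all languages would simplify deployment, here is a back-of-the-envelope Python sketch. It assumes (this is not from the article) that the alternative is one separate model per translation direction; the function names are made up for illustration, and the English-centric variant is just one possible setup, not something the article states.

```python
def directed_pairs(num_languages: int) -> int:
    """Models needed if every translation direction gets its own model.

    Each language can be translated into every other language, so there are
    n * (n - 1) source-to-target directions.
    """
    return num_languages * (num_languages - 1)


def english_centric_pairs(num_languages: int) -> int:
    """Models needed if every pair has English on one side (an assumed setup).

    Each of the other n - 1 languages needs one model into English and one
    model out of English.
    """
    return 2 * (num_languages - 1)


if __name__ == "__main__":
    n = 103  # the language count mentioned for Google's model
    print(directed_pairs(n))         # 10506 per-direction models
    print(english_centric_pairs(n))  # 204 models if everything goes via English
    print(1)                         # versus a single multilingual model
```

Even in the English-only case that is a couple of hundred models to train, host and update, which is where the “reduces the training time and cost” argument comes from.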

Andrej Zito 

So a high-resource language would be one for which a lot of data is available, compared to a low-resource language, which is one that doesn’t have that much data. And of course, this has a certain impact. The impact is that the high-resource languages can actually perform slightly worse, because they’re affected by the low-resource languages, while on the other hand, low-resource languages are going to be better because they benefit from the training that the model gets from the high-resource languages. That’s how I understood this. And so there was one suggestion in this article for how to use this, and it’s called transfer learning. And that is to use the data from the big pairs.

Andrej Zito 

So that means the high-resource languages, to improve the quality for the small language pairs. While on the other hand, for the high-resource languages, you would have one model for each language so that the quality doesn’t get worse. But in any case, now I’m kind of stuck. I have no idea what I wanted to say. So yeah, I learned a little bit from this article. And like I mentioned in the beginning, I’m really hoping to get someone from the MT community on this podcast, so that we can all learn a little bit more about it. Because right now, to me, it’s kind of like a black box. You know, everybody’s talking about machine translation, but what is the practical aspect of it? Like, how can you actually start using it? If you’re, I don’t know, an enterprise client? Or if you’re just a small LSP, like a very local translation agency, would you benefit from it?

Andrej Zito 

Like, what are the costs? Like, do you need something? So these are the questions that I have no answers to. But I hope that I will be able to provide answers from someone who understands the topic. Because like I mentioned earlier, it looks like machine translation is coming big, and it’s going to hit us at some point. So these were the four articles from Slator that I picked. There were a couple more. I think one of them was about VCs, again, putting money into machine translation startups and stuff like that. So it means that there will be more and more solutions for MT. So I’m not sure how fragmented the market will be and what it means, because, again, I have no idea how this works from a practical point of view.

Andrej Zito 

And the other article, I think, was kind of like Slator’s research on translation companies, or LSPs, in Switzerland. And I think it’s a report that you can maybe buy from them, just to see how the market is in Switzerland. So not that important for me. So that covers Slator. I want to now quickly talk about social media, because this is something that I promised before. It’s part of the description of this podcast, but I have never actually delivered on this promise. So this time, I did a little bit more research. I’m going to start with Facebook because that’s an easy one. And I think I mentioned this in the previous episode, I joined this group called, I think it’s called localization professionals or something like that. And it’s run by people from Logrus. They’re the ones who created it, and they’re the ones who administer the group.

Andrej Zito 

And I recently found out that they also are the owners of the largest group on LinkedIn, called Localization Professional. But I’ll get to that later, because I was just admitted to the group this week. So going back to the Facebook group, there is absolutely nothing. The content is pretty much usually from them, so promoting Logrus, and as for the other posts I found in the group, it’s not very active, first of all. And second of all, the other posts were again kind of self-promotion. I tried to look for the localization hashtag on Facebook, and it shows mostly my own posts. So I’m not sure if it’s that I, for some reason, can’t see the other public posts that were made with this hashtag, or whether it’s really that nobody’s actually using it on Facebook. I think it could be the second option.

Andrej Zito 

Because in my opinion, and from what I’ve seen, Facebook is not the right place and the right platform to talk about localization. The second platform I want to cover very quickly is Reddit. On Reddit, and I also mentioned this earlier, there is one localization group, which is called simply localization. It’s also full of links to articles on LSPs’ websites, or a white paper, and there’s very little, I’m losing my voice, sorry, there’s very little actual meaningful discussion going on. And there was only one new post. It was a question somebody was asking about LQA. So yeah, that was a funny thing,

Andrej Zito 

Because there was a guy who was saying, well, I’m not sure if it’s a guy or a girl. It was a person who was saying that they manage localization in a startup, or a software company, or like an app company. And they were asking about how to do LQA. So when I was ready to respond to the post, there was already one reply. And that person basically asked, what do they mean by LQA? Do they mean linguistic QA, which means QA before the translations get into the product, or do they mean a functional QA on the localized product, after the translations were integrated into the app? And because the reply from the first person seemed very reasonable, I didn’t reply any further, because the OP replied to this comment saying just thank you. You know, he didn’t specify what he was actually asking about.

Andrej Zito 

So to me, it seemed kind of like a waste of time to provide any more feedback on that. Oh, man, I’ve been talking for like 30 minutes, and it’s really drying out my throat. So that was Reddit. Oh, by the way, for Reddit, I had it on my backlog for some time, and I actually started my own subreddit, which is called localization no BS. So I’m planning to post some content of my own there without linking it to my own pages. What’s wrong with my voice? And hopefully it will be a cleaner subreddit where every post is valuable and original content, so that it doesn’t look like the spam that’s on the main localization subreddit. So that’s that about Reddit. Let’s move to LinkedIn, because I didn’t see much happening there either.

Andrej Zito 

So I checked the localization hashtag and I found nothing interesting. Then I checked the Localization Professional group, and I also didn’t find anything useful there. I think I replied to one comment that I found in the group earlier this week, as soon as I got approved into the group. That person was asking about how people manage and track their localization projects, whether they use Excel, Trello, or any localization-specific tool. I mentioned that I really liked using Excel before, when we had these long-term projects that lasted six-plus months. It was definitely way easier to track projects that way versus using Microsoft Project. And Trello, of course, I love Trello. I use it personally for my own stuff, whether it’s YouTube, or podcasts, or whatever I do outside of work.

Andrej Zito 

But of course, it’s not that robust a solution for actually running a whole localization department. So that was my reply. And then there was another guy who also replied to the same comment. I think they have some solution which they say is, again, this one integrated platform. And I think they’re actually based in Toronto. So I replied to them that I’m interested and would like to know more. That person said, yeah, he’ll be happy to speak with me. And then I replied to him to DM me. But I told him that I’m not a prospect, that I’m just curious about their solution, especially compared to SDL Language Cloud. So we’ll see if that person replies. And if he does, I’m going to, again, try to put him on this podcast, so that we can all learn something new.

Andrej Zito 

Or maybe if it’s a demo, maybe I’ll just do a video, and I’ll ask questions, and we’ll see how it goes. So I’m actually really excited about the opportunity of having guests on this podcast. I just need to get the first guest, I think. And then based on the result, maybe I will completely shift the focus to just interviewing people, although I do like doing this solo podcasting. Except not now, because my throat gets really dry and itchy. I’m still not fully recovered from the cold that I had two weeks ago. So anyway. And the final platform that I checked is, surprisingly, Twitter. And I got two very interesting posts that I saw there. So when I was on Twitter, I was just looking at the posts for the localization hashtag.

Andrej Zito 

And I saw these posts, kind of frequently, from this company which is called Level Up Translation, and they do localization for games. Which is something that I always wanted to try because, you know, I used to like games. I still like games. I love games. I’ve been a gamer since an early age. And even though I’ve been working in localization for, what is it, 14, 15 years now, I have never ever worked on any game localization. And so I thought that, you know, combining my passion for games and my expertise in localization, I would be a pretty good fit. But when I actually tried to apply for jobs in games, I was rejected many times because I had zero experience in games localization. And I don’t think that it’s such a totally different world than localizing, I don’t know, software or marketing stuff or any other usual things.

Andrej Zito 

But enough about me. So I like the posts that they have on Twitter. Their thumbnails are very consistent, though I noticed that their images are not optimized for social media, so many times the text, which is the main thing that should capture your attention, is truncated. So I DM’d them, and let’s see if they reply. And if they reply, maybe I will, no, not maybe, I will definitely ask them if they would be interested in joining my very little podcast with almost no audience. If you’re listening to this, and you’re my audience, congratulations to you. It must be a very painful experience. But thank you for listening. So that was the first thing, and the second thing, and this is a big thing. While I was browsing through all the posts and scrolling down, I noticed this image in the shared articles which was from Metal Gear Solid, and I saw it a few times, then I saw it again and again.

Andrej Zito 

So this article was shared a lot on Twitter. And so then I finally clicked it, and it’s an article on Polygon. It’s about a person who worked on the translation for Metal Gear Solid. For those of you that don’t know anything about games, there’s this one Japanese guy called Hideo Kojima. He has this legendary status of creating games that have a very exciting narrative and story. And the storytelling is very, very acclaimed. Is that the right word? Acclaimed. And so Metal Gear Solid is actually a game that I played on PC, though that’s not the main platform that the Metal Gear series was created for. Those are Sony platforms, sorry. So I played this game. And it was definitely one big experience, especially from the story point of view, like all the cutscenes, all the dialogues, everything is great.

Andrej Zito 

And so this is the guy who had to translate the game from Japanese to English. And I’m going to try to do this quickly, because the article is very long, and I didn’t take notes when I was reading the article, which is shame on me, but there are some very nice, interesting parts that I want to mention about the role of a translator in such a task. So the person who wrote this article, and who was translating this game, first mentioned that he had only worked on some smaller games before. And then I think he had six months to translate the whole game. And he had to basically understand the whole world that Hideo Kojima created in Japanese, because the whole game has a lot to do with... now, where is that thing? Yeah, so here it is. There was all this military tech throughout the game, including specific gun names and details about how US nuclear weapons are locked down.

Andrej Zito 

Background on the Cold War, Alaskan native tribes, special ops, psyops, you name it. And these were things that I knew nothing about when I started work on the translation. Yeah, and this is another interesting thing. People may have a hard time really appreciating the fact that at the time, the internet was not the thing we know now. There was no YouTube, no Wikipedia, no Reddit, and there were no other translations of similar work to reference. The word localization barely existed in the business in 1997. I was all on my own and no one was looking over my shoulder. I ordered every book I could by an ex-Navy SEAL named Richard Marcinko, who wrote the autobiography called Rogue Warrior, along with a collection of novels. I had the sense that that’s what Kojima was going for: a gritty feeling of realism with touches of James Bond-ish gadgets and inventiveness.

Andrej Zito 

So then the article mentions all the research that the translator had to do, which is something that you don’t see happening a lot these days. I’m not sure if it’s still happening for games. But with the content that I’m usually exposed to, nobody would give you one or two weeks to do research on a certain topic. So this is the most interesting thing that I wanted to bring up in this segment, and that is what the author said. Let me just quote this. This is a good time to talk for a minute about translation and the idea of originality. Many people misunderstand this topic. And it’s not surprising. There’s not much solid information about how this process works. And there definitely wasn’t during the time period I was working on Metal Gear Solid.

Andrej Zito 

Translation is not a science, it is an art. One must take liberties with the text to capture the essence of the words, in an attempt to recreate the feeling of the original for a very different audience with a very different cultural background. The rest of the article is kind of a lot of self-promo. The author still says that he thinks he did a great job. And based on what he’s writing, I think he actually did a great job, because talking about, I don’t know, special ops soldiers and nuclear weapons in Japanese, and the way that Kojima probably creates a narrative in his own language, can’t just be, you know, translated one to one into English. So that’s what the author was saying, that he had to do a lot of research about the topics, you know, nuclear weapons, special ops soldiers, and stuff like that, and try to use their jargon the way it would be used by, let’s say, American people.

Andrej Zito 

So this amount of research is something that I have never experienced before. And it’s really fascinating for me. And the article ends with the fact that at some point, Hideo Kojima, the great creator, actually found out, because he wasn’t bilingual, he found out that this translator kind of altered the script here and there, you know, to basically adjust it and make it better for the English audience. And Kojima actually didn’t like it. So that’s why he stopped working with this guy, which is kind of a shame. And there was one example of future translations of the game that really sounded awkward in English. And basically it would have been better if the translation was more creative, rather than trying to stick to the Japanese.

Andrej Zito 

So yeah, I really liked the part where he says that translation is not a science, it’s an art. And this again brings me to the topic of machine translation versus translators. This is another area where translators could still keep their expertise and domain and not be, you know, pushed out by machine translation. And it’s the same thing as translating scripts for Netflix. Whether it’s Netflix or a game, and a game especially might be more challenging for machine translation to actually do a good job. Especially when the game creates its own world and speaks in a certain language or style. Hmm, I think I’m getting mixed up in my thoughts. I’m trying to think what else I could add. But I guess I’m just not saying anything useful.

Andrej Zito 

So I think this is where I’m going to end the podcast. We are closing in on the 50-minute mark. And I’ll have to get to the editing, which will be a long one, because I actually did mess up some parts here and there. And there are a lot of silences. And my voice is kind of losing its power here and there because my throat starts to get itchy and low. Anyway. So in this episode, we had four articles from Slator, and we looked at some of the interesting events, or more like updates, from social media. And I talked about Polygon’s article about the Metal Gear Solid translation, which hopefully, if I don’t forget, I will link at least on my YouTube video, because it’s definitely an interesting article that you should read, especially if you are a translator or you care about translators, at least a little bit.

Andrej Zito 

Okay, so that’s going to be it for episode three. If you got to this point, thank you very much. I appreciate your time with me. Hopefully, you learned something and it was not suicidal for you to listen to my voice for 50 minutes. Thank you, everyone. And I will talk to you next week. Bye.
