How To Create XML Filter In MemoQ

by Andrej Zito
on February 22, 2024

Is memoQ your main CAT tool? In this tutorial, you’ll learn how to create an XML filter in memoQ with our instructor Carlos.

Carlos García Gómez

Hi, everybody, and welcome to this video series, where I’m going to show you the basics and the parsing capabilities of different CAT tools and translation management systems. This will help you get started on the technical side of things. The first tool that we’re going to look at is memoQ, so let’s get started. The first thing that we’re going to learn with this tool is where to find the parsers, or filter configurations.

So we can go to memoQ and then, under Resources, Resource Console, we click on Filter configurations. This is the place where we can check, create, or customize any parsers or filters. The ones listed are only the default ones: if we click on Create new and open the Filter drop-down list, we can see a bunch of other filters that we can create from scratch or customize. So this is basically the place where we’re going to create new filters, depending on the file format that we work with.

The file that we’re going to check right now is an XML file, which is this one, so we’re going to create a new parser, or filter, for XML. We click on Create new again and give it a name: ENG, for engineering, and then XML. In the Filter drop-down list, we specify XML filter. We don’t need to give it a description, so we can leave that empty and click on OK. Now we can see the ENG XML filter in this list, and on the right-hand side, in the Filter column, we can see that it’s an XML filter.

We can right-click on this filter and go to Edit. This is the place where we’re going to customize the settings for this filter, or parser. The first tab is Encoding and reference files. This shows the default encoding, so by default we’re always going to use Unicode UTF-8, which is the standard. In the Reference files and DTD section, we can add a file. We’re going to specify our XML file, and it’s added here. We’re going to use this later on in the process. The General tab has a number of settings, and we’re not going to go through all of them.
But keep in mind that, by default, memoQ writes a byte order mark (BOM) to Unicode-encoded files on export. We’re going to disable this option. Otherwise, if you translate, for example, from English into German and your source file is UTF-8 without BOM, the exported target file is going to have a BOM added. When we deliver the files to the customer, ideally we need to deliver them in the same encoding as the source. So we’re going to disable this option.

Tags and attributes is the tab where we’re going to work the most. We have different sections here for handling tags and tag attributes. The first thing that we’re going to do is click on Populate. The Populate option works from the file or files that we added on the first tab, and we added the parsing.xml file. So in Tags and attributes we’re going to see all of the elements of this XML file. If we go to Notepad++, we can see that we have the note, to, from, heading, meta, and body elements, and these are the elements that we can see here by default.

Now it’s only a matter of checking whether each element is translatable or not, and we can check this in Notepad++. Take the first one, which is to, and the second one, which is from. If you are familiar with embedded content, you will notice that they contain placeholders, and these placeholders are not going to be translated. But this does not mean that the to or from elements are non-translatable altogether. In this XML file, a placeholder occupies the full string of the to and from elements, but in some other file those elements could contain strings for translation. So the placeholders, or embedded content, are going to be protected separately. However, this file also has an element called meta.
The meta element contains some code. Because of the name of this element, we know that in this kind of XML file, the content of meta is not going to be translatable. So we can go to meta here, select this element, and enable the Not translated setting. You will see NT here, which stands for not translated, or non-translatable, and this element is not going to be imported for translation anymore.

Now, this is just a small XML file, and these are the basic settings that we can play around with. If we go to Entities, for example, we have some options for entities, and then there is Subtitles, which we’re not going to use because this is a plain XML file. But the core of the functionality of the XML filter is found here, in Tags and attributes. If we have any element, like meta, that is not translatable, we can use this option.

Down here we can also see a preview. For example, if we click on meta, we’re going to see its content. If we go to note, which is the main element, called the root element in XML, we’re going to see the full file, the same as we can see in Notepad++.

Again, if you are familiar with embedded content, you will notice that we have some placeholders here, for example student name, instructor name, course title, and student name again down here. Let me show it in Notepad++, where it’s bigger. We even have backslash n, which is the code for a new line and should not be modified during translation, and here we have another placeholder. None of this embedded content is translatable, but the embedded content as such is not protected by the XML filter settings.
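As a rough illustration of what the XML filter settings above amount to, here is a minimal Python sketch. It is not how memoQ works internally. The sample document is a hypothetical stand-in for parsing.xml: the element names come from the walkthrough, while the text content, the exact placeholder spellings, and the helper names `list_elements` and `extract_segments` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for parsing.xml. Element names follow the walkthrough;
# the text content and placeholder spellings are invented for illustration.
SAMPLE_XML = """<note>
  <to>{student_name}</to>
  <from>{instructor_name}</from>
  <heading>Welcome to {course_title}</heading>
  <meta>id=42;lang=en</meta>
  <body>Hello {student_name},\\nyour course lasts {0} weeks.</body>
</note>"""

# Elements marked as not translated (NT) in the filter.
NOT_TRANSLATED = {"meta"}


def list_elements(root):
    """Return element names in document order without duplicates (similar in spirit to Populate)."""
    seen = []
    for element in root.iter():
        if element.tag not in seen:
            seen.append(element.tag)
    return seen


def extract_segments(root, not_translated):
    """Return (tag, text) pairs for elements whose text would go to translation."""
    segments = []
    for element in root.iter():
        text = (element.text or "").strip()
        if text and element.tag not in not_translated:
            segments.append((element.tag, text))
    return segments


root = ET.fromstring(SAMPLE_XML)
print(list_elements(root))
for tag, text in extract_segments(root, NOT_TRANSLATED):
    print(f"{tag}: {text}")
```

Note that the placeholders and the backslash n still appear inside the extracted text. Marking meta as not translated only removes that element; it does nothing to protect embedded content.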
This is why we’re not going to add any rules to protect those placeholders in the XML filter. For now, we’re only going to say that the meta element is not translatable, and we can click on OK. This is the filter that we have created for XML.

Now, you might be wondering how we can protect the embedded content in this XML. That’s a good question, because it’s the next thing that we’re going to learn with memoQ: the regex tagger. We can create a new regex tagger, which is going to be used for the embedded content of the files. We click on Create new, and we’re going to call it ENG regex tagger. For Filter, we specify Regex tagger, which is here. We don’t need to add a description, and we can click on OK.

Because this list is ordered alphabetically and all of our filters start with ENG, we will see them all together. This is the ENG XML filter that we created before, and this is the ENG regex tagger. On the right-hand side, you can see that it’s a regex tagger, which we’re going to use for the embedded content. So we can right-click and click on Edit.

As you will notice, these are different settings than before, because this is a regex tagger. What we are going to do is check the file in Notepad++ and use regular expressions to protect all of the embedded content. You will notice that the pattern is the same for most of the placeholders: an opening curly bracket, a closing curly bracket, and inside them some letters, an underscore, and more letters. The same goes for course title and for student name. We could even have digits inside. So we’re going to protect that with regular expressions.

First off, the tag type is going to be Empty, because this is a single, standalone placeholder. If we had something like an opening tag and a closing tag, we could use Open and Close, but this is an empty tag.

Here we add the regular expression. We use curly brackets, which are taken as literal characters, and inside them we use a shorthand character class, backslash w, which matches any letter, any digit, or the underscore. Then we add a quantifier, the plus symbol, which means one or more occurrences of the preceding element. So in the end, this captures any run of one or more letters, digits, or underscores between curly brackets.

We can also tick Required, because we need this placeholder to be present in the translation. Imagine that we are translating from English into German: we protect these placeholders, and we want to force the translator to keep them in the translation. That’s why they are required. We can click on Add directly, and we will see the rule right here. As you will notice, we have the dollar symbol and then a zero as the display text, which stands for the whole match.
If we copy all of the text and paste it here as the input text, we can see how the regex tagger is going to treat all of these placeholders. We don’t actually keep the line breaks when we copy and paste, but you will notice that student name, for example, is converted internally into a tag. So the tagger protects student name, instructor name, course title, student name again, and the zero. You will also see the forward slash and then the closing, or greater-than, character. That’s because each placeholder is protected internally as a tag.

What we also need to add is the new line. As you can see, here we have a backslash and then an n, and that needs to be protected as well, because we don’t want the translators to modify it by mistake. So we can add a new rule for backslash n. The thing is that the backslash is a special character in regular expressions, so we need to escape it, and the way to escape it is with another backslash. That’s why we need two backslashes and then the n. We set Required for this one as well, with the Empty tag type, and now we click on Add. If we click on Change, we modify the selected rule instead, so if you are following along, make sure that you click on Add. You will now see the backslash rule here and the previous regular expression here.

In the result, we can also see that backslash n is protected as a self-closing tag. This is the only embedded content that we have in this file, so we have protected all of the embedded content, and we can click on OK. You will notice that we now have the ENG XML filter and the ENG regex tagger.
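The two rules described above can be tried outside memoQ. The following is a minimal Python sketch, not memoQ’s implementation: the pseudo-tag notation, the helper names `tag_segment` and `missing_required`, and the sample strings are assumptions for illustration, and memoQ’s regex engine may differ from Python’s `re` module in some details. The curly brackets are escaped here for clarity.

```python
import re

# The two rules from the walkthrough, in the order they were added.
RULES = [
    re.compile(r"\{\w+\}"),  # one or more letters, digits, or underscores in curly brackets
    re.compile(r"\\n"),      # a literal backslash followed by the letter n
]
COMBINED = re.compile("|".join(rule.pattern for rule in RULES))


def tag_segment(text):
    """Wrap every match in a self-closing pseudo-tag; return (tagged_text, tags)."""
    tags = []

    def protect(match):
        tags.append(match.group(0))       # the whole match, like the $0 display text
        return f"<{match.group(0)}/>"

    return COMBINED.sub(protect, text), tags


def missing_required(source, target):
    """Return tags found in the source that do not appear in the target."""
    _, source_tags = tag_segment(source)
    _, target_tags = tag_segment(target)
    remaining = list(target_tags)
    missing = []
    for tag in source_tags:
        if tag in remaining:
            remaining.remove(tag)
        else:
            missing.append(tag)
    return missing


source = "Hello {student_name},\\nyour course lasts {0} weeks."
tagged, tags = tag_segment(source)
print(tagged)
print(tags)
print(missing_required(source, "Hola {student_name}, tu curso dura {0} semanas."))
```

The last call reports the backslash n as missing, which is the kind of omission that marking a tag as Required is meant to catch.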
The next thing that we’re going to do with memoQ is create a cascading filter. A cascading filter is a filter that chains two or more filters that we have created previously. This is very useful when we have an XML filter, or any other kind of filter, for example for JSON if you’re working with JSON, and we also want to incorporate the regex tagger in order to protect the embedded content.

So we’re going to click on Create new cascading filter. Let’s call it, for example, ENG XML with regex tagger. The description is going to be empty, although you could use the Description field for any description that you want to add. Here you will notice that we have First filter, First filter configuration, Second filter, and Second filter configuration.

For First filter, we specify the type of filter that we are going to work with, which is XML filter. Then, for First filter configuration, we’re going to see a bunch of filters. Most of them are the memoQ defaults, but we also have ENG XML, which is the customized XML filter that we created previously. For the second filter in this cascading filter, we’re going to use the regex tagger. Again, Second filter is the type of filter, so we select Regex tagger. For Second filter configuration, we specify ENG regex tagger, because that’s the one that we customized ourselves. We select this ENG regex tagger and click on OK.

Now you will notice that we have ENG XML, which is an XML filter, and ENG regex tagger, which is a regex tagger.
Lastly, we have ENG XML with regex tagger, which is a cascading filter. So that’s what we have done: we have created the XML filter, we have created the regex tagger for the embedded content, and then we have created the cascading filter for the project that we are going to create now.

Let’s click on Close and go back to memoQ. We’re going to create a new project for this file. We click on Create a new project without a template, since we don’t need a template, and this is the window that we’re going to see. We’re going to call it My project, for example, and give it a source language and a target language. In my case, I’m going to select English (United States) into Spanish (Spain). All of the other fields can be left empty, as we don’t need them for this test.

The project directory shows the default directory, but we can change it. Let me put this same directory in here and click on Select Folder. You will notice that now we have this directory, and the project name is added as a new folder, which is going to be created automatically. The same goes for Created by, Created at, Deadline, and so on: we’re not going to use those settings, so we can click on Next.

Here we have the translation documents. This is the place where we’re going to add the XML, and we’re going to use Import with options. We select the file, which is parsing.xml, and a new window opens for the document import options. You will notice that for this XML we get the default XML filter. This row is for XML files in general, and this one is the parsing.xml file; that’s why we have two rows here. We can select the file here, and we’re not going to use the default XML parser.
What we’re going to do is use our cascading filter, which combines the XML parser that we customized and the regex tagger. So we can click on ENG XML with regex tagger. Let me click this again, and this is it. Now we have made sure that when this XML file is processed, it goes through this cascading filter. We click on OK, and the file is added here. Now we can click on Next. This is the place where we would add the translation memories. We don’t have any for this example, so we can click on Next. The same goes for term bases: we could add any term base to the project. And we can click on Finish.
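The idea of a cascading filter, where the output of the first filter is fed into the second, can be sketched in a few lines of Python. This is only an analogy under stated assumptions, not memoQ’s implementation: the sample document, the pseudo-tag notation, and the function names `xml_filter`, `regex_tagger`, and `cascade` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample in the spirit of parsing.xml; the content is invented.
SAMPLE_XML = (
    "<note><to>{student_name}</to><meta>id=42</meta>"
    "<body>Hi {student_name},\\nsee you in {0} weeks.</body></note>"
)


def xml_filter(document, not_translated=frozenset({"meta"})):
    """First filter: pull translatable text out of the XML structure."""
    root = ET.fromstring(document)
    return [
        element.text.strip()
        for element in root.iter()
        if element.text and element.text.strip() and element.tag not in not_translated
    ]


def regex_tagger(segment, pattern=re.compile(r"\{\w+\}|\\n")):
    """Second filter: protect embedded content inside one extracted segment."""
    return pattern.sub(lambda match: f"<{match.group(0)}/>", segment)


def cascade(document, first, second):
    """Feed every segment produced by the first filter through the second one."""
    return [second(segment) for segment in first(document)]


for segment in cascade(SAMPLE_XML, xml_filter, regex_tagger):
    print(segment)
```

Keeping the two steps separate mirrors the setup above: the XML filter decides which elements reach the translator, and the regex tagger only deals with the embedded content inside the text that was extracted.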

Now this is the project that we created, and this is the parsing.xml file. We’re going to double-click on this file in order to open it in the editor. You will notice that the first translation unit contains student name, and it is completely protected by the regex tagger. The same happens for the instructor name placeholder and any other placeholder that we have, including the backslash n for the new line, and even the number of weeks that we have in this placeholder here.

So that’s all for this first part on memoQ. I hope you have learned something new. In the second part, you are going to see how to create a new parser for text files, actually for any kind of text file, no matter the structure. We’re going to see how we can use regular expressions to import that file and extract the strings for translation.
