How To Customize TXT Filter In Phrase TMS

by Andrej Zito
on April 24, 2024

Is Phrase your favorite TMS? In this tutorial, you’ll learn how to customize the TXT filter in Phrase TMS with our instructor Carlos.

Carlos García Gómez

Hey everyone, and welcome to the second tutorial for Phrase TMS. In this video, we’re going to learn how to customize a text filter by extracting the translatable strings and separating those from the code. So let’s dive in. Alright, so this is the text file that we’re going to work with for the tutorial. If you have followed the full video series, you will find this file familiar. It’s basically a text file where we need to specify which strings are for translation. As you can see, we have 10 lines. And let me actually maximize this a little bit. We have 10 lines of code, all of them following a similar pattern. All of them start with the #set, then the identifier, the equals symbol, and the string for translation. And at the very end of each line, we’re going to find the comment. Okay, so that’s the case for line number one. And the same for line number two: this is the identifier. This is the string for translation.

And this is the comment, and the same for all of them. As you can see, we also have some embedded content. First off, we have some HTML tags in here, like the bold tag, and here we have the closing tag for bold text in HTML. We also have some placeholders in this pattern, okay, a similar pattern with the opening curly bracket and the closing curly bracket. So what we are going to do is go straight to the Phrase TMS tutorial project that we created in the previous tutorial, where we have the JSON file, and we’re going to create a new job. In this case, we’re going to work with the text filter. So let me just first choose the file, strings.txt, and we need to scroll down a little bit, and we will find TXT. These are the fields that we’re going to play with. And again, if you hover your mouse over each of these fields, you’re going to find some notes and even some links to the corresponding regexp manual. Okay? Again, this is the place where you will find useful information on general examples, and more specifically on the TXT import, which is the one that we’re going to do right now. For example, you have some examples where this is the line in your source file, and the string for translation is actually found here. So to import text between this part and this part, where this is the string for translation, use this regexp, okay.

And the same for different structures. If you are familiar with regular expressions, you will notice that what we need in Phrase TMS is to specify, with a lookbehind (and we know it’s a lookbehind because of the syntax), what comes before the text for translation. And this is the text for translation. So before it, we need to have this pattern, okay, which is hash, hash, space; that’s the pattern that we have before the text for translation. This is the text for translation. And after the text for translation, we have the lookahead. Okay, so the lookahead is done with this syntax, the question mark and the equals symbol, and then it needs to be followed by this part, which is basically space, hash, hash. This is just an example, but we’re going to work with a similar pattern, okay. So, in reality, what we need to check is what we have before the strings for translation. If the strings for translation are found in here, inside the quotes, what we need to ask is: what do we have before? Alright, so sometimes, and this can be done in other tools as well, we can specify the full part, okay? If that’s not possible, or if we have some issues or whatever, we can specify just this part, or maybe just this part, or even just the quote. The larger the lookbehind is, the more robust your regular expression will be. Now, the problem is that when we are working with lookbehind and lookahead, we are going to have some technical limitations that we will see later. For now, let’s just first create the translatable text, okay, and to find the translatable text, let me actually toggle between the file and Phrase. So we are going to need a lookbehind and a lookahead, and in between we have the text for translation.
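As a rough illustration of the lookbehind, text, lookahead shape described above, here is a minimal Python sketch. The sample line and the hash delimiters are assumptions made up for the example, and Python’s re module is only standing in for the regex engine inside Phrase TMS, which may behave differently.

```python
import re

# Hypothetical source line: the text to translate sits between "## " and " ##".
line = "## Welcome to the course ##"

# (?<=## )  lookbehind: the match must be preceded by "## "
# .+?       the text we actually want to extract
# (?= ##)   lookahead: the match must be followed by " ##"
pattern = re.compile(r"(?<=## ).+?(?= ##)")

match = pattern.search(line)
extracted = match.group(0) if match else None
print(extracted)
```

Neither the lookbehind nor the lookahead is part of the match itself; they only act as conditions on what surrounds it.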

Okay. If we specify a lookbehind, we’re looking to the left of the string for translation. So what do we have before it? Well, in reality, we have the hash, set, then the round bracket, the dollar symbol, and then the key or the identifier, and it’s the same for all of them, followed by a space, an equals symbol, and a space. And then, obviously, the quote. By the way, the tricky part here is that when we are working with the lookbehind, we can’t actually use quantifiers or optional characters; we need to use a regular expression with a fixed length. So for example, we have some whitespaces here and here. Imagine that, for example, in here we don’t have a space. This would actually still be a valid file, right? But we can’t use quantifiers or optional characters in the lookbehind, unfortunately; the regex engine does not allow this. So the thing that we need to do is to specify the maximum number of characters that we have.
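The fixed-length restriction can be reproduced outside Phrase TMS. The sketch below uses Python’s re module, which also rejects variable-length lookbehinds; this is an analogy, not a statement about which engine Phrase TMS runs. The sample lines and the helper name compiles are hypothetical.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Fixed-length lookbehind: space, equals symbol, space, quote.
fixed = r'(?<= = ").+?(?="\))'
# Variable-length lookbehind: optional whitespace around the equals symbol.
variable = r'(?<=\s*=\s*").+?(?="\))'

fixed_ok = compiles(fixed)
variable_ok = compiles(variable)

# Hypothetical lines: the second one has no space after the equals symbol.
with_space = '#set($title = "Home page") // Developer comment 1'
without_space = '#set($title ="Home page") // Developer comment 1'

hit = re.search(fixed, with_space)
miss = re.search(fixed, without_space)
print(fixed_ok, variable_ok, hit.group(0) if hit else None, miss)
```

The second line is silently skipped by the fixed-length pattern, which is the risk the tutorial warns about: a longer lookbehind is more specific, but any line that deviates from it drops out of the import.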

Alright, so in this case, the maximum that we have would be all of this. But suppose you can identify some risks, for example someone removing some spaces in the source file, where obviously those lines would not follow the same pattern, because after the equals symbol we would not have the whitespace. Okay, we need to actually find the minimum in that case. So we start from the maximum, and then we start iterating, working our way toward the right, to see what’s the largest number of characters that we can safely use. And if we can’t use all of those characters, we can simply leave the minimum. Okay, so in this case, every time our quote is found, it’s the first quote that we have in each line. So this quote is the first quote that we have in line number one, this is the first quote that we have in line number two, etc.

So something that we can do is to add the lookbehind expression, okay. And if you are not familiar with these kinds of expressions, you can always refer to the corresponding part of the manual, okay? Which is, by the way, the other way around. So it’s this and this, okay? Which makes sense, because the equals symbol is actually saying that, in this part, we need to have this pattern. So preceding the text for translation, we need this part. And if we use the exclamation mark, it means that before this text for translation, we don’t have this pattern. Okay? So it’s the question mark, the less-than, and the equals symbol. Alright. But again, you can always refer to this part, because the lookbehind and the lookahead can be somewhat tricky. So you can always refer to the original guideline. Okay, this is the syntax for the lookbehind. Now, what we said is that we need to have a quote before the text for translation. So this is the text for translation, and just before it we have the quote; that’s at least the minimum that we can have here. Again, if we added all of it, the whitespace, the equals symbol, and then another whitespace, then in case any of those whitespaces is missing in one of the lines, we’ll get into trouble, because that line is not going to be included for translation.
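The difference between the equals symbol and the exclamation mark in a lookbehind can be seen in a small Python sketch. The sample text is made up for illustration, and Python’s re module is used here only as a convenient stand-in.

```python
import re

# Hypothetical sample text with one quoted word.
text = 'say "hello" there'

# (?<=")  positive lookbehind: a quote must come right before the match.
after_quote = re.findall(r'(?<=")\w+', text)

# (?<!")  negative lookbehind: a quote must NOT come right before the match.
# \b keeps the match from starting in the middle of a word.
not_after_quote = re.findall(r'(?<!")\b\w+', text)

print(after_quote)
print(not_after_quote)
```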

Okay, that’s the technical limitation that I find here with Phrase TMS, because you need to rely on lookbehind expressions. Now for the lookahead, that’s no problem, because we can simply specify this part: for example, the quote and then the closing round bracket that is found at the very end, just after each string for translation. Okay, so all of these characters are found in all of the lines. Now what we need to do is to use the question mark and the equals symbol, because again, this is basically saying that, following the text for translation, we need to find this expression here.

So it’s basically a quote and then a closing round bracket, and that’s why we need to escape it with a backslash, because we need to capture a literal closing round bracket, okay. Let me just copy and paste this here in Notepad++. Find Next: as you can see, the text for translation is going to be extracted the same way. So this is the text for translation. Find Next, and Find All, so here we can see exactly which parts of the text, the ones in yellow, are going to be extracted for translation. Now that’s okay, so we can move on to the next part. And obviously, depending on the source file that you are working with, you might have a different structure, but you’re always going to play around with the lookahead and with the lookbehind expression. Just make sure that, as I said before, with the lookbehind we can’t use quantifiers or optional characters in a way that’s unlimited. Now for the Convert to Phrase TMS tags field.
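Before moving on to the tags field, the translatable text expression built so far (a quote in the lookbehind, a quote plus an escaped closing round bracket in the lookahead) can be tried out in a minimal Python sketch. The three sample lines are hypothetical reconstructions of the file’s shape, and the lazy .+? in the middle is an assumption, since the transcript does not spell out that part of the expression.

```python
import re

# Hypothetical lines in the shape described in the tutorial:
# #set($identifier = "string") // comment
source = '''#set($homepageTitle = "Welcome to <b>our school</b>") // Developer comment 1
#set($greeting = "Hello, {student_name}!") // Developer comment 2
#set($signIn = "Sign in") // Developer comment 3'''

# (?<=")   lookbehind: a quote right before the text
# .+?      the text for translation (assumed lazy match)
# (?="\))  lookahead: a quote and a literal closing round bracket
translatable = re.compile(r'(?<=").+?(?="\))')

strings = translatable.findall(source)
for s in strings:
    print(s)
```

This plays the same role as the Find All check in Notepad++: it shows which parts of each line would be picked up before the expression goes into the import settings.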

This is very similar to what we had in the previous tutorial for the JSON file. This is the place where we’re going to specify the embedded content or the placeholders. In this file, we have two types of embedded content. One of them is the HTML tags that we can see in here, and the other pattern is the placeholders, with an opening curly bracket and a closing curly bracket. Now, these are two different regular expressions, so we are going to separate them with a pipe. The first expression is going to match any HTML tag. So we start off with the less-than, because this is just the beginning of any HTML tag, and then inside we can have any character that is not the greater-than, one or more times, until we find the greater-than. Okay, this is just a very useful way of doing this. Alright, so let me just copy and paste this, Find Next, where we see that it’s capturing this tag and the corresponding closing tag. If we had any other tag like strong, italic, or, you know, lists, whatever, that would also be captured, whether it’s the opening or the closing tag. And it works because with the square brackets we are saying that we are capturing one character at a time, and the caret inside the square brackets means not the following character.

So we are basically capturing any character that is not the greater-than, one or more times, until we really find the greater-than character. Okay, that’s for the HTML tags. And then we need to capture this pattern, which is very similar to what we did in the previous tutorial. So we need to escape the curly brackets. And again, make sure that you escape these curly brackets with the backslash. Although that’s not necessary in Notepad++ and in many other tools, they need to be escaped in Phrase TMS; otherwise, you’re going to get an invalid regular expression error message. Inside, we can have the usual thing: the shorthand character class backslash w, for any letter, any digit, or an underscore, one or more times. Let me actually try this here. Find Next, Find Next, and Find Next, and we don’t have anything else. Okay, so it’s capturing the three of them, as you can see here. And again, with the pipe, or vertical line, we can separate each regular expression that we need to add here to Convert to Phrase TMS tags. Now, the context key: this is going to be done the same way as the translatable text. We need to use a lookahead or a lookbehind.
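Before turning to the context key, the two tag expressions joined with a pipe can be sketched in Python as follows. The sample strings and the placeholder name student_name are hypothetical, and Python’s re module is used only to illustrate the pattern.

```python
import re

# <[^>]+>   an HTML tag: "<", one or more characters that are not ">", then ">"
# \{\w+\}   a placeholder: escaped curly brackets around letters, digits, underscores
tags = re.compile(r'<[^>]+>|\{\w+\}')

# Hypothetical strings for translation.
samples = [
    'Welcome to <b>our school</b>',
    'Hello, {student_name}!',
    'Sign in',
]

found = [tags.findall(s) for s in samples]
for sample, matches in zip(samples, found):
    print(sample, '->', matches)
```

Because the alternatives are tried from left to right at each position, the order around the pipe does not matter here: an HTML tag always starts with a less-than and a placeholder with a curly bracket, so the two can never compete for the same text.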

Now, the context key, as you can see here, is different from how we handled this with JSON. With JSON, we had the context key, which was the real key, and we also had the context note, which was used as a comment. Here, because we only have one field, the question is what you would prefer. So for example, for this line, when the linguist is translating, would you prefer them to have homepage title, the key or identifier, or would you prefer to use the comment, if there is any in the line, for context? Sometimes, like in this example, we have a comment for each line. So if you have this kind of structure, it may be better to add the comment, because that’s going to give some more reference to the linguist. But sometimes you won’t have a comment in the same line, and it’s better to just use the identifier that we have for each line. Both options are okay, and this is going to be totally customizable. In fact, when the linguist is translating, they’re going to see the what-you-see-is-what-you-get view, the preview of this text file, and they will easily see while translating if there is any comment or any identifier. Now for this purpose, let’s just do the comments. Alright, so let’s imagine that we have the comments at the very end of each line. What we need to do here is just to add the pattern that we have before each comment, and then we’re going to add the text that is going to be extracted as the comment. In this case, again, we can use the comment as the key, or the identifier as the key; it’s up to you. Now inside this group, we’re going to add the lookbehind, because here what we need to extract is this comment, right, all of these comments.

So we need to find a pattern for what we have before the comment. Alright, so before the comment, we actually have this: the space, forward slash, forward slash, space, as you can see, in all of these cases. Now something that you can do is, for example, to add the forward slash, forward slash, and then just a space, okay, because here we always have a space. The thing is that, as I said before, we can’t have optional characters. In a normal regular expression, we would say that this whitespace is optional, or we could even have a whitespace zero or more times, but with the lookbehind we can’t use that functionality, unfortunately. So in theory, this is going to do the job. Let me actually check this with Find Next. As you can see, it’s going to extract the developer comment, in this case Developer comment 1, 2, 3, 4, etc. It will work for any kind of comment, as long as what we have before it is forward slash, forward slash, and then a space. Okay, and this is the text that is going to be extracted. All of this is going to be taken as a condition; that’s what the lookbehind is in the end. That’s how it’s done. Now, I will not use it here in the context key, but let’s just do it all together. If we wanted to extract the identifiers instead, if you find that more useful, or if you don’t have a comment in every line, you can do the opposite.
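Before looking at identifiers, here is a minimal Python sketch of the comment expression just described: a lookbehind for two forward slashes and a space, followed by the rest of the line. The sample lines are hypothetical, and the .+ after the lookbehind is an assumption about how the rest of the expression is written.

```python
import re

# (?<=// )  lookbehind: two forward slashes and a space before the comment
# .+        the comment text, up to the end of the line (assumed)
comment_key = re.compile(r'(?<=// ).+')

# Hypothetical lines: the second one has no space after the slashes.
line = '#set($homepageTitle = "Welcome to <b>our school</b>") // Developer comment 1'
no_space = '#set($signIn = "Sign in") //Developer comment 3'

hit = comment_key.search(line)
miss = comment_key.search(no_space)
comment = hit.group(0) if hit else None
print(comment, miss)
```

The second line shows the cost of the fixed-length lookbehind again: a comment written without the space after the slashes would not be picked up as a context key.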

So you will need to extract this text, okay, which is basically a set of digits, letters, or underscores, one or more times. Okay, in the end it’s letters or digits; we could have an underscore, but we don’t have any. So we need to capture in the lookbehind what we have before the identifier, and before the identifier we always have this text: the hash, set. And we write it this way: the question mark, the less-than, and the equals symbol, because that’s the syntax for the lookbehind. Then the hash is a literal character, then set, and the round bracket needs to be escaped. This round bracket needs to be taken as a literal character, and that’s why the round bracket here is escaped with the backslash. The same happens with the dollar symbol: the dollar symbol is a special character in regular expressions, so this character needs to be escaped as well. And then this round bracket is just the closing round bracket for the whole lookbehind expression.
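The identifier expression walked through above can be sketched in Python like this. The sample line and the identifier homepageTitle are hypothetical; the lookbehind follows the description in the transcript, with the round bracket and the dollar symbol escaped.

```python
import re

# (?<=#set\(\$)  lookbehind: literal "#set(" followed by a literal "$"
# \w+            the identifier: letters, digits, or underscores
identifier_key = re.compile(r'(?<=#set\(\$)\w+')

# Hypothetical line in the shape described in the tutorial.
line = '#set($homepageTitle = "Welcome to <b>our school</b>") // Developer comment 1'

match = identifier_key.search(line)
identifier = match.group(0) if match else None
print(identifier)
```

The lookbehind here has a fixed length of six characters, so it stays within the restriction discussed earlier; only the identifier after it uses a quantifier.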

Let me actually check this with Find Next. Okay, as you can see, only the identifier or the key is extracted. So if you wanted to use the identifier instead of the comment, you can use this regular expression in here, okay, instead of the comment expression that we have here; it would work the same way. Now we’re going to use this expression just to extract the comment. And let’s create the job: we have uploaded the strings.txt file, so let’s create the job for this project. We can see strings.txt here; this is the second job, so we can open the file. And now, in the Phrase TMS editor, you will be able to see all of the text that has been imported, alright. So originally we had all of this text.

So for the first line we had this part, for the second line this part, and we have been able to properly extract what we need to translate, right? So instead of also including the #set part, the homepage title key, etc., we have clean translation units. Alright, and as we can see, if we scroll down, we go all the way to Sign in, which is the very last string that we can find in the file. Here, as usual, we have in blue the tags that we have imported. So the b tag and then the closing b tag are found here; they have been protected with the expression that we used for HTML tags. And number two is the student level, which is the other pattern that we used. So as you can see, using the pipe in the regular expression for the Phrase TMS tags did the job. Now, something you can see is that the text for translation here in the preview tab is shown in gray, and any other text, in black, is not imported for translation. Okay. And then in the context note, if we click on the first translation unit, you will see that the key is Developer comment 1, because the key for this string has been extracted from the comment. Okay, remember that we had two possibilities: to extract either the identifier or the comment.

So we extracted the comment. If we click on Courses, for instance, we can see that the key is Developer comment 6, because for Courses we have Developer comment 6. Again, if we had used this other regular expression, you would have seen that the context key here would have been the identifier of that line instead of Developer comment 6. So as you can see, it’s totally customizable. So that’s really it. We have seen how to create our text filter by looking at the structure of the text file and identifying where the text for translation is, and we have used the lookbehind and the lookahead. We have also seen the limitations that we have with the lookbehind, because sometimes we need to use a quantifier or an optional character, and that cannot be done with a lookbehind, unfortunately. That’s the way it works with Phrase TMS. And lastly, we have seen how to import the embedded content, both HTML tags and this other pattern, and how to use the developer comments as the key, in this case the context key, for reference when the linguist is translating. And that’s all. I hope you have learned some good tricks for preparing text files for translation using Phrase TMS. In the next tutorial, which is the third and last tutorial for this tool, you’re going to learn how to run a pseudo-translation and how to export the translated files, making sure that your parsers are correct. So stay tuned and see you in the next video.
