WEBVTT 00:01.280 --> 00:00:02.560 Hello everybody. 00:02.560 --> 00:00:04.400 My name is Jean-Christophe Helary, 00:00:04.400 --> 00:00:05.680 and today I’m going to talk about 00:00:05.680 --> 00:00:08.320 Emacs manuals translation and OmegaT. 00:00:08.320 --> 00:00:10.960 Thank you for joining the session. 00:10.960 --> 00:00:12.880 Translation in the free software world 00:12.880 --> 00:00:15.040 is really a big thing. You already know 00:15.040 --> 00:00:17.119 that most of the Linux distributions, 00:17.119 --> 00:00:18.720 most of the software packages, 00:00:18.720 --> 00:00:19.920 most of the websites 00:00:19.920 --> 00:00:22.320 are translated by dozens of communities 00:00:22.320 --> 00:00:23.439 using different processes 00:23.439 --> 00:00:24.880 and file formats. 00:24.880 --> 00:00:27.359 Translation and localization 00:27.359 --> 00:00:29.599 are things we know very well. 00:29.599 --> 00:00:30.400 It’s a tad different 00:00:30.400 --> 00:00:32.160 for the Emacs community. 00:32.160 --> 00:00:34.079 We do not have a localization process 00:34.079 --> 00:00:35.200 because it’s quite complex 00:00:35.200 --> 00:00:35.920 and because we don’t 00:00:35.920 --> 00:00:37.600 have the resources yet. 00:37.600 --> 00:00:39.920 Still, we could translate the manuals, 00:00:39.920 --> 00:00:41.200 and translating the manuals 00:00:41.200 --> 00:00:42.399 would probably bring a lot of good 00:00:42.399 --> 00:00:45.600 to the Emacs community at large. 00:45.600 --> 00:00:47.920 So what’s the state of the manuals? 00:47.920 --> 00:00:51.199 As of today, we have 182 files 00:51.199 --> 00:00:54.160 coming in .texi and .org format. 00:54.160 --> 00:00:56.559 We’ve got more than 2 million words. 00:56.559 --> 00:00:57.360 We’ve got more than 00:00:57.360 --> 00:00:59.039 50 million characters. 00:00:59.039 --> 00:01:00.559 So that’s quite a lot of work, 01:00.559 --> 00:01:04.559 and obviously, it’s not a one person job. 
01:04.559 --> 00:01:06.159 When we open .texi files, 00:01:06.159 --> 00:01:07.760 what do we have? 01:07.760 --> 00:01:09.439 Well, we actually have a lot of things 01:09.439 --> 00:01:10.560 that the translators 00:01:10.560 --> 00:01:12.400 shouldn’t have to translate. 01:12.400 --> 00:01:13.680 Here we can see that only 00:01:13.680 --> 00:01:15.040 the very last segment, 00:01:15.040 --> 00:01:16.400 the very last sentence 00:01:16.400 --> 00:01:18.080 should be translated. 01:18.080 --> 00:01:19.360 All those meta things 00:01:19.360 --> 00:01:20.240 should not be under 00:01:20.240 --> 00:01:24.479 the translator’s eyes. 01:24.479 --> 00:01:26.720 How do we deal with this situation? 01:26.720 --> 00:01:27.680 For code files, we have 00:01:27.680 --> 00:01:29.360 the gettext utility that converts 00:01:29.360 --> 00:01:30.640 all the translatable strings 00:01:30.640 --> 00:01:32.079 into a translatable format, 00:01:32.079 --> 00:01:33.840 which is the .po format. 01:33.840 --> 00:01:35.520 And that .po format is ubiquitous, 00:01:35.520 --> 00:01:36.400 even in the non-free 00:01:36.400 --> 00:01:38.720 software translation industry. 01:38.720 --> 00:01:39.520 For documentation, 00:01:39.520 --> 00:01:40.720 we have something different. 00:01:40.720 --> 00:01:42.000 It’s called po4a, 00:01:42.000 --> 00:01:45.119 which is short for ‘po for all’. 01:45.119 --> 00:01:46.399 When we use po4a 00:01:46.399 --> 00:01:49.200 on those 182 .texi and .org files, 00:01:49.200 --> 00:01:50.479 what do we get? 01:50.479 --> 00:01:52.640 We get something that’s much better. 01:52.640 --> 00:01:54.799 Now we have three segments. 01:54.799 --> 00:01:55.759 It’s not perfect because, 00:01:55.759 --> 00:01:56.399 as you can see, 00:01:56.399 --> 00:01:57.280 the first two segments 00:01:57.280 --> 00:01:58.880 should not be translated. 01:58.880 --> 00:01:59.520 So there’s still 00:01:59.520 --> 00:02:02.479 room for improvement. 
02:02.479 --> 00:02:04.960 Now, when we put that file set 00:02:04.960 --> 00:02:07.119 into OmegaT, we considerably reduce 00:02:07.119 --> 00:02:08.800 the total word count. 02:08.800 --> 00:02:11.360 We now have 50% fewer words 00:02:11.360 --> 00:02:14.239 and 23% fewer characters to type, 02:14.239 --> 00:02:15.680 but that’s still a lot of work. 00:02:15.680 --> 00:02:17.599 So let’s talk about OmegaT now 00:02:17.599 --> 00:02:22.239 and see where it can help. 02:22.239 --> 00:02:25.440 OmegaT is a GPL3+ Java8+ 02:25.440 --> 00:02:27.599 Computer Aided Translation tool. 02:27.599 --> 00:02:29.440 We call them CATs. 02:29.440 --> 00:02:30.720 CATs are to translators 00:02:30.720 --> 00:02:33.280 what IDEs are to programmers. 02:33.280 --> 00:02:35.040 They leverage the power of computers 00:02:35.040 --> 00:02:36.480 to automate our work, 00:02:36.480 --> 00:02:38.400 such as reference searches, 00:02:38.400 --> 00:02:40.800 fuzzy matching, automatic insertions, 00:02:40.800 --> 00:02:44.080 and things like that. 02:44.080 --> 00:02:46.319 OmegaT is not really recent. 02:46.319 --> 00:02:48.319 It will turn 20 next year, 02:48.319 --> 00:02:48.959 and at this point, 00:02:48.959 --> 00:02:51.440 we have about 1.5 million downloads 00:02:51.440 --> 00:02:53.200 from the SourceForge site, 00:02:53.200 --> 00:02:54.080 which doesn’t mean much 00:02:54.080 --> 00:02:55.040 because that includes 00:02:55.040 --> 00:02:56.480 files used for localization 00:02:56.480 --> 00:02:57.920 and manuals, but still 00:02:57.920 --> 00:02:59.599 it’s a pretty big number. 02:59.599 --> 00:03:00.720 OmegaT is included in 00:03:00.720 --> 00:03:02.400 a lot of Linux distributions, 00:03:02.400 --> 00:03:03.680 but as you can see here, 03:03.680 --> 00:03:05.920 it’s mostly downloaded on Windows systems 00:03:05.920 --> 00:03:06.800 because translators 00:03:06.800 --> 00:03:09.680 mostly work on Windows. 
03:09.680 --> 00:03:11.120 OmegaT comes with a cool logo 00:03:11.120 --> 00:03:12.080 and a cool site too, 00:03:12.080 --> 00:03:13.920 and I really invite you to visit it. 00:03:13.920 --> 00:03:16.159 It’s omegat.org, and you’ll see 03:16.159 --> 00:03:17.280 all the information you need, 00:03:17.280 --> 00:03:19.040 plus downloads to Linux versions, 00:03:19.040 --> 00:03:22.080 with or without Java included. 03:22.080 --> 00:03:24.799 So what does OmegaT bring to the game? 03:24.799 --> 00:03:26.560 Professional translators have to deliver 03:26.560 --> 00:03:27.680 fast, consistent, 00:03:27.680 --> 00:03:29.519 and quality translations, 03:29.519 --> 00:03:30.720 and we need to have proper tools 00:03:30.720 --> 00:03:32.159 to achieve that. 00:03:32.159 --> 00:03:34.239 I wish po-mode was part of the toolbox, 00:03:34.239 --> 00:03:35.120 but that’s not the case, 03:35.120 --> 00:03:36.560 and it’s a pity. 03:36.560 --> 00:03:39.760 So we have to use those CAT tools. 03:39.760 --> 00:03:41.440 Let me show you what OmegaT looks like 03:41.440 --> 00:03:43.120 when I open this project that I created 03:43.120 --> 00:03:45.200 for this demonstration. 03:45.200 --> 00:03:46.640 The display is quite a mouthful, 00:03:46.640 --> 00:03:47.760 but you can actually modify 00:03:47.760 --> 00:03:49.519 all windows as needed. 03:49.519 --> 00:03:50.400 I just want to show you 00:03:50.400 --> 00:03:51.120 everything at once 00:03:51.120 --> 00:03:53.680 to give you a quick idea of the thing. 03:53.680 --> 00:03:55.200 You have various colors, windows, 00:03:55.200 --> 00:03:55.920 and all those spaces 00:03:55.920 --> 00:03:57.120 have different functions 03:57.120 --> 00:03:58.560 that help the translator, 00:03:58.560 --> 00:03:59.360 and that you’re probably 00:03:59.360 --> 00:04:02.879 not familiar with. 04:02.879 --> 00:04:04.080 I’m going to introduce you 00:04:04.080 --> 00:04:05.680 to the interface now. 
04:05.680 --> 00:04:07.519 So first, we have the editor. 04:07.519 --> 00:04:09.439 The editor comes in two parts: 04:09.439 --> 00:04:10.480 the current segment, 00:04:10.480 --> 00:04:12.319 which is associated with a number, 00:04:12.319 --> 00:04:13.519 and all the other segments, 00:04:13.519 --> 00:04:15.840 above or below. 04:15.840 --> 00:04:16.720 At the top of the window, 00:04:16.720 --> 00:04:18.720 you can see the first three segments 00:04:18.720 --> 00:04:20.799 that were in the .po file. 04:20.799 --> 00:04:22.880 The last one here, the fourth one, comes 00:04:22.880 --> 00:04:28.720 with an automatic fuzzy match insertion. 04:28.720 --> 00:04:30.880 Such legacy translations are what we 04:30.880 --> 00:04:32.720 call ‘translation memories’. 04:32.720 --> 00:04:35.280 OmegaT has inserted this one automatically 00:04:35.280 --> 00:04:37.120 because I told it to do so, 04:37.120 --> 00:04:38.560 and for my security, it comes with 00:04:38.560 --> 00:04:40.639 the predefined fuzzy prefix 00:04:40.639 --> 00:04:41.919 that I will have to remove 00:04:41.919 --> 00:04:44.880 to validate the translation. 04:44.880 --> 00:04:47.919 Our next feature is the glossary feature. 04:47.919 --> 00:04:48.479 In this project, 00:04:48.479 --> 00:04:50.160 we have a lot of glossary data. 00:04:50.160 --> 00:04:52.560 Some is relevant and some is not. 04:52.560 --> 00:04:53.919 In the segment that I’m translating 00:04:53.919 --> 00:04:55.199 at the moment, you can see 00:04:55.199 --> 00:04:57.520 underlined items. 04:57.520 --> 00:04:59.040 This pop-up menu on the right 00:04:59.040 --> 00:05:02.240 allows me to enter the terms as I type. 05:02.240 --> 00:05:04.639 It’s kind of an auto insertion system 00:05:04.639 --> 00:05:07.039 that also supports history predictions, 00:05:07.039 --> 00:05:14.479 predefined strings, and things like that. 
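As a rough illustration of the fuzzy match insertion described in this part of the talk, here is a minimal Python sketch. It is not OmegaT's matching algorithm: it substitutes the standard library's `difflib.SequenceMatcher` for the similarity score, and the `[fuzzy] ` prefix text, the 0.7 threshold, the `best_fuzzy_match` helper, and the sample memory entry are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Assumed prefix text; the talk only says a predefined fuzzy prefix is used.
FUZZY_PREFIX = "[fuzzy] "


def best_fuzzy_match(segment, memory, threshold=0.7):
    """Return the prefixed legacy translation closest to `segment`, or None.

    `memory` maps legacy source sentences to their translations, which is a
    simplified stand-in for a translation memory.
    """
    best_source, best_score = None, 0.0
    for source in memory:
        # Similarity ratio between 0.0 (nothing shared) and 1.0 (identical).
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is None or best_score < threshold:
        return None
    # The prefix marks the suggestion as unvalidated until a human removes it.
    return FUZZY_PREFIX + memory[best_source]


memory = {"Save the current buffer.": "Enregistrer le tampon courant."}
print(best_fuzzy_match("Save the current buffer", memory))
```

The prefix mirrors the safeguard mentioned in the talk: the suggestion is only a candidate until the translator edits it and removes the marker.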
05:14.479 --> 00:05:15.440 In the part on the right, 00:05:15.440 --> 00:05:17.120 we have reference information 00:05:17.120 --> 00:05:18.240 that comes directly from 00:05:18.240 --> 00:05:21.440 the .po and .texi files. 05:21.440 --> 00:05:23.440 We also have notes that I can share 00:05:23.440 --> 00:05:25.759 with fellow translators, 05:25.759 --> 00:05:28.080 and we have numbers that tell me 00:05:28.080 --> 00:05:31.199 that I still have 143,000 segments more to go 00:05:31.199 --> 00:05:35.280 before I complete this translation. 05:35.280 --> 00:05:37.120 As we see, there are plenty of strings 05:37.120 --> 00:05:40.000 that we really don’t want to have to type. 05:40.000 --> 00:05:42.160 For example, those strings 00:05:42.160 --> 00:05:43.840 are typical .texi strings 00:05:43.840 --> 00:05:45.039 that the translator 00:05:45.039 --> 00:05:46.479 should really not have to type. 00:05:46.479 --> 00:05:47.360 So we’re going to have to 00:05:47.360 --> 00:05:50.400 do something about that. 05:50.400 --> 00:05:51.600 We’re going to have to create 00:05:51.600 --> 00:05:52.479 protected strings 00:05:52.479 --> 00:05:54.400 with regular expressions, 05:54.400 --> 00:05:56.800 so that the strings can be visualized 00:05:56.800 --> 00:05:59.120 right away in the source segment, 05:59.120 --> 00:06:00.479 entered semi-automatically 00:06:00.479 --> 00:06:01.680 in the target segment, 00:06:01.680 --> 00:06:04.479 and checked for integrity. 06:04.479 --> 00:06:06.479 The regular expression I came up with 06:06.479 --> 00:06:08.160 for defining most of the strings 00:06:08.160 --> 00:06:09.600 is this one, 06:09.600 --> 00:06:11.120 and I’m not a regular expression pro 00:06:11.120 --> 00:06:13.360 so I’m sure some of you will correct me. 00:06:13.360 --> 00:06:14.560 But this expression gives me 00:06:14.560 --> 00:06:15.919 a good enough definition 00:06:15.919 --> 00:06:17.919 even though it does not yet include 00:06:17.919 --> 00:06:20.960 Org mode syntax. 
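The regular expression shown on the slide is not reproduced in this transcript. As a hedged illustration of the idea, the Python sketch below uses a hypothetical pattern (not the speaker's) that picks out Texinfo commands such as `@kbd{`, escaped characters such as `@@`, and bare closing braces, which are the kinds of strings a translator should not have to retype. The `protected_strings` helper and the sample sentence are invented for the example.

```python
import re

# Hypothetical pattern, not the one from the talk:
#   @[A-Za-z]+\{?   a Texinfo command, with its opening brace if present
#   @[@{}]          an escaped at sign or brace
#   \}              a closing brace
PROTECTED = re.compile(r"@[A-Za-z]+\{?|@[@{}]|\}")


def protected_strings(segment: str) -> list[str]:
    """Return every protected substring in the order it appears."""
    return PROTECTED.findall(segment)


sample = "Type @kbd{C-x C-f} to visit a file; see @ref{Visiting}."
print(protected_strings(sample))
```

Only the markup is protected here; the text between the braces stays translatable, which is usually what is wanted for commands like `@ref` but would need refining for commands whose arguments must stay untouched.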
06:20.960 --> 00:06:22.344 So now we have all those 00:06:22.344 --> 00:06:23.440 .texi specific things 00:06:23.440 --> 00:06:24.960 that we don’t want to touch 06:24.960 --> 00:06:26.100 displayed in gray. 00:06:26.100 --> 00:06:27.680 Actually, you may have noticed 00:06:27.680 --> 00:06:28.479 that I cheated a bit, 06:28.479 --> 00:06:30.319 because here I added the years 00:06:30.319 --> 00:06:32.000 and the Free Software Foundation name 00:06:32.000 --> 00:06:34.000 to the previous regular expression 00:06:34.000 --> 00:06:35.520 to show you that you can protect 00:06:35.520 --> 00:06:38.560 any kind of string, really. 06:38.560 --> 00:06:39.520 So what we have now 00:06:39.520 --> 00:06:41.360 is a way to visualize the strings 00:06:41.360 --> 00:06:43.440 that we do not want to touch, 06:43.440 --> 00:06:45.440 but we still have to enter all of them 00:06:45.440 --> 00:06:46.880 in the translation. 06:46.880 --> 00:06:48.319 For that, we have the pop-up menu 00:06:48.319 --> 00:06:50.400 that I used earlier with the glossary, 00:06:50.400 --> 00:06:51.520 and we also have items 00:06:51.520 --> 00:06:52.400 in the edit menu 00:06:52.400 --> 00:06:53.919 that come with shortcuts 00:06:53.919 --> 00:06:57.199 for easy insertion of missing tags. 06:57.199 --> 00:06:58.800 Last, but certainly not least, 00:06:58.800 --> 00:07:00.800 we can now validate our input. 00:07:00.800 --> 00:07:02.479 Here, OmegaT properly tells me 00:07:02.479 --> 00:07:05.759 that I missed 7 protected strings, 07:05.759 --> 00:07:07.599 I entered only 1998, 00:07:07.599 --> 00:07:09.280 but there were five different years, 00:07:09.280 --> 00:07:10.479 the copyright string, 00:07:10.479 --> 00:07:14.240 and the FSF name string. 
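The integrity check described above can be approximated by comparing the protected strings found in the source with those found in the translation. The Python sketch below is an assumption-laden stand-in for what OmegaT does internally: the pattern (Texinfo markup, four-digit years, and the FSF name), the `missing_protected` helper, and the sample copyright line are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern: Texinfo markup, four-digit years, and the FSF name.
PROTECTED = re.compile(
    r"@[A-Za-z]+\{?|@[@{}]|\}|\b\d{4}\b|Free Software Foundation, Inc\."
)


def missing_protected(source: str, target: str) -> list[str]:
    """List the protected strings of `source` that `target` lacks.

    Counting occurrences (rather than using sets) means a string that
    appears twice in the source must also appear twice in the target.
    """
    missing = Counter(PROTECTED.findall(source)) - Counter(PROTECTED.findall(target))
    return sorted(missing.elements())


source = "Copyright @copyright{} 1998, 2004 Free Software Foundation, Inc."
target = "Copyright 1998"
print(missing_protected(source, target))
```

As in the demonstration, a translation that carries over only one year is reported as missing the other year, the copyright markup, and the FSF name.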
07:14.240 --> 00:07:15.970 With all this almost native 00:07:15.970 --> 00:07:16.960 Texinfo support, 00:07:16.960 --> 00:07:18.880 we have far fewer things to type, 07:18.880 --> 00:07:19.919 and there is a much lower 00:07:19.919 --> 00:07:21.120 potential for errors. 00:07:21.120 --> 00:07:25.199 But we agree, it’s still a lot of work. 07:25.199 --> 00:07:26.319 What we’d like now 00:07:26.319 --> 00:07:27.840 is to work with fellow translators, 00:07:27.840 --> 00:07:28.720 and here we need to know 00:07:28.720 --> 00:07:29.840 that OmegaT is actually 00:07:29.840 --> 00:07:32.080 a hidden svn/git client, 00:07:32.080 --> 00:07:34.240 and team projects can be hosted 07:34.240 --> 00:07:36.319 on svn/git platforms. 07:36.319 --> 00:07:37.199 Translators don’t need to 00:07:37.199 --> 00:07:38.880 know anything about VCS. 00:07:38.880 --> 00:07:40.720 They just need access credentials, 00:07:40.720 --> 00:07:42.400 and OmegaT commits for them. 00:07:42.400 --> 00:07:44.080 This way we do not have to use 00:07:44.080 --> 00:07:45.759 ugly and clumsy web-based 00:07:45.759 --> 00:07:47.199 translation interfaces, 00:07:47.199 --> 00:07:48.800 and we can use a powerful 00:07:48.800 --> 00:07:51.440 offline professional tool. 07:51.440 --> 00:07:52.479 So this is how it looks 00:07:52.479 --> 00:07:54.160 when you look at the platform 00:07:54.160 --> 00:07:55.919 where I hosted this project. 07:55.919 --> 00:07:57.199 The last updates are from 00:07:57.199 --> 00:07:58.639 20 days and 30 seconds ago 00:07:58.639 --> 00:08:00.720 when I created this slide, 08:00.720 --> 00:08:02.479 and you can see that I had a partner 00:08:02.479 --> 00:08:04.639 who worked with me on the same file set. 08:04.639 --> 00:08:05.520 Although it looks like 00:08:05.520 --> 00:08:06.879 we actually committed the translation 00:08:06.879 --> 00:08:07.680 to the platform, 00:08:07.680 --> 00:08:11.039 it was not us, but OmegaT. 
00:08:11.039 --> 00:08:13.599 OmegaT does all the heavy-duty work. 08:13.599 --> 00:08:15.039 It regularly saves to 00:08:15.039 --> 00:08:16.879 and syncs from the servers. 08:16.879 --> 00:08:18.720 Translators are regularly kept updated 08:18.720 --> 00:08:20.479 with work from fellow translators, 00:08:20.479 --> 00:08:21.680 and when necessary, 00:08:21.680 --> 00:08:23.360 OmegaT offers a simple 00:08:23.360 --> 00:08:25.440 conflict-resolution dialogue. 08:25.440 --> 00:08:27.039 Translators never have to do anything 08:27.039 --> 00:08:29.360 with svn or git ever. 08:29.360 --> 00:08:30.800 And now we can envision a future 00:08:30.800 --> 00:08:31.599 not so far away 00:08:31.599 --> 00:08:33.120 where the manuals will be translated 00:08:33.120 --> 00:08:34.159 and eventually included 00:08:34.159 --> 00:08:35.279 in the distribution, 00:08:35.279 --> 00:08:36.080 but that’s a topic 00:08:36.080 --> 00:08:39.760 for a different presentation. 08:39.760 --> 00:08:42.080 So we’ve reached the end of this session. 08:42.080 --> 00:08:44.240 Thank you very much again for joining it. 08:44.240 --> 00:08:45.600 There are plenty of topics 00:08:45.600 --> 00:08:46.880 I promised I would not address, 00:08:46.880 --> 00:08:50.000 and I think I kept my promise. 08:50.000 --> 00:08:51.600 There will be a Q&A now, 00:08:51.600 --> 00:08:52.517 and I also started 00:08:52.517 --> 00:08:53.600 a thread about this talk 00:08:53.600 --> 00:08:55.519 on Reddit last Saturday. 08:55.519 --> 00:08:57.279 You can find me on the emacs-help 00:08:57.279 --> 00:08:59.200 and emacs-devel lists as well, 00:08:59.200 --> 00:09:00.480 so don’t hesitate to send me 00:09:00.480 --> 00:09:02.080 questions and remarks. 09:02.080 --> 09:06.760 Thank you again, and see you around.