|
WEBVTT
00:01.280 --> 00:00:02.560
Hello everybody.
00:02.560 --> 00:00:04.400
My name is Jean-Christophe Helary,
00:00:04.400 --> 00:00:05.680
and today I’m going to talk about
00:00:05.680 --> 00:00:08.320
Emacs manuals translation and OmegaT.
00:00:08.320 --> 00:00:10.960
Thank you for joining the session.
00:10.960 --> 00:00:12.880
Translation in the free software world
00:12.880 --> 00:00:15.040
is really a big thing. You already know
00:15.040 --> 00:00:17.119
that most of the Linux distributions,
00:17.119 --> 00:00:18.720
most of the software packages,
00:00:18.720 --> 00:00:19.920
most of the websites
00:00:19.920 --> 00:00:22.320
are translated by dozens of communities
00:00:22.320 --> 00:00:23.439
using different processes
00:23.439 --> 00:00:24.880
and file formats.
00:24.880 --> 00:00:27.359
Translation and localization
00:27.359 --> 00:00:29.599
are things we know very well.
00:29.599 --> 00:00:30.400
It’s a tad different
00:00:30.400 --> 00:00:32.160
for the Emacs community.
00:32.160 --> 00:00:34.079
We do not have a localization process
00:34.079 --> 00:00:35.200
because it’s quite complex
00:00:35.200 --> 00:00:35.920
and because we don’t
00:00:35.920 --> 00:00:37.600
have the resources yet.
00:37.600 --> 00:00:39.920
Still, we could translate the manuals,
00:00:39.920 --> 00:00:41.200
and translating the manuals
00:00:41.200 --> 00:00:42.399
would probably bring a lot of good
00:00:42.399 --> 00:00:45.600
to the Emacs community at large.
00:45.600 --> 00:00:47.920
So what’s the state of the manuals?
00:47.920 --> 00:00:51.199
As of today, we have 182 files
00:51.199 --> 00:00:54.160
coming in .texi and .org format.
00:54.160 --> 00:00:56.559
We’ve got more than 2 million words.
00:56.559 --> 00:00:57.360
We’ve got more than
00:00:57.360 --> 00:00:59.039
50 million characters.
00:00:59.039 --> 00:01:00.559
So that’s quite a lot of work,
01:00.559 --> 00:01:04.559
and obviously, it’s not a one-person job.
01:04.559 --> 00:01:06.159
When we open .texi files,
00:01:06.159 --> 00:01:07.760
what do we have?
01:07.760 --> 00:01:09.439
Well, we actually have a lot of things
01:09.439 --> 00:01:10.560
that the translators
00:01:10.560 --> 00:01:12.400
shouldn’t have to translate.
01:12.400 --> 00:01:13.680
Here we can see that only
00:01:13.680 --> 00:01:15.040
the very last segment,
00:01:15.040 --> 00:01:16.400
the very last sentence
00:01:16.400 --> 00:01:18.080
should be translated.
01:18.080 --> 00:01:19.360
All those meta things
00:01:19.360 --> 00:01:20.240
should not be under
00:01:20.240 --> 00:01:24.479
the translator’s eyes.
01:24.479 --> 00:01:26.720
How do we deal with this situation?
01:26.720 --> 00:01:27.680
For code files, we have
00:01:27.680 --> 00:01:29.360
the gettext utility that converts
00:01:29.360 --> 00:01:30.640
all the translatable strings
00:01:30.640 --> 00:01:32.079
into a translatable format,
00:01:32.079 --> 00:01:33.840
which is the .po format.
01:33.840 --> 00:01:35.520
And that .po format is ubiquitous,
00:01:35.520 --> 00:01:36.400
even in the non-free
00:01:36.400 --> 00:01:38.720
software translation industry.
01:38.720 --> 00:01:39.520
For documentation,
00:01:39.520 --> 00:01:40.720
we have something different.
00:01:40.720 --> 00:01:42.000
It’s called po4a,
00:01:42.000 --> 00:01:45.119
which is short for ‘po for all’.
01:45.119 --> 00:01:46.399
When we use po4a
00:01:46.399 --> 00:01:49.200
on those 182 .texi and .org files,
00:01:49.200 --> 00:01:50.479
what do we get?
01:50.479 --> 00:01:52.640
We get something that’s much better.
01:52.640 --> 00:01:54.799
Now we have three segments.
01:54.799 --> 00:01:55.759
It’s not perfect because,
00:01:55.759 --> 00:01:56.399
as you can see,
00:01:56.399 --> 00:01:57.280
the first two segments
00:01:57.280 --> 00:01:58.880
should not be translated.
01:58.880 --> 00:01:59.520
So there’s still
00:01:59.520 --> 00:02:02.479
room for improvement.
02:02.479 --> 00:02:04.960
Now, when we put that file set
00:02:04.960 --> 00:02:07.119
into OmegaT, we considerably reduce
00:02:07.119 --> 00:02:08.800
the total word count.
02:08.800 --> 00:02:11.360
We now have 50% fewer words
00:02:11.360 --> 00:02:14.239
and 23% fewer characters to type,
02:14.239 --> 00:02:15.680
but that’s still a lot of work.
00:02:15.680 --> 00:02:17.599
So let’s talk about OmegaT now
00:02:17.599 --> 00:02:22.239
and see where it can help.
02:22.239 --> 00:02:25.440
OmegaT is a GPL3+ Java8+
02:25.440 --> 00:02:27.599
Computer Aided Translation tool.
02:27.599 --> 00:02:29.440
We call them CATs.
02:29.440 --> 00:02:30.720
CATs are to translators
00:02:30.720 --> 00:02:33.280
what IDEs are to programmers.
02:33.280 --> 00:02:35.040
They leverage the power of computers
00:02:35.040 --> 00:02:36.480
to automate our work,
00:02:36.480 --> 00:02:38.400
such as reference searches,
00:02:38.400 --> 00:02:40.800
fuzzy matching, automatic insertions,
00:02:40.800 --> 00:02:44.080
and things like that.
02:44.080 --> 00:02:46.319
OmegaT is not really recent.
02:46.319 --> 00:02:48.319
It will turn 20 next year,
02:48.319 --> 00:02:48.959
and at this point,
00:02:48.959 --> 00:02:51.440
we have about 1.5 million downloads
00:02:51.440 --> 00:02:53.200
from the SourceForge site,
00:02:53.200 --> 00:02:54.080
which doesn’t mean much
00:02:54.080 --> 00:02:55.040
because that includes
00:02:55.040 --> 00:02:56.480
files used for localization
00:02:56.480 --> 00:02:57.920
and manuals, but still
00:02:57.920 --> 00:02:59.599
it’s a pretty big number.
02:59.599 --> 00:03:00.720
OmegaT is included in
00:03:00.720 --> 00:03:02.400
a lot of Linux distributions,
00:03:02.400 --> 00:03:03.680
but as you can see here,
03:03.680 --> 00:03:05.920
it’s mostly downloaded on Windows systems
00:03:05.920 --> 00:03:06.800
because translators
00:03:06.800 --> 00:03:09.680
mostly work on Windows.
03:09.680 --> 00:03:11.120
OmegaT comes with a cool logo
00:03:11.120 --> 00:03:12.080
and a cool site too,
00:03:12.080 --> 00:03:13.920
and I really invite you to visit it.
00:03:13.920 --> 00:03:16.159
It’s omegat.org, and you’ll see
03:16.159 --> 00:03:17.280
all the information you need,
00:03:17.280 --> 00:03:19.040
plus downloads for Linux versions,
00:03:19.040 --> 00:03:22.080
with or without Java included.
03:22.080 --> 00:03:24.799
So what does OmegaT bring to the game?
03:24.799 --> 00:03:26.560
Professional translators have to deliver
03:26.560 --> 00:03:27.680
fast, consistent,
00:03:27.680 --> 00:03:29.519
and quality translations,
03:29.519 --> 00:03:30.720
and we need to have proper tools
00:03:30.720 --> 00:03:32.159
to achieve that.
00:03:32.159 --> 00:03:34.239
I wish po-mode was part of the toolbox,
00:03:34.239 --> 00:03:35.120
but that’s not the case,
03:35.120 --> 00:03:36.560
and it’s a pity.
03:36.560 --> 00:03:39.760
So we have to use those CAT tools.
03:39.760 --> 00:03:41.440
Let me show you what OmegaT looks like
03:41.440 --> 00:03:43.120
when I open this project that I created
03:43.120 --> 00:03:45.200
for this demonstration.
03:45.200 --> 00:03:46.640
The display is quite a mouthful,
00:03:46.640 --> 00:03:47.760
but you can actually modify
00:03:47.760 --> 00:03:49.519
all windows as needed.
03:49.519 --> 00:03:50.400
I just want to show you
00:03:50.400 --> 00:03:51.120
everything at once
00:03:51.120 --> 00:03:53.680
to give you a quick idea of the thing.
03:53.680 --> 00:03:55.200
You have various colors, windows,
00:03:55.200 --> 00:03:55.920
and all those spaces
00:03:55.920 --> 00:03:57.120
have different functions
03:57.120 --> 00:03:58.560
that help the translator,
00:03:58.560 --> 00:03:59.360
and that you’re probably
00:03:59.360 --> 00:04:02.879
not familiar with.
04:02.879 --> 00:04:04.080
I’m going to introduce you
00:04:04.080 --> 00:04:05.680
to the interface now.
04:05.680 --> 00:04:07.519
So first, we have the editor.
04:07.519 --> 00:04:09.439
The editor comes in two parts:
04:09.439 --> 00:04:10.480
the current segment,
00:04:10.480 --> 00:04:12.319
which is associated with a number,
00:04:12.319 --> 00:04:13.519
and all the other segments,
00:04:13.519 --> 00:04:15.840
above or below.
04:15.840 --> 00:04:16.720
At the top of the window,
00:04:16.720 --> 00:04:18.720
you can see the first three segments
00:04:18.720 --> 00:04:20.799
that were in the .po file.
04:20.799 --> 00:04:22.880
The last one here, the fourth one, comes
00:04:22.880 --> 00:04:28.720
with an automatic fuzzy match insertion.
04:28.720 --> 00:04:30.880
Such legacy translations are what we
04:30.880 --> 00:04:32.720
call ‘translation memories’.
04:32.720 --> 00:04:35.280
OmegaT has inserted this one automatically
00:04:35.280 --> 00:04:37.120
because I told it to do so,
04:37.120 --> 00:04:38.560
and for my security, it comes with
00:04:38.560 --> 00:04:40.639
the predefined fuzzy prefix
00:04:40.639 --> 00:04:41.919
that I will have to remove
00:04:41.919 --> 00:04:44.880
to validate the translation.
04:44.880 --> 00:04:47.919
Our next feature is the glossary feature.
04:47.919 --> 00:04:48.479
In this project,
00:04:48.479 --> 00:04:50.160
we have a lot of glossary data.
00:04:50.160 --> 00:04:52.560
Some is relevant and some is not.
04:52.560 --> 00:04:53.919
In the segment that I’m translating
00:04:53.919 --> 00:04:55.199
at the moment, you can see
00:04:55.199 --> 00:04:57.520
underlined items.
04:57.520 --> 00:04:59.040
This pop-up menu on the right
00:04:59.040 --> 00:05:02.240
allows me to enter the terms as I type.
05:02.240 --> 00:05:04.639
It’s kind of an auto insertion system
00:05:04.639 --> 00:05:07.039
that also supports history predictions,
00:05:07.039 --> 00:05:14.479
predefined strings, and things like that.
05:14.479 --> 00:05:15.440
In the part on the right,
00:05:15.440 --> 00:05:17.120
we have reference information
00:05:17.120 --> 00:05:18.240
that comes directly from
00:05:18.240 --> 00:05:21.440
the .po and .texi files.
05:21.440 --> 00:05:23.440
We also have notes that I can share
00:05:23.440 --> 00:05:25.759
with fellow translators,
05:25.759 --> 00:05:28.080
and we have numbers that tell me
00:05:28.080 --> 00:05:31.199
that I still have 143,000 more segments to go
00:05:31.199 --> 00:05:35.280
before I complete this translation.
05:35.280 --> 00:05:37.120
As we see, there are plenty of strings
05:37.120 --> 00:05:40.000
that we really don’t want to have to type.
05:40.000 --> 00:05:42.160
For example, those strings
00:05:42.160 --> 00:05:43.840
are typical .texi strings
00:05:43.840 --> 00:05:45.039
that the translator
00:05:45.039 --> 00:05:46.479
should really not have to type.
00:05:46.479 --> 00:05:47.360
So we’re going to have to
00:05:47.360 --> 00:05:50.400
do something about that.
05:50.400 --> 00:05:51.600
We’re going to have to create
00:05:51.600 --> 00:05:52.479
protected strings
00:05:52.479 --> 00:05:54.400
with regular expressions,
05:54.400 --> 00:05:56.800
so that the strings can be visualized
00:05:56.800 --> 00:05:59.120
right away in the source segment,
05:59.120 --> 00:06:00.479
entered semi-automatically
00:06:00.479 --> 00:06:01.680
in the target segment,
00:06:01.680 --> 00:06:04.479
and checked for integrity.
06:04.479 --> 00:06:06.479
The regular expression I came up with
06:06.479 --> 00:06:08.160
for defining most of the strings
00:06:08.160 --> 00:06:09.600
is this one,
06:09.600 --> 00:06:11.120
and I’m not a regular expression pro
00:06:11.120 --> 00:06:13.360
so I’m sure some of you will correct me.
00:06:13.360 --> 00:06:14.560
But this expression gives me
00:06:14.560 --> 00:06:15.919
a good enough definition
00:06:15.919 --> 00:06:17.919
even though it does not yet include
00:06:17.919 --> 00:06:20.960
Org mode syntax.
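The regular expression shown on the slide is not part of this transcript, so the following is only a minimal sketch of the idea, not the speaker's actual pattern. It assumes that the strings to protect are Texinfo @-commands (with their opening brace, if any), escaped characters such as `@@`, and bare braces. The names `PROTECTED` and `protected_strings` are hypothetical.

```python
import re

# Hypothetical pattern, NOT the one used in the talk. It matches:
#   - an @-command with an optional opening brace, e.g. "@kbd{" or "@noindent"
#   - an escaped character such as "@@" or "@{"
#   - a bare opening or closing brace
PROTECTED = re.compile(r"@[A-Za-z]+\{?|@[^A-Za-z\s]|[{}]")


def protected_strings(segment: str) -> list[str]:
    """Return the protected substrings of a segment, in source order."""
    return PROTECTED.findall(segment)


source = "Type @kbd{C-x C-f} to visit a file (@pxref{Visiting})."
print(protected_strings(source))
# ['@kbd{', '}', '@pxref{', '}']
```

Everything the pattern matches would be shown in gray and left untouched; the remaining text is what the translator actually types.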
06:20.960 --> 00:06:22.344
So now we have all those
00:06:22.344 --> 00:06:23.440
.texi specific things
00:06:23.440 --> 00:06:24.960
that we don’t want to touch
06:24.960 --> 00:06:26.100
displayed in gray.
00:06:26.100 --> 00:06:27.680
Actually, you may have noticed
00:06:27.680 --> 00:06:28.479
that I cheated a bit,
06:28.479 --> 00:06:30.319
because here I added the years
00:06:30.319 --> 00:06:32.000
and the Free Software Foundation name
00:06:32.000 --> 00:06:34.000
to the previous regular expression
00:06:34.000 --> 00:06:35.520
to show you that you can protect
00:06:35.520 --> 00:06:38.560
any kind of string, really.
06:38.560 --> 00:06:39.520
So what we have now
00:06:39.520 --> 00:06:41.360
is a way to visualize the strings
00:06:41.360 --> 00:06:43.440
that we do not want to touch,
06:43.440 --> 00:06:45.440
but we still have to enter all of them
00:06:45.440 --> 00:06:46.880
in the translation.
06:46.880 --> 00:06:48.319
For that, we have the pop-up menu
00:06:48.319 --> 00:06:50.400
that I used earlier with the glossary,
00:06:50.400 --> 00:06:51.520
and we also have items
00:06:51.520 --> 00:06:52.400
in the edit menu
00:06:52.400 --> 00:06:53.919
that come with shortcuts
00:06:53.919 --> 00:06:57.199
for easy insertion of missing tags.
06:57.199 --> 00:06:58.800
Last, but certainly not least,
00:06:58.800 --> 00:07:00.800
we can now validate our input.
00:07:00.800 --> 00:07:02.479
Here, OmegaT properly tells me
00:07:02.479 --> 00:07:05.759
that I missed 7 protected strings:
07:05.759 --> 00:07:07.599
I entered only 1998,
00:07:07.599 --> 00:07:09.280
but there were five different years,
00:07:09.280 --> 00:07:10.479
the copyright string,
00:07:10.479 --> 00:07:14.240
and the FSF name string.
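The integrity check described here can be approximated as a comparison of protected strings between source and target. This is a sketch under assumptions, not OmegaT's implementation: the pattern is hypothetical and loosely mirrors the demo by also protecting four-digit years and the Free Software Foundation name, and the sample segment is invented for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern: Texinfo @-commands, braces, four-digit years,
# and the FSF name. It is not the expression used in the talk.
PROTECTED = re.compile(
    r"@[A-Za-z]+\{?|[{}]|\b(?:19|20)\d{2}\b|Free Software Foundation"
)


def missing_protected(source: str, target: str) -> list[str]:
    """List protected strings found in source but missing from target."""
    needed = Counter(PROTECTED.findall(source))
    found = Counter(PROTECTED.findall(target))
    # Counter subtraction keeps only the strings the target still lacks.
    return sorted((needed - found).elements())


# Invented sample segment; the target contains only one of the years.
source = "Copyright @copyright{} 1998, 2001 Free Software Foundation, Inc."
target = "Copyright 1998"
print(missing_protected(source, target))
# ['2001', '@copyright{', 'Free Software Foundation', '}']
```

Counting occurrences rather than testing membership means a string that appears twice in the source must also appear twice in the target.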
07:14.240 --> 00:07:15.970
With all this almost native
00:07:15.970 --> 00:07:16.960
Texinfo support,
00:07:16.960 --> 00:07:18.880
we have far fewer things to type,
07:18.880 --> 00:07:19.919
and there is a much lower
00:07:19.919 --> 00:07:21.120
potential for errors.
00:07:21.120 --> 00:07:25.199
But we agree, it’s still a lot of work.
07:25.199 --> 00:07:26.319
What we’d like now
00:07:26.319 --> 00:07:27.840
is to work with fellow translators,
00:07:27.840 --> 00:07:28.720
and here we need to know
00:07:28.720 --> 00:07:29.840
that OmegaT is actually
00:07:29.840 --> 00:07:32.080
a hidden svn/git client,
00:07:32.080 --> 00:07:34.240
and team projects can be hosted
07:34.240 --> 00:07:36.319
on svn/git platforms.
07:36.319 --> 00:07:37.199
Translators don’t need to
00:07:37.199 --> 00:07:38.880
know anything about VCS.
00:07:38.880 --> 00:07:40.720
They just need access credentials,
00:07:40.720 --> 00:07:42.400
and OmegaT commits for them.
00:07:42.400 --> 00:07:44.080
This way we do not have to use
00:07:44.080 --> 00:07:45.759
ugly and clumsy web-based
00:07:45.759 --> 00:07:47.199
translation interfaces,
00:07:47.199 --> 00:07:48.800
and we can use a powerful
00:07:48.800 --> 00:07:51.440
offline professional tool.
07:51.440 --> 00:07:52.479
So this is how it looks
00:07:52.479 --> 00:07:54.160
when you look at the platform
00:07:54.160 --> 00:07:55.919
where I hosted this project.
07:55.919 --> 00:07:57.199
The last updates are from
00:07:57.199 --> 00:07:58.639
20 days and 30 seconds ago
00:07:58.639 --> 00:08:00.720
when I created this slide,
08:00.720 --> 00:08:02.479
and you can see that I had a partner
00:08:02.479 --> 00:08:04.639
who worked with me on the same file set.
08:04.639 --> 00:08:05.520
Although it looks like
00:08:05.520 --> 00:08:06.879
we actually committed the translation
00:08:06.879 --> 00:08:07.680
to the platform,
00:08:07.680 --> 00:08:11.039
it was not us, but OmegaT.
00:08:11.039 --> 00:08:13.599
OmegaT does all the heavy-duty work.
08:13.599 --> 00:08:15.039
It regularly saves to
00:08:15.039 --> 00:08:16.879
and syncs from the servers.
08:16.879 --> 00:08:18.720
Translators are regularly kept updated
08:18.720 --> 00:08:20.479
with work from fellow translators,
00:08:20.479 --> 00:08:21.680
and when necessary,
00:08:21.680 --> 00:08:23.360
OmegaT offers a simple
00:08:23.360 --> 00:08:25.440
conflict-resolution dialogue.
08:25.440 --> 00:08:27.039
Translators never have to do anything
08:27.039 --> 00:08:29.360
with svn or git ever.
08:29.360 --> 00:08:30.800
And now we can envision a future
00:08:30.800 --> 00:08:31.599
not so far away
00:08:31.599 --> 00:08:33.120
where the manuals will be translated
00:08:33.120 --> 00:08:34.159
and eventually included
00:08:34.159 --> 00:08:35.279
in the distribution,
00:08:35.279 --> 00:08:36.080
but that’s a topic
00:08:36.080 --> 00:08:39.760
for a different presentation.
08:39.760 --> 00:08:42.080
So we’ve reached the end of this session.
08:42.080 --> 00:08:44.240
Thank you very much again for joining it.
08:44.240 --> 00:08:45.600
There are plenty of topics
00:08:45.600 --> 00:08:46.880
I promised I would not address,
00:08:46.880 --> 00:08:50.000
and I think I kept my promise.
08:50.000 --> 00:08:51.600
There will be a Q&A now,
00:08:51.600 --> 00:08:52.517
and I also started
00:08:52.517 --> 00:08:53.600
a thread about this talk
00:08:53.600 --> 00:08:55.519
on Reddit last Saturday.
08:55.519 --> 00:08:57.279
You can find me on the emacs-help
00:08:57.279 --> 00:08:59.200
and emacs-devel lists as well,
00:08:59.200 --> 00:09:00.480
so don’t hesitate to send me
00:09:00.480 --> 00:09:02.080
questions and remarks.
09:02.080 --> 09:06.760
Thank you again, and see you around.
|