WEBVTT

00:01.280 --> 00:00:02.560
Hello everybody.

00:02.560 --> 00:00:04.400
My name is Jean-Christophe Helary, 

00:00:04.400 --> 00:00:05.680
and today I’m going to talk about 

00:00:05.680 --> 00:00:08.320
Emacs manuals translation and OmegaT. 

00:00:08.320 --> 00:00:10.960
Thank you for joining the session.

00:10.960 --> 00:00:12.880
Translation in the free software world

00:12.880 --> 00:00:15.040
is really a big thing. You already know

00:15.040 --> 00:00:17.119
that most of the Linux distributions,

00:17.119 --> 00:00:18.720
most of the software packages, 

00:00:18.720 --> 00:00:19.920
most of the websites 

00:00:19.920 --> 00:00:22.320
are translated by dozens of communities

00:00:22.320 --> 00:00:23.439
using different processes

00:23.439 --> 00:00:24.880
and file formats.

00:24.880 --> 00:00:27.359
Translation and localization

00:27.359 --> 00:00:29.599
are things we know very well.

00:29.599 --> 00:00:30.400
It’s a tad different 

00:00:30.400 --> 00:00:32.160
for the Emacs community.

00:32.160 --> 00:00:34.079
We do not have a localization process

00:34.079 --> 00:00:35.200
because it’s quite complex 

00:00:35.200 --> 00:00:35.920
and because we don’t 

00:00:35.920 --> 00:00:37.600
have the resources yet.

00:37.600 --> 00:00:39.920
Still, we could translate the manuals, 

00:00:39.920 --> 00:00:41.200
and translating the manuals

00:00:41.200 --> 00:00:42.399
would probably bring a lot of good 

00:00:42.399 --> 00:00:45.600
to the Emacs community at large.

00:45.600 --> 00:00:47.920
So what’s the state of the manuals?

00:47.920 --> 00:00:51.199
As of today, we have 182 files

00:51.199 --> 00:00:54.160
coming in .texi and .org format.

00:54.160 --> 00:00:56.559
We’ve got more than 2 million words.

00:56.559 --> 00:00:57.360
We’ve got more than 

00:00:57.360 --> 00:00:59.039
50 million characters.

00:00:59.039 --> 00:01:00.559
So that’s quite a lot of work,

01:00.559 --> 00:01:04.559
and obviously, it’s not a one person job.

01:04.559 --> 00:01:06.159
When we open .texi files, 

00:01:06.159 --> 00:01:07.760
what do we have?

01:07.760 --> 00:01:09.439
Well, we actually have a lot of things

01:09.439 --> 00:01:10.560
that the translators 

00:01:10.560 --> 00:01:12.400
shouldn’t have to translate.

01:12.400 --> 00:01:13.680
Here we can see that only 

00:01:13.680 --> 00:01:15.040
the very last segment,

00:01:15.040 --> 00:01:16.400
the very last sentence 

00:01:16.400 --> 00:01:18.080
should be translated.

01:18.080 --> 00:01:19.360
All those meta things 

00:01:19.360 --> 00:01:20.240
should not be under 

00:01:20.240 --> 00:01:24.479
the translator’s eyes.

01:24.479 --> 00:01:26.720
How do we deal with this situation?

01:26.720 --> 00:01:27.680
For code files, we have 

00:01:27.680 --> 00:01:29.360
the gettext utility that converts 

00:01:29.360 --> 00:01:30.640
all the translatable strings

00:01:30.640 --> 00:01:32.079
into a translatable format, 

00:01:32.079 --> 00:01:33.840
which is the .po format.

01:33.840 --> 00:01:35.520
And that .po format is ubiquitous, 

00:01:35.520 --> 00:01:36.400
even in the non-free 

00:01:36.400 --> 00:01:38.720
software translation industry.
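
NOTE
A minimal sketch, assuming nothing beyond the basic PO layout, of what one entry in a .po file looks like once a translatable string has been extracted. The helper names po_quote and po_entry and the reference emacs.texi:42 are hypothetical and only serve the illustration; this is not how gettext or po4a are implemented.
```python
def po_quote(text):
    # Escape backslashes first, then double quotes and newlines, as PO strings require.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'
def po_entry(source, translation="", reference=None):
    # Build one PO entry: an optional "#:" reference comment, then msgid and msgstr.
    lines = []
    if reference:
        lines.append("#: " + reference)
    lines.append("msgid " + po_quote(source))
    lines.append("msgstr " + po_quote(translation))
    return "\n".join(lines)
# Illustrative usage with a made-up source string and file reference.
print(po_entry("Save the buffer.", reference="emacs.texi:42"))
```
An empty msgstr marks the entry as not yet translated; the translator, or a tool such as OmegaT, fills it in.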

01:38.720 --> 00:01:39.520
For documentation, 

00:01:39.520 --> 00:01:40.720
we have something different.

00:01:40.720 --> 00:01:42.000
It’s called po4a,

00:01:42.000 --> 00:01:45.119
which is short for ‘po for all’.

01:45.119 --> 00:01:46.399
When we use po4a 

00:01:46.399 --> 00:01:49.200
on those 182 .texi and .org files,

00:01:49.200 --> 00:01:50.479
what do we get?

01:50.479 --> 00:01:52.640
We get something that’s much better.

01:52.640 --> 00:01:54.799
Now we have three segments.

01:54.799 --> 00:01:55.759
It’s not perfect because, 

00:01:55.759 --> 00:01:56.399
as you can see, 

00:01:56.399 --> 00:01:57.280
the first two segments

00:01:57.280 --> 00:01:58.880
should not be translated.

01:58.880 --> 00:01:59.520
So there’s still 

00:01:59.520 --> 00:02:02.479
room for improvement.

02:02.479 --> 00:02:04.960
Now, when we put that file set

00:02:04.960 --> 00:02:07.119
into OmegaT, we considerably reduce

00:02:07.119 --> 00:02:08.800
the total word count.

02:08.800 --> 00:02:11.360
We now have 50% fewer words

00:02:11.360 --> 00:02:14.239
and 23% fewer characters to type,

02:14.239 --> 00:02:15.680
but that’s still a lot of work.

00:02:15.680 --> 00:02:17.599
So let’s talk about OmegaT now 

00:02:17.599 --> 00:02:22.239
and see where it can help.

02:22.239 --> 00:02:25.440
OmegaT is a GPL3+ Java8+

02:25.440 --> 00:02:27.599
Computer Aided Translation tool.

02:27.599 --> 00:02:29.440
We call them CATs.

02:29.440 --> 00:02:30.720
CATs are to translators 

00:02:30.720 --> 00:02:33.280
what IDEs are to programmers.

02:33.280 --> 00:02:35.040
They leverage the power of computers 

00:02:35.040 --> 00:02:36.480
to automate our work, 

00:02:36.480 --> 00:02:38.400
such as reference searches,

00:02:38.400 --> 00:02:40.800
fuzzy matching, automatic insertions,

00:02:40.800 --> 00:02:44.080
and things like that.

02:44.080 --> 00:02:46.319
OmegaT is not really recent.

02:46.319 --> 00:02:48.319
It will turn 20 next year,

02:48.319 --> 00:02:48.959
and at this point, 

00:02:48.959 --> 00:02:51.440
we have about 1.5 million downloads

00:02:51.440 --> 00:02:53.200
from the SourceForge site,

00:02:53.200 --> 00:02:54.080
which doesn’t mean much 

00:02:54.080 --> 00:02:55.040
because that includes 

00:02:55.040 --> 00:02:56.480
files used for localization

00:02:56.480 --> 00:02:57.920
and manuals, but still 

00:02:57.920 --> 00:02:59.599
it’s a pretty big number.

02:59.599 --> 00:03:00.720
OmegaT is included in 

00:03:00.720 --> 00:03:02.400
a lot of Linux distributions,

00:03:02.400 --> 00:03:03.680
but as you can see here,

03:03.680 --> 00:03:05.920
it’s mostly downloaded on Windows systems

00:03:05.920 --> 00:03:06.800
because translators 

00:03:06.800 --> 00:03:09.680
mostly work on Windows.

03:09.680 --> 00:03:11.120
OmegaT comes with a cool logo

00:03:11.120 --> 00:03:12.080
and a cool site too, 

00:03:12.080 --> 00:03:13.920
and I really invite you to visit it.

00:03:13.920 --> 00:03:16.159
It’s omegat.org, and you’ll see

03:16.159 --> 00:03:17.280
all the information you need, 

00:03:17.280 --> 00:03:19.040
plus downloads for Linux versions,

00:03:19.040 --> 00:03:22.080
with or without Java included.

03:22.080 --> 00:03:24.799
So what does OmegaT bring to the game?

03:24.799 --> 00:03:26.560
Professional translators have to deliver

03:26.560 --> 00:03:27.680
fast, consistent, 

00:03:27.680 --> 00:03:29.519
and quality translations,

03:29.519 --> 00:03:30.720
and we need to have proper tools

00:03:30.720 --> 00:03:32.159
to achieve that. 

00:03:32.159 --> 00:03:34.239
I wish po-mode was part of the toolbox,

00:03:34.239 --> 00:03:35.120
but that’s not the case,

03:35.120 --> 00:03:36.560
and it’s a pity.

03:36.560 --> 00:03:39.760
So we have to use those CAT tools.

03:39.760 --> 00:03:41.440
Let me show you what OmegaT looks like

03:41.440 --> 00:03:43.120
when I open this project that I created

03:43.120 --> 00:03:45.200
for this demonstration.

03:45.200 --> 00:03:46.640
The display is quite a mouthful,

00:03:46.640 --> 00:03:47.760
but you can actually modify 

00:03:47.760 --> 00:03:49.519
all windows as needed.

03:49.519 --> 00:03:50.400
I just want to show you 

00:03:50.400 --> 00:03:51.120
everything at once

00:03:51.120 --> 00:03:53.680
to give you a quick idea of the thing.

03:53.680 --> 00:03:55.200
You have various colors, windows,

00:03:55.200 --> 00:03:55.920
and all those spaces 

00:03:55.920 --> 00:03:57.120
have different functions

03:57.120 --> 00:03:58.560
that help the translator, 

00:03:58.560 --> 00:03:59.360
and that you’re probably

00:03:59.360 --> 00:04:02.879
not familiar with.

04:02.879 --> 00:04:04.080
I’m going to introduce you 

00:04:04.080 --> 00:04:05.680
to the interface now.

04:05.680 --> 00:04:07.519
So first, we have the editor.

04:07.519 --> 00:04:09.439
The editor comes in two parts:

04:09.439 --> 00:04:10.480
the current segment, 

00:04:10.480 --> 00:04:12.319
which is associated with a number,

00:04:12.319 --> 00:04:13.519
and all the other segments, 

00:04:13.519 --> 00:04:15.840
above or below.

04:15.840 --> 00:04:16.720
At the top of the window,

00:04:16.720 --> 00:04:18.720
you can see the first three segments

00:04:18.720 --> 00:04:20.799
that were in the .po file.

04:20.799 --> 00:04:22.880
The last one here, the fourth one, comes 

00:04:22.880 --> 00:04:28.720
with an automatic fuzzy match insertion.

04:28.720 --> 00:04:30.880
Such legacy translations are what we

04:30.880 --> 00:04:32.720
call ‘translation memories’.

04:32.720 --> 00:04:35.280
OmegaT has inserted this one automatically

00:04:35.280 --> 00:04:37.120
because I told it to do so,

04:37.120 --> 00:04:38.560
and for my security, it comes with

00:04:38.560 --> 00:04:40.639
the predefined fuzzy prefix

00:04:40.639 --> 00:04:41.919
that I will have to remove

00:04:41.919 --> 00:04:44.880
to validate the translation.
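
NOTE
A minimal sketch of the idea behind a fuzzy match insertion, assuming a translation memory stored as a plain dictionary. It uses Python's difflib similarity ratio as a stand-in; OmegaT's own matching algorithm is different and is not reproduced here. The function name, the 0.7 threshold, the "[fuzzy] " prefix text, and the sample sentences are assumptions for illustration.
```python
from difflib import SequenceMatcher
def best_fuzzy_match(source, memory, threshold=0.7):
    # memory maps previously translated source strings to their translations.
    best_score, best_translation = 0.0, None
    for old_source, old_translation in memory.items():
        score = SequenceMatcher(None, source, old_source).ratio()
        if score > best_score:
            best_score, best_translation = score, old_translation
    if best_translation is None or best_score < threshold:
        return None
    # The prefix marks the suggestion as unreviewed; the translator removes it to validate.
    return "[fuzzy] " + best_translation
# Illustrative memory with a single legacy translation.
memory = {"Save the current buffer.": "Enregistrer le tampon courant."}
print(best_fuzzy_match("Save the current buffer now.", memory))
```
The prefix is what makes the insertion safe: an unreviewed suggestion cannot pass for a finished translation until someone deletes it.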

04:44.880 --> 00:04:47.919
Our next feature is the glossary feature.

04:47.919 --> 00:04:48.479
In this project, 

00:04:48.479 --> 00:04:50.160
we have a lot of glossary data. 

00:04:50.160 --> 00:04:52.560
Some is relevant and some is not.

04:52.560 --> 00:04:53.919
In the segment that I’m translating

00:04:53.919 --> 00:04:55.199
at the moment, you can see

00:04:55.199 --> 00:04:57.520
underlined items.

04:57.520 --> 00:04:59.040
This pop-up menu on the right

00:04:59.040 --> 00:05:02.240
allows me to enter the terms as I type.

05:02.240 --> 00:05:04.639
It’s kind of an auto insertion system

00:05:04.639 --> 00:05:07.039
that also supports history predictions,

00:05:07.039 --> 00:05:14.479
predefined strings, and things like that.

05:14.479 --> 00:05:15.440
In the part on the right,

00:05:15.440 --> 00:05:17.120
we have reference information 

00:05:17.120 --> 00:05:18.240
that comes directly from 

00:05:18.240 --> 00:05:21.440
the .po and .texi files.

05:21.440 --> 00:05:23.440
We also have notes that I can share 

00:05:23.440 --> 00:05:25.759
with fellow translators,

05:25.759 --> 00:05:28.080
and we have numbers that tell me

00:05:28.080 --> 00:05:31.199
that I still have 143 000 segments more to go

00:05:31.199 --> 00:05:35.280
before I complete this translation.

05:35.280 --> 00:05:37.120
As we see, there are plenty of strings

05:37.120 --> 00:05:40.000
that we really don’t want to have to type.

05:40.000 --> 00:05:42.160
For example, those strings 

00:05:42.160 --> 00:05:43.840
are typical .texi strings 

00:05:43.840 --> 00:05:45.039
that the translator 

00:05:45.039 --> 00:05:46.479
should really not have to type. 

00:05:46.479 --> 00:05:47.360
So we’re going to have to 

00:05:47.360 --> 00:05:50.400
do something about that.

05:50.400 --> 00:05:51.600
We’re going to have to create 

00:05:51.600 --> 00:05:52.479
protected strings 

00:05:52.479 --> 00:05:54.400
with regular expressions,

05:54.400 --> 00:05:56.800
so that the strings can be visualized

00:05:56.800 --> 00:05:59.120
right away in the source segment,

05:59.120 --> 00:06:00.479
entered semi-automatically 

00:06:00.479 --> 00:06:01.680
in the target segment,

00:06:01.680 --> 00:06:04.479
and checked for integrity.

06:04.479 --> 00:06:06.479
The regular expression I came up with

06:06.479 --> 00:06:08.160
for defining most of the strings 

00:06:08.160 --> 00:06:09.600
is this one,

06:09.600 --> 00:06:11.120
and I’m not a regular expression pro

00:06:11.120 --> 00:06:13.360
so I’m sure some of you will correct me. 

00:06:13.360 --> 00:06:14.560
But this expression gives me 

00:06:14.560 --> 00:06:15.919
a good enough definition 

00:06:15.919 --> 00:06:17.919
even though it does not yet include

00:06:17.919 --> 00:06:20.960
Org mode syntax.
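
NOTE
A minimal sketch of what such a protected-string expression could look like. The pattern below is hypothetical and is not the expression shown in the talk; it assumes that Texinfo commands of the form @name{, closing braces, bare @name commands, and the escapes @@, @{ and @} are the strings to protect. Like the talk's version, it ignores Org mode syntax.
```python
import re
# Hypothetical pattern, not the one shown in the talk: it matches
# "@command{", a closing "}", a bare "@command", and the escapes "@@", "@{", "@}".
PROTECTED = re.compile(r"@[A-Za-z]+\{|\}|@[A-Za-z]+|@[{}@]")
def protected_strings(segment):
    # Return every protected string in order of appearance.
    return PROTECTED.findall(segment)
# Illustrative segment; only the text between the matches needs translating.
print(protected_strings("Type @kbd{C-x C-s} to save; see @ref{Saving}."))
```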

06:20.960 --> 00:06:22.344
So now we have all those 

00:06:22.344 --> 00:06:23.440
.texi specific things 

00:06:23.440 --> 00:06:24.960
that we don’t want to touch

06:24.960 --> 00:06:26.100
displayed in gray. 

00:06:26.100 --> 00:06:27.680
Actually, you may have noticed 

00:06:27.680 --> 00:06:28.479
that I cheated a bit,

06:28.479 --> 00:06:30.319
because here I added the years

00:06:30.319 --> 00:06:32.000
and the Free Software Foundation name

00:06:32.000 --> 00:06:34.000
to the previous regular expression

00:06:34.000 --> 00:06:35.520
to show you that you can protect

00:06:35.520 --> 00:06:38.560
any kind of string, really.

06:38.560 --> 00:06:39.520
So what we have now 

00:06:39.520 --> 00:06:41.360
is a way to visualize the strings

00:06:41.360 --> 00:06:43.440
that we do not want to touch,

06:43.440 --> 00:06:45.440
but we still have to enter all of them

00:06:45.440 --> 00:06:46.880
in the translation.

06:46.880 --> 00:06:48.319
For that, we have the pop-up menu

00:06:48.319 --> 00:06:50.400
that I used earlier with the glossary,

00:06:50.400 --> 00:06:51.520
and we also have items 

00:06:51.520 --> 00:06:52.400
in the edit menu 

00:06:52.400 --> 00:06:53.919
that come with shortcuts 

00:06:53.919 --> 00:06:57.199
for easy insertion of missing tags.

06:57.199 --> 00:06:58.800
Last, but certainly not least,

00:06:58.800 --> 00:07:00.800
we can now validate our input.

00:07:00.800 --> 00:07:02.479
Here, OmegaT properly tells me

00:07:02.479 --> 00:07:05.759
that I missed 7 protected strings,

07:05.759 --> 00:07:07.599
I entered only 1998,

00:07:07.599 --> 00:07:09.280
but there were five different years,

00:07:09.280 --> 00:07:10.479
the copyright string,

00:07:10.479 --> 00:07:14.240
and the FSF name string.
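
NOTE
A minimal sketch of this kind of integrity check, assuming the protected strings are defined by a regular expression. The pattern, the function name missing_protected, and the sample segments are hypothetical; OmegaT's actual validation is not reproduced here. The check counts each protected string in the source and reports those the target still lacks.
```python
import re
from collections import Counter
# Hypothetical pattern: Texinfo commands, braces, four-digit years, and one fixed name.
PROTECTED = re.compile(r"@[A-Za-z]+\{|\}|@[A-Za-z]+|\b\d{4}\b|Free Software Foundation")
def missing_protected(source, target):
    # Count each protected string on both sides and keep what the target lacks.
    missing = Counter(PROTECTED.findall(source)) - Counter(PROTECTED.findall(target))
    return sorted(missing.elements())
# Illustrative segments: the target keeps one year and drops everything else.
source = "Copyright @copyright{} 1998, 2001 Free Software Foundation"
target = "Copyright 1998"
print(missing_protected(source, target))
```
Counting occurrences rather than testing membership means a string that appears twice in the source must also appear twice in the target.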

07:14.240 --> 00:07:15.970
With all this almost native 

00:07:15.970 --> 00:07:16.960
Texinfo support,

00:07:16.960 --> 00:07:18.880
we have far fewer things to type,

07:18.880 --> 00:07:19.919
and there is a much lower 

00:07:19.919 --> 00:07:21.120
potential for errors. 

00:07:21.120 --> 00:07:25.199
But we agree, it’s still a lot of work.

07:25.199 --> 00:07:26.319
What we’d like now

00:07:26.319 --> 00:07:27.840
is to work with fellow translators,

00:07:27.840 --> 00:07:28.720
and here we need to know 

00:07:28.720 --> 00:07:29.840
that OmegaT is actually 

00:07:29.840 --> 00:07:32.080
a hidden svn/git client,

00:07:32.080 --> 00:07:34.240
and team projects can be hosted

07:34.240 --> 00:07:36.319
on svn/git platforms.

07:36.319 --> 00:07:37.199
Translators don’t need to 

00:07:37.199 --> 00:07:38.880
know anything about VCS. 

00:07:38.880 --> 00:07:40.720
They just need access credentials, 

00:07:40.720 --> 00:07:42.400
and OmegaT commits for them.

00:07:42.400 --> 00:07:44.080
This way we do not have to use 

00:07:44.080 --> 00:07:45.759
ugly and clumsy web-based 

00:07:45.759 --> 00:07:47.199
translation interfaces,

00:07:47.199 --> 00:07:48.800
and we can use a powerful 

00:07:48.800 --> 00:07:51.440
offline professional tool.

07:51.440 --> 00:07:52.479
So this is how it looks

00:07:52.479 --> 00:07:54.160
when you look at the platform

00:07:54.160 --> 00:07:55.919
where I hosted this project.

07:55.919 --> 00:07:57.199
The last updates are from

00:07:57.199 --> 00:07:58.639
20 days and 30 seconds ago

00:07:58.639 --> 00:08:00.720
when I created this slide,

08:00.720 --> 00:08:02.479
and you can see that I had a partner

00:08:02.479 --> 00:08:04.639
who worked with me on the same file set.

08:04.639 --> 00:08:05.520
Although it looks like

00:08:05.520 --> 00:08:06.879
we actually committed the translation 

00:08:06.879 --> 00:08:07.680
to the platform,

00:08:07.680 --> 00:08:11.039
it was not us, but OmegaT.

00:08:11.039 --> 00:08:13.599
OmegaT does all the heavy-duty work.

08:13.599 --> 00:08:15.039
It regularly saves to 

00:08:15.039 --> 00:08:16.879
and syncs from the servers.

08:16.879 --> 00:08:18.720
Translators are regularly kept updated

08:18.720 --> 00:08:20.479
with work from fellow translators, 

00:08:20.479 --> 00:08:21.680
and when necessary, 

00:08:21.680 --> 00:08:23.360
OmegaT offers a simple 

00:08:23.360 --> 00:08:25.440
conflict-resolution dialogue.

08:25.440 --> 00:08:27.039
Translators never have to do anything

08:27.039 --> 00:08:29.360
with svn or git ever.

08:29.360 --> 00:08:30.800
And now we can envision a future

00:08:30.800 --> 00:08:31.599
not so far away

00:08:31.599 --> 00:08:33.120
where the manuals will be translated

00:08:33.120 --> 00:08:34.159
and eventually included 

00:08:34.159 --> 00:08:35.279
in the distribution, 

00:08:35.279 --> 00:08:36.080
but that’s a topic 

00:08:36.080 --> 00:08:39.760
for a different presentation.

08:39.760 --> 00:08:42.080
So we’ve reached the end of this session.

08:42.080 --> 00:08:44.240
Thank you very much again for joining it.

08:44.240 --> 00:08:45.600
There are plenty of topics 

00:08:45.600 --> 00:08:46.880
I promised I would not address,

00:08:46.880 --> 00:08:50.000
and I think I kept my promise.

08:50.000 --> 00:08:51.600
There will be a Q&A now,

00:08:51.600 --> 00:08:52.517
and I also started 

00:08:52.517 --> 00:08:53.600
a thread about this talk 

00:08:53.600 --> 00:08:55.519
on Reddit last Saturday.

08:55.519 --> 00:08:57.279
You can find me on the emacs-help 

00:08:57.279 --> 00:08:59.200
and emacs-devel lists as well,

00:08:59.200 --> 00:09:00.480
so don’t hesitate to send me 

00:09:00.480 --> 00:09:02.080
questions and remarks.

09:02.080 --> 09:06.760
Thank you again, and see you around.