Diffstat (limited to '2021/captions/emacsconf-2021-omegat--emacs-manuals-translation-and-omegat--jean-christophe-helary--main.vtt')
-rw-r--r-- 2021/captions/emacsconf-2021-omegat--emacs-manuals-translation-and-omegat--jean-christophe-helary--main.vtt | 892
1 files changed, 892 insertions, 0 deletions
diff --git a/2021/captions/emacsconf-2021-omegat--emacs-manuals-translation-and-omegat--jean-christophe-helary--main.vtt b/2021/captions/emacsconf-2021-omegat--emacs-manuals-translation-and-omegat--jean-christophe-helary--main.vtt
new file mode 100644
index 00000000..34cfbcad
--- /dev/null
+++ b/2021/captions/emacsconf-2021-omegat--emacs-manuals-translation-and-omegat--jean-christophe-helary--main.vtt
@@ -0,0 +1,892 @@
+WEBVTT
+
+00:01.280 --> 00:00:02.560
+Hello everybody.
+
+00:02.560 --> 00:00:04.400
+My name is Jean-Christophe Helary,
+
+00:00:04.400 --> 00:00:05.680
+and today I’m going to talk about
+
+00:00:05.680 --> 00:00:08.320
+Emacs manuals translation and OmegaT.
+
+00:00:08.320 --> 00:00:10.960
+Thank you for joining the session.
+
+00:10.960 --> 00:00:12.880
+Translation in the free software world
+
+00:12.880 --> 00:00:15.040
+is really a big thing. You already know
+
+00:15.040 --> 00:00:17.119
+that most of the Linux distributions,
+
+00:17.119 --> 00:00:18.720
+most of the software packages,
+
+00:00:18.720 --> 00:00:19.920
+most of the websites
+
+00:00:19.920 --> 00:00:22.320
+are translated by dozens of communities
+
+00:00:22.320 --> 00:00:23.439
+using different processes
+
+00:23.439 --> 00:00:24.880
+and file formats.
+
+00:24.880 --> 00:00:27.359
+Translation and localization
+
+00:27.359 --> 00:00:29.599
+are things we know very well.
+
+00:29.599 --> 00:00:30.400
+It’s a tad different
+
+00:00:30.400 --> 00:00:32.160
+for the Emacs community.
+
+00:32.160 --> 00:00:34.079
+We do not have a localization process
+
+00:34.079 --> 00:00:35.200
+because it’s quite complex
+
+00:00:35.200 --> 00:00:35.920
+and because we don’t
+
+00:00:35.920 --> 00:00:37.600
+have the resources yet.
+
+00:37.600 --> 00:00:39.920
+Still, we could translate the manuals,
+
+00:00:39.920 --> 00:00:41.200
+and translating the manuals
+
+00:00:41.200 --> 00:00:42.399
+would probably bring a lot of good
+
+00:00:42.399 --> 00:00:45.600
+to the Emacs community at large.
+
+00:45.600 --> 00:00:47.920
+So what’s the state of the manuals?
+
+00:47.920 --> 00:00:51.199
+As of today, we have 182 files
+
+00:51.199 --> 00:00:54.160
+coming in .texi and .org format.
+
+00:54.160 --> 00:00:56.559
+We’ve got more than 2 million words.
+
+00:56.559 --> 00:00:57.360
+We’ve got more than
+
+00:00:57.360 --> 00:00:59.039
+50 million characters.
+
+00:00:59.039 --> 00:01:00.559
+So that’s quite a lot of work,
+
+01:00.559 --> 00:01:04.559
+and obviously, it’s not a one-person job.
+
+01:04.559 --> 00:01:06.159
+When we open .texi files,
+
+00:01:06.159 --> 00:01:07.760
+what do we have?
+
+01:07.760 --> 00:01:09.439
+Well, we actually have a lot of things
+
+01:09.439 --> 00:01:10.560
+that the translators
+
+00:01:10.560 --> 00:01:12.400
+shouldn’t have to translate.
+
+01:12.400 --> 00:01:13.680
+Here we can see that only
+
+00:01:13.680 --> 00:01:15.040
+the very last segment,
+
+00:01:15.040 --> 00:01:16.400
+the very last sentence
+
+00:01:16.400 --> 00:01:18.080
+should be translated.
+
+01:18.080 --> 00:01:19.360
+All those meta things
+
+00:01:19.360 --> 00:01:20.240
+should not be under
+
+00:01:20.240 --> 00:01:24.479
+the translator’s eyes.
+
+01:24.479 --> 00:01:26.720
+How do we deal with this situation?
+
+01:26.720 --> 00:01:27.680
+For code files, we have
+
+00:01:27.680 --> 00:01:29.360
+the gettext utility that converts
+
+00:01:29.360 --> 00:01:30.640
+all the translatable strings
+
+00:01:30.640 --> 00:01:32.079
+into a translatable format,
+
+00:01:32.079 --> 00:01:33.840
+which is the .po format.
+
+01:33.840 --> 00:01:35.520
+And that .po format is ubiquitous,
+
+00:01:35.520 --> 00:01:36.400
+even in the non-free
+
+00:01:36.400 --> 00:01:38.720
+software translation industry.
+
+01:38.720 --> 00:01:39.520
+For documentation,
+
+00:01:39.520 --> 00:01:40.720
+we have something different.
+
+00:01:40.720 --> 00:01:42.000
+It’s called po4a,
+
+00:01:42.000 --> 00:01:45.119
+which is short for ‘po for all’.
+
+01:45.119 --> 00:01:46.399
+When we use po4a
+
+00:01:46.399 --> 00:01:49.200
+on those 182 .texi and .org files,
+
+00:01:49.200 --> 00:01:50.479
+what do we get?
+
+01:50.479 --> 00:01:52.640
+We get something that’s much better.
+
+01:52.640 --> 00:01:54.799
+Now we have three segments.
+
+01:54.799 --> 00:01:55.759
+It’s not perfect because,
+
+00:01:55.759 --> 00:01:56.399
+as you can see,
+
+00:01:56.399 --> 00:01:57.280
+the first two segments
+
+00:01:57.280 --> 00:01:58.880
+should not be translated.
+
+01:58.880 --> 00:01:59.520
+So there’s still
+
+00:01:59.520 --> 00:02:02.479
+room for improvement.
+
+02:02.479 --> 00:02:04.960
+Now, when we put that file set
+
+00:02:04.960 --> 00:02:07.119
+into OmegaT, we considerably reduce
+
+00:02:07.119 --> 00:02:08.800
+the word total.
+
+02:08.800 --> 00:02:11.360
+We now have 50% fewer words
+
+00:02:11.360 --> 00:02:14.239
+and 23% fewer characters to type,
+
+02:14.239 --> 00:02:15.680
+but that’s still a lot of work.
+
+00:02:15.680 --> 00:02:17.599
+So let’s talk about OmegaT now
+
+00:02:17.599 --> 00:02:22.239
+and see where it can help.
+
+02:22.239 --> 00:02:25.440
+OmegaT is a GPL3+ Java8+
+
+02:25.440 --> 00:02:27.599
+Computer Aided Translation tool.
+
+02:27.599 --> 00:02:29.440
+We call them CATs.
+
+02:29.440 --> 00:02:30.720
+CATs are to translators
+
+00:02:30.720 --> 00:02:33.280
+what IDEs are to programmers.
+
+02:33.280 --> 00:02:35.040
+They leverage the power of computers
+
+00:02:35.040 --> 00:02:36.480
+to automate our work,
+
+00:02:36.480 --> 00:02:38.400
+such as reference searches,
+
+00:02:38.400 --> 00:02:40.800
+fuzzy matching, automatic insertions,
+
+00:02:40.800 --> 00:02:44.080
+and things like that.
+
+02:44.080 --> 00:02:46.319
+OmegaT is not really recent.
+
+02:46.319 --> 00:02:48.319
+It will turn 20 next year,
+
+02:48.319 --> 00:02:48.959
+and at this point,
+
+00:02:48.959 --> 00:02:51.440
+we have about 1.5 million downloads
+
+00:02:51.440 --> 00:02:53.200
+from the SourceForge site,
+
+00:02:53.200 --> 00:02:54.080
+which doesn’t mean much
+
+00:02:54.080 --> 00:02:55.040
+because that includes
+
+00:02:55.040 --> 00:02:56.480
+files used for localization
+
+00:02:56.480 --> 00:02:57.920
+and manuals, but still
+
+00:02:57.920 --> 00:02:59.599
+it’s a pretty big number.
+
+02:59.599 --> 00:03:00.720
+OmegaT is included in
+
+00:03:00.720 --> 00:03:02.400
+a lot of Linux distributions,
+
+00:03:02.400 --> 00:03:03.680
+but as you can see here,
+
+03:03.680 --> 00:03:05.920
+it’s mostly downloaded on Windows systems
+
+00:03:05.920 --> 00:03:06.800
+because translators
+
+00:03:06.800 --> 00:03:09.680
+mostly work on Windows.
+
+03:09.680 --> 00:03:11.120
+OmegaT comes with a cool logo
+
+00:03:11.120 --> 00:03:12.080
+and a cool site too,
+
+00:03:12.080 --> 00:03:13.920
+and I really invite you to visit it.
+
+00:03:13.920 --> 00:03:16.159
+It’s omegat.org, and you’ll see
+
+03:16.159 --> 00:03:17.280
+all the information you need,
+
+00:03:17.280 --> 00:03:19.040
+plus downloads to Linux versions,
+
+00:03:19.040 --> 00:03:22.080
+with or without Java included.
+
+03:22.080 --> 00:03:24.799
+So what does OmegaT bring to the game?
+
+03:24.799 --> 00:03:26.560
+Professional translators have to deliver
+
+03:26.560 --> 00:03:27.680
+fast, consistent,
+
+00:03:27.680 --> 00:03:29.519
+and quality translations,
+
+03:29.519 --> 00:03:30.720
+and we need to have proper tools
+
+00:03:30.720 --> 00:03:32.159
+to achieve that.
+
+00:03:32.159 --> 00:03:34.239
+I wish po-mode was part of the toolbox,
+
+00:03:34.239 --> 00:03:35.120
+but that’s not the case,
+
+03:35.120 --> 00:03:36.560
+and it’s a pity.
+
+03:36.560 --> 00:03:39.760
+So we have to use those CAT tools.
+
+03:39.760 --> 00:03:41.440
+Let me show you what OmegaT looks like
+
+03:41.440 --> 00:03:43.120
+when I open this project that I created
+
+03:43.120 --> 00:03:45.200
+for this demonstration.
+
+03:45.200 --> 00:03:46.640
+The display is quite a mouthful,
+
+00:03:46.640 --> 00:03:47.760
+but you can actually modify
+
+00:03:47.760 --> 00:03:49.519
+all windows as needed.
+
+03:49.519 --> 00:03:50.400
+I just want to show you
+
+00:03:50.400 --> 00:03:51.120
+everything at once
+
+00:03:51.120 --> 00:03:53.680
+to give you a quick idea of the thing.
+
+03:53.680 --> 00:03:55.200
+You have various colors, windows,
+
+00:03:55.200 --> 00:03:55.920
+and all those spaces
+
+00:03:55.920 --> 00:03:57.120
+have different functions
+
+03:57.120 --> 00:03:58.560
+that help the translator,
+
+00:03:58.560 --> 00:03:59.360
+and that you’re probably
+
+00:03:59.360 --> 00:04:02.879
+not familiar with.
+
+04:02.879 --> 00:04:04.080
+I’m going to introduce you
+
+00:04:04.080 --> 00:04:05.680
+to the interface now.
+
+04:05.680 --> 00:04:07.519
+So first, we have the editor.
+
+04:07.519 --> 00:04:09.439
+The editor comes in two parts:
+
+04:09.439 --> 00:04:10.480
+the current segment,
+
+00:04:10.480 --> 00:04:12.319
+which is associated with a number,
+
+00:04:12.319 --> 00:04:13.519
+and all the other segments,
+
+00:04:13.519 --> 00:04:15.840
+above or below.
+
+04:15.840 --> 00:04:16.720
+At the top of the window,
+
+00:04:16.720 --> 00:04:18.720
+you can see the first three segments
+
+00:04:18.720 --> 00:04:20.799
+that were in the .po file.
+
+04:20.799 --> 00:04:22.880
+The last one here, the fourth one, comes
+
+00:04:22.880 --> 00:04:28.720
+with an automatic fuzzy match insertion.
+
+04:28.720 --> 00:04:30.880
+Such legacy translations are what we
+
+04:30.880 --> 00:04:32.720
+call ‘translation memories’.
+
+04:32.720 --> 00:04:35.280
+OmegaT has inserted this one automatically
+
+00:04:35.280 --> 00:04:37.120
+because I told it to do so,
+
+04:37.120 --> 00:04:38.560
+and for my security, it comes with
+
+00:04:38.560 --> 00:04:40.639
+the predefined fuzzy prefix
+
+00:04:40.639 --> 00:04:41.919
+that I will have to remove
+
+00:04:41.919 --> 00:04:44.880
+to validate the translation.
+
+04:44.880 --> 00:04:47.919
+Our next feature is the glossary feature.
+
+04:47.919 --> 00:04:48.479
+In this project,
+
+00:04:48.479 --> 00:04:50.160
+we have a lot of glossary data.
+
+00:04:50.160 --> 00:04:52.560
+Some is relevant and some is not.
+
+04:52.560 --> 00:04:53.919
+In the segment that I’m translating
+
+00:04:53.919 --> 00:04:55.199
+at the moment, you can see
+
+00:04:55.199 --> 00:04:57.520
+underlined items.
+
+04:57.520 --> 00:04:59.040
+This pop-up menu on the right
+
+00:04:59.040 --> 00:05:02.240
+allows me to enter the terms as I type.
+
+05:02.240 --> 00:05:04.639
+It’s kind of an auto insertion system
+
+00:05:04.639 --> 00:05:07.039
+that also supports history predictions,
+
+00:05:07.039 --> 00:05:14.479
+predefined strings, and things like that.
+
+05:14.479 --> 00:05:15.440
+In the part on the right,
+
+00:05:15.440 --> 00:05:17.120
+we have reference information
+
+00:05:17.120 --> 00:05:18.240
+that comes directly from
+
+00:05:18.240 --> 00:05:21.440
+the .po and .texi files.
+
+05:21.440 --> 00:05:23.440
+We also have notes that I can share
+
+00:05:23.440 --> 00:05:25.759
+with fellow translators,
+
+05:25.759 --> 00:05:28.080
+and we have numbers that tell me
+
+00:05:28.080 --> 00:05:31.199
+that I still have 143,000 segments more to go
+
+00:05:31.199 --> 00:05:35.280
+before I complete this translation.
+
+05:35.280 --> 00:05:37.120
+As we see, there are plenty of strings
+
+05:37.120 --> 00:05:40.000
+that we really don’t want to have to type.
+
+05:40.000 --> 00:05:42.160
+For example, those strings
+
+00:05:42.160 --> 00:05:43.840
+are typical .texi strings
+
+00:05:43.840 --> 00:05:45.039
+that the translator
+
+00:05:45.039 --> 00:05:46.479
+should really not have to type.
+
+00:05:46.479 --> 00:05:47.360
+So we’re going to have to
+
+00:05:47.360 --> 00:05:50.400
+do something about that.
+
+05:50.400 --> 00:05:51.600
+We’re going to have to create
+
+00:05:51.600 --> 00:05:52.479
+protected strings
+
+00:05:52.479 --> 00:05:54.400
+with regular expressions,
+
+05:54.400 --> 00:05:56.800
+so that the strings can be visualized
+
+00:05:56.800 --> 00:05:59.120
+right away in the source segment,
+
+05:59.120 --> 00:06:00.479
+entered semi-automatically
+
+00:06:00.479 --> 00:06:01.680
+in the target segment,
+
+00:06:01.680 --> 00:06:04.479
+and checked for integrity.
+
+06:04.479 --> 00:06:06.479
+The regular expression I came up with
+
+06:06.479 --> 00:06:08.160
+for defining most of the strings
+
+00:06:08.160 --> 00:06:09.600
+is this one,
+
+06:09.600 --> 00:06:11.120
+and I’m not a regular expression pro
+
+00:06:11.120 --> 00:06:13.360
+so I’m sure some of you will correct me.
+
+00:06:13.360 --> 00:06:14.560
+But this expression gives me
+
+00:06:14.560 --> 00:06:15.919
+a good enough definition
+
+00:06:15.919 --> 00:06:17.919
+even though it does not yet include
+
+00:06:17.919 --> 00:06:20.960
+Org mode syntax.
+
+06:20.960 --> 00:06:22.344
+So now we have all those
+
+00:06:22.344 --> 00:06:23.440
+.texi specific things
+
+00:06:23.440 --> 00:06:24.960
+that we don’t want to touch
+
+06:24.960 --> 00:06:26.100
+displayed in gray.
+
+00:06:26.100 --> 00:06:27.680
+Actually, you may have noticed
+
+00:06:27.680 --> 00:06:28.479
+that I cheated a bit,
+
+06:28.479 --> 00:06:30.319
+because here I added the years
+
+00:06:30.319 --> 00:06:32.000
+and the Free Software Foundation name
+
+00:06:32.000 --> 00:06:34.000
+to the previous regular expression
+
+00:06:34.000 --> 00:06:35.520
+to show you that you can protect
+
+00:06:35.520 --> 00:06:38.560
+any kind of string, really.
+
+06:38.560 --> 00:06:39.520
+So what we have now
+
+00:06:39.520 --> 00:06:41.360
+is a way to visualize the strings
+
+00:06:41.360 --> 00:06:43.440
+that we do not want to touch,
+
+06:43.440 --> 00:06:45.440
+but we still have to enter all of them
+
+00:06:45.440 --> 00:06:46.880
+in the translation.
+
+06:46.880 --> 00:06:48.319
+For that, we have the pop-up menu
+
+00:06:48.319 --> 00:06:50.400
+that I used earlier with the glossary,
+
+00:06:50.400 --> 00:06:51.520
+and we also have items
+
+00:06:51.520 --> 00:06:52.400
+in the edit menu
+
+00:06:52.400 --> 00:06:53.919
+that come with shortcuts
+
+00:06:53.919 --> 00:06:57.199
+for easy insertion of missing tags.
+
+06:57.199 --> 00:06:58.800
+Last, but certainly not least,
+
+00:06:58.800 --> 00:07:00.800
+we can now validate our input.
+
+00:07:00.800 --> 00:07:02.479
+Here, OmegaT properly tells me
+
+00:07:02.479 --> 00:07:05.759
+that I missed 7 protected strings:
+
+07:05.759 --> 00:07:07.599
+I entered only 1998,
+
+00:07:07.599 --> 00:07:09.280
+but there were five different years,
+
+00:07:09.280 --> 00:07:10.479
+the copyright string,
+
+00:07:10.479 --> 00:07:14.240
+and the FSF name string.
+
+07:14.240 --> 00:07:15.970
+With all this almost native
+
+00:07:15.970 --> 00:07:16.960
+Texinfo support,
+
+00:07:16.960 --> 00:07:18.880
+we have far fewer things to type,
+
+07:18.880 --> 00:07:19.919
+and there is a much lower
+
+00:07:19.919 --> 00:07:21.120
+potential for errors.
+
+00:07:21.120 --> 00:07:25.199
+But we agree, it’s still a lot of work.
+
+07:25.199 --> 00:07:26.319
+What we’d like now
+
+00:07:26.319 --> 00:07:27.840
+is to work with fellow translators,
+
+00:07:27.840 --> 00:07:28.720
+and here we need to know
+
+00:07:28.720 --> 00:07:29.840
+that OmegaT is actually
+
+00:07:29.840 --> 00:07:32.080
+a hidden svn/git client,
+
+00:07:32.080 --> 00:07:34.240
+and team projects can be hosted
+
+07:34.240 --> 00:07:36.319
+on svn/git platforms.
+
+07:36.319 --> 00:07:37.199
+Translators don’t need to
+
+00:07:37.199 --> 00:07:38.880
+know anything about VCS.
+
+00:07:38.880 --> 00:07:40.720
+They just need access credentials,
+
+00:07:40.720 --> 00:07:42.400
+and OmegaT commits for them.
+
+00:07:42.400 --> 00:07:44.080
+This way we do not have to use
+
+00:07:44.080 --> 00:07:45.759
+ugly and clumsy web-based
+
+00:07:45.759 --> 00:07:47.199
+translation interfaces,
+
+00:07:47.199 --> 00:07:48.800
+and we can use a powerful
+
+00:07:48.800 --> 00:07:51.440
+offline professional tool.
+
+07:51.440 --> 00:07:52.479
+So this is how it looks
+
+00:07:52.479 --> 00:07:54.160
+when you look at the platform
+
+00:07:54.160 --> 00:07:55.919
+where I hosted this project.
+
+07:55.919 --> 00:07:57.199
+The last updates are from
+
+00:07:57.199 --> 00:07:58.639
+20 days and 30 seconds ago
+
+00:07:58.639 --> 00:08:00.720
+when I created this slide,
+
+08:00.720 --> 00:08:02.479
+and you can see that I had a partner
+
+00:08:02.479 --> 00:08:04.639
+who worked with me on the same file set.
+
+08:04.639 --> 00:08:05.520
+Although it looks like
+
+00:08:05.520 --> 00:08:06.879
+we actually committed the translation
+
+00:08:06.879 --> 00:08:07.680
+to the platform,
+
+00:08:07.680 --> 00:08:11.039
+it was not us, but OmegaT.
+
+00:08:11.039 --> 00:08:13.599
+OmegaT does all the heavy-duty work.
+
+08:13.599 --> 00:08:15.039
+It regularly saves to
+
+00:08:15.039 --> 00:08:16.879
+and syncs from the servers.
+
+08:16.879 --> 00:08:18.720
+Translators are regularly kept updated
+
+08:18.720 --> 00:08:20.479
+with work from fellow translators,
+
+00:08:20.479 --> 00:08:21.680
+and when necessary,
+
+00:08:21.680 --> 00:08:23.360
+OmegaT offers a simple
+
+00:08:23.360 --> 00:08:25.440
+conflict-resolution dialogue.
+
+08:25.440 --> 00:08:27.039
+Translators never have to do anything
+
+08:27.039 --> 00:08:29.360
+with svn or git ever.
+
+08:29.360 --> 00:08:30.800
+And now we can envision a future
+
+00:08:30.800 --> 00:08:31.599
+not so far away
+
+00:08:31.599 --> 00:08:33.120
+where the manuals will be translated
+
+00:08:33.120 --> 00:08:34.159
+and eventually included
+
+00:08:34.159 --> 00:08:35.279
+in the distribution,
+
+00:08:35.279 --> 00:08:36.080
+but that’s a topic
+
+00:08:36.080 --> 00:08:39.760
+for a different presentation.
+
+08:39.760 --> 00:08:42.080
+So we’ve reached the end of this session.
+
+08:42.080 --> 00:08:44.240
+Thank you very much again for joining it.
+
+08:44.240 --> 00:08:45.600
+There are plenty of topics
+
+00:08:45.600 --> 00:08:46.880
+I promised I would not address,
+
+00:08:46.880 --> 00:08:50.000
+and I think I kept my promise.
+
+08:50.000 --> 00:08:51.600
+There will be a Q&A now,
+
+00:08:51.600 --> 00:08:52.517
+and I also started
+
+00:08:52.517 --> 00:08:53.600
+a thread about this talk
+
+00:08:53.600 --> 00:08:55.519
+on Reddit last Saturday.
+
+08:55.519 --> 00:08:57.279
+You can find me on the emacs-help
+
+00:08:57.279 --> 00:08:59.200
+and emacs-devel lists as well,
+
+00:08:59.200 --> 00:09:00.480
+so don’t hesitate to send me
+
+00:09:00.480 --> 00:09:02.080
+questions and remarks.
+
+09:02.080 --> 09:06.760
+Thank you again, and see you around.
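
The talk describes protecting Texinfo markup with a regular expression so that it can be highlighted in the source segment and checked for integrity in the target. The expression itself is on a slide and is not in the captions, so what follows is only a minimal sketch of the idea under stated assumptions. The pattern `PROTECTED` and the helpers `protected_strings` and `missing_protected` are hypothetical names, the pattern is a simplified stand-in that covers only `@command`, `@command{` and closing braces, and none of this is OmegaT code.

```python
import re

# Hypothetical pattern (NOT the one shown in the talk): matches Texinfo
# @-commands such as "@code{" or "@item", plus bare closing braces.
PROTECTED = re.compile(r"@[A-Za-z]+\{?|\}")


def protected_strings(segment):
    """Return the protected strings found in a segment, in order."""
    return PROTECTED.findall(segment)


def missing_protected(source, target):
    """Return protected strings present in source but absent from target.

    Counts matter: if the source has two "}" and the target has one,
    one "}" is reported as missing.
    """
    remaining = protected_strings(target)
    missing = []
    for token in protected_strings(source):
        if token in remaining:
            remaining.remove(token)  # consume one matching occurrence
        else:
            missing.append(token)
    return missing


source = "Type @kbd{C-x C-f} to visit a file; see @ref{Visiting}."
target = "Tapez @kbd{C-x C-f} pour visiter un fichier."
print(missing_protected(source, target))  # prints ['@ref{', '}']
```

A check of this kind flags a translation that drops a cross-reference or a year from a copyright line, in the spirit of the validation step shown in the talk. Adding alternatives to the pattern (for instance a literal organization name) would protect other strings in the same way.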