WEBVTT captioned by brandelune and bhavin192
NOTE Introduction
00:00.000 --> 00:00:05.400
Hello everyone, I am Jean-Christophe Helary,
00:00:05.400 --> 00:00:09.680
I live in Japan, and I'm a translator.
00:09.680 --> 00:00:12.633
Here is my second presentation on this very
00:00:12.633 --> 00:00:15.300
prestigious stage that is the Emacs conference.
00:00:15.300 --> 00:00:18.367
Following my "Let's Translate the 2 million words
00:00:18.367 --> 00:00:21.767
in the Emacs manual" in 2021, my topic this year,
00:00:21.767 --> 00:00:25.167
still related to translation, is
00:00:25.167 --> 00:00:28.400
pre-localizing Emacs, or, much less pretentiously,
00:00:28.400 --> 00:00:31.933
"Just make sure that your strings don't mix up plurals".
NOTE Usage of package.el
00:00:31.933 --> 00:00:36.133
So, for some reason, I resumed using Emacs
00:00:36.133 --> 00:00:39.940
around 2016, and as I was rediscovering the thing
00:00:39.940 --> 00:00:42.800
I found really old outline-mode files here
00:00:42.800 --> 00:00:44.033
and there on my machine.
00:00:44.033 --> 00:00:45.140
And I started to experiment
00:00:45.140 --> 00:00:47.167
and write with Emacs again.
00:00:47.167 --> 00:00:48.564
I think that at the time,
00:00:48.564 --> 00:00:50.433
I was coming from Aquamacs, and because of
00:00:50.433 --> 00:00:53.400
an integration bug with macOS, I decided
00:00:53.400 --> 00:00:55.440
to check what was going on in the code.
00:55.440 --> 00:00:59.040
That was my first official contribution.
NOTE The bug in strings
00:59.040 --> 00:01:02.233
So as I was happily installing and uninstalling
00:01:02.233 --> 00:01:05.267
things, I noticed something weird one day.
00:01:05.267 --> 00:01:09.080
Let me enlarge that picture.
01:09.080 --> 00:01:12.400
See? And even if I were not a translator,
00:01:12.400 --> 00:01:14.960
I would not like that string, and obviously
01:14.960 --> 00:01:16.833
the same bug bites you when the string
00:01:16.833 --> 00:01:20.520
tells you to erase the package.
01:20.520 --> 00:01:26.720
Boom, so we agree that we have a problem here.
NOTE Natural language engineering
01:26.720 --> 00:01:29.067
So, I started to do some spelunking into the code,
00:01:29.067 --> 00:01:31.067
or at least that was how it felt,
00:01:31.067 --> 00:01:33.100
because I really am not a programmer
00:01:33.100 --> 00:01:37.240
by any stretch of the imagination.
01:37.240 --> 00:01:39.467
And what I found was an amazing piece of
00:01:39.467 --> 00:01:41.840
natural language engineering that was mixing code
01:41.840 --> 00:01:44.267
with English suffixes and all that,
00:01:44.267 --> 00:01:46.267
and I could see that the people who had
00:01:46.267 --> 00:01:47.767
written that code were pretty smart,
00:01:47.767 --> 00:01:49.533
but had missed a number of edge cases
00:01:49.533 --> 00:01:51.280
that produced the above bugs.
01:51.280 --> 00:01:53.500
That was my first experience with
00:01:53.500 --> 00:01:55.033
all the message-related functions,
00:01:55.033 --> 00:01:58.360
"format", "concat", "message", etc.
01:58.360 --> 00:02:00.433
But even with my beginner's eyes, I could see that
00:02:00.433 --> 00:02:03.040
something was off, because when you want
02:03.040 --> 00:02:06.000
to produce natural language strings, you should never,
00:02:06.000 --> 00:02:08.600
ever use "replace-regexp-in-string" to
02:08.600 --> 00:02:11.067
add an "ing" or an "ed" suffix
00:02:11.067 --> 00:02:12.980
to change the form of a verb.
02:12.980 --> 00:02:16.840
But that's exactly what I saw happening.
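NOTE Illustrative sketch: gluing suffixes onto verbs
A minimal Python sketch of why attaching English suffixes to a verb is fragile. The helper names and the message wording are hypothetical; this is not the package.el code, only the general technique the talk warns against next to a plain alternative.
```python
def naive_progressive(verb: str) -> str:
    # Hypothetical helper: blindly appends "ing", so "delete" becomes "deleteing".
    return verb + "ing"
def plain_message(verb: str) -> str:
    # Hypothetical KISS alternative: one complete, hand-written string per operation.
    messages = {
        "install": "Installing packages...",
        "delete": "Deleting packages...",
    }
    return messages[verb]
```
A smarter suffix rule would only patch English; a translator still needs whole strings to work with.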
NOTE More than a missed plural
02:16.840 --> 00:02:20.333
So, what we had to deal with here
00:02:20.333 --> 00:02:22.220
was way more than just a missed plural.
02:22.220 --> 00:02:24.000
It was an attempt at engineering all
00:02:24.000 --> 00:02:26.400
the message strings destined for the user
00:02:26.400 --> 00:02:28.567
with smart code that made assumptions
00:02:28.567 --> 00:02:30.067
about the structure of words,
00:02:30.067 --> 00:02:33.220
and in the localization world that's a big no-no.
02:33.220 --> 00:02:36.667
I'm a translator, and such UI string issues
00:02:36.667 --> 00:02:38.433
were sorted out decades ago.
00:02:38.433 --> 00:02:41.320
So I was a bit shocked.
NOTE The final patch
02:41.320 --> 00:02:43.533
The final patch took me about a year to write,
00:02:43.533 --> 00:02:45.380
because I'm slow, because I needed to verify
02:45.380 --> 00:02:47.167
and understand a lot, because there are
00:02:47.167 --> 00:02:49.100
plenty of rules and plenty of people who
00:02:49.100 --> 00:02:51.433
explain to you very nicely what the rules are,
00:02:51.433 --> 00:02:53.733
because I have kids, and because the
00:02:53.733 --> 00:02:55.600
Emacs development list is such a cool place to be
00:02:55.600 --> 00:02:58.560
that you sometimes forget why you're there.
02:58.560 --> 00:03:01.800
Anyway, for people who can't click on a video,
00:03:01.800 --> 00:03:03.640
and I can't either, here are the relevant
03:03.640 --> 00:03:05.840
parts with some short comments.
03:05.840 --> 00:03:07.800
I'll be talking with localization in mind,
00:03:07.800 --> 00:03:09.640
knowing full well that Emacs localization
03:09.640 --> 00:03:12.800
is not on the map at the moment.
03:12.800 --> 00:03:14.167
So first, there is this thing
00:03:14.167 --> 00:03:15.520
about "format" and "concat".
03:15.520 --> 00:03:17.800
And if I remember correctly,
00:03:17.800 --> 00:03:20.300
"format" is better for user-facing things,
00:03:20.300 --> 00:03:25.160
and "concat" is better for internal things.
03:25.160 --> 00:03:26.800
Here, there are two things.
03:26.800 --> 00:03:28.800
First, a rule that we have when we prepare
00:03:28.800 --> 00:03:30.700
strings that need to be localized is:
00:03:30.700 --> 00:03:33.333
never, ever make assumptions about the way
00:03:33.333 --> 00:03:35.780
numbers are expressed in the language.
03:35.780 --> 00:03:37.067
Here, the assumption is that
00:03:37.067 --> 00:03:40.000
we have either a singular or plural form,
00:03:40.000 --> 00:03:42.040
and that's not always the case.
03:42.040 --> 00:03:44.067
That usually means that you should externalize
00:03:44.067 --> 00:03:48.280
numbers and find a generic way to express them.
03:48.280 --> 00:03:50.833
So it makes for slightly less natural
00:03:50.833 --> 00:03:54.400
language strings, but it's better anyway.
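NOTE Illustrative sketch: externalizing numbers
A minimal Python sketch with hypothetical function names. The first string is the buggy package.el message that started the patch; the computed plural and the externalized version are assumptions made for illustration, not the committed fix.
```python
def buggy(n: int) -> str:
    # Fixed plural: reads "1 packages are no longer needed" when n == 1.
    return f"{n} packages are no longer needed"
def computed_plural(n: int) -> str:
    # Hypothetical English-only logic: assumes exactly one singular and one plural form.
    if n == 1:
        return f"{n} package is no longer needed"
    return f"{n} packages are no longer needed"
def externalized(n: int) -> str:
    # Hypothetical generic form: the number sits outside the sentence, so nothing has to agree with it.
    return f"Packages no longer needed: {n}"
```
The externalized form reads slightly less naturally, but it is correct for any count and gives a translator a single fixed sentence.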
03:54.400 --> 00:03:56.667
Then we have that comma there that's trying
00:03:56.667 --> 00:03:58.167
to be externalized and that's weird,
00:03:58.167 --> 00:04:02.620
so I put it back into the sentence.
04:02.620 --> 00:04:04.967
Here we have another construct, or two rather,
00:04:04.967 --> 00:04:06.960
that really should not be used like this.
04:06.960 --> 00:04:10.033
It's "prin1", which uses quoting characters
00:04:10.033 --> 00:04:12.480
just like "print", and "princ", which does not.
04:12.480 --> 00:04:15.400
And you can see why they were combined.
04:15.400 --> 00:04:17.133
And they were both trying to be really smart
00:04:17.133 --> 00:04:19.780
about which article to put in front of a vowel.
04:19.780 --> 00:04:20.960
And you just don't do that.
04:20.960 --> 00:04:25.000
You just keep things simple.
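NOTE Illustrative sketch: choosing "a" or "an"
A hypothetical Python helper showing the kind of "smart" article selection described above; it is not the package.el code. Testing the first letter for a vowel already fails in English ("an user", "a hour"), and the rule means nothing in a language without articles.
```python
def article(word: str) -> str:
    # Hypothetical "smart" rule: "an" before a written vowel, "a" otherwise.
    return "an" if word[:1].lower() in ("a", "e", "i", "o", "u") else "a"
def describe(word: str) -> str:
    # Builds "a package" or "an archive" the fragile way.
    return f"{article(word)} {word}"
```
Keeping things simple here means writing the article into the full sentence instead of computing it.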
04:25.000 --> 00:04:26.633
Here again, the code is trying to be smart,
00:04:26.633 --> 00:04:28.480
but it's really not much more efficient than
04:28.480 --> 00:04:34.940
plainly stating what you want.
04:34.940 --> 00:04:36.500
And here again, we have "concat" calls
00:04:36.500 --> 00:04:40.367
where we could just plainly state
00:04:40.367 --> 00:04:41.980
what we want to state.
04:41.980 --> 00:04:49.880
So, instead of "concat" I just put a "message".
04:49.880 --> 00:04:52.260
And here we have something that's very cute.
04:52.260 --> 00:04:54.540
It's a computerized plural.
04:54.540 --> 00:04:55.700
Here again, it assumes that
00:04:55.700 --> 00:04:58.640
there are only singular and plural forms.
04:58.640 --> 00:05:00.867
But the end string is not that much more natural
00:05:00.867 --> 00:05:02.700
than the fix, while the code is less efficient
00:05:02.700 --> 00:05:07.760
and harder to understand.
05:07.760 --> 00:05:09.433
Here again, the code is trying to do
00:05:09.433 --> 00:05:13.520
smart things where it could be much simpler.
05:13.520 --> 00:05:14.667
That is the part where you get the
00:05:14.667 --> 00:05:19.480
number of packages and their names.
05:19.480 --> 00:05:22.067
Here the whole sentence with the semicolons
00:05:22.067 --> 00:05:26.333
and the question mark is split into parts,
00:05:26.333 --> 00:05:29.180
between which something will be inserted.
05:29.180 --> 00:05:34.240
That's really ugly and difficult to read.
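NOTE Illustrative sketch: one string per sentence
A hypothetical Python comparison; the prompt wording and the names are assumptions, not the real package.el strings. Concatenating fragments hides the sentence from translators, while a single template, much like one Emacs "format" string, keeps it whole, and named placeholders let a translation reorder the values.
```python
def fragmented(count: int, names: str) -> str:
    # Hypothetical: the sentence is split around the inserted values; no fragment can be translated alone.
    return "Delete " + str(count) + " packages (" + names + ")" + "? "
def single_template(count: int, names: str) -> str:
    # Hypothetical: the whole sentence lives in one template with named placeholders.
    template = "Packages to delete: {count} ({names}). Proceed? "
    return template.format(count=count, names=names)
```
Both produce a similar prompt, but only the second one gives a translator a complete sentence.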
05:34.240 --> 00:05:37.700
Here again, another "ing" waiting to be
00:05:37.700 --> 00:05:44.840
regex-inserted into the code.
05:44.840 --> 00:05:46.633
And here at last, we get to the point
00:05:46.633 --> 00:05:48.760
where everything started.
05:48.760 --> 00:05:50.833
And you can see that unlike in the other spots,
00:05:50.833 --> 00:05:52.400
there is no possibility for the expression
05:52.400 --> 00:05:54.680
to be singular.
05:54.680 --> 00:05:57.600
So, I guess that if it hadn't been for that bug,
00:05:57.600 --> 00:05:59.320
I would not have found the other items,
05:59.320 --> 00:06:01.033
and we would be left with code that works,
00:06:01.033 --> 00:06:02.033
of course, but that is
00:06:02.033 --> 00:06:06.020
harder to understand and maintain.
06:06.020 --> 00:06:08.333
Last but not least, one final version of
00:06:08.333 --> 00:06:10.920
"just plainly state what you mean to state".
06:10.920 --> 00:06:14.880
Keep it simple.
NOTE "What did I learn, and how did I learn it?"
06:14.880 --> 00:06:19.267
So first, we have this wonderful CONTRIBUTE file
00:06:19.267 --> 00:06:21.267
that is very explicit about
00:06:21.267 --> 00:06:23.520
how we must proceed when contributing code.
06:23.520 --> 00:06:25.233
So, that's really the first place
00:06:25.233 --> 00:06:27.760
that we should all read.
06:27.760 --> 00:06:29.333
The README file is pretty cool too,
00:06:29.333 --> 00:06:30.967
especially at the beginning of the process,
00:06:30.967 --> 00:06:31.867
when you're not sure whether
00:06:31.867 --> 00:06:36.240
you want to fix that bug or just report it.
NOTE Useful packages
06:36.240 --> 00:06:37.920
And then we've got packages.
06:37.920 --> 00:06:39.900
We've got a number of packages that are really
00:06:39.900 --> 00:06:42.600
helpful when it comes to reading
00:06:42.600 --> 00:06:45.880
the information and the manuals.
06:45.880 --> 00:06:48.000
I'm mentioning three of them here,
00:06:48.000 --> 00:06:53.720
and I think they are the most important for us.
NOTE Package: helpful
06:53.720 --> 00:06:55.600
So "helpful" is on the right,
00:06:55.600 --> 00:06:58.667
and it's overflowing the window with
00:06:58.667 --> 00:07:01.900
all the contextualized information it provides,
00:07:01.900 --> 00:07:05.280
and the standard "help" is on the left.
07:05.280 --> 00:07:07.933
I mean, really there are like two or three
00:07:07.933 --> 00:07:11.567
screenfuls of information in the "helpful" output,
00:07:11.567 --> 00:07:13.233
so you really only see a part,
00:07:13.233 --> 00:07:16.320
but I guess if you use it, you know what I'm saying.
07:16.320 --> 00:07:18.867
What I like the most here is the "view in manual"
00:07:18.867 --> 00:07:21.800
part, where you can actually click and even get
00:07:21.800 --> 00:07:23.667
more information that's sometimes
00:07:23.667 --> 00:07:28.400
easier to read and understand.
NOTE Package: inform
07:28.400 --> 00:07:33.640
And then you've got "info" versus "inform".
07:33.640 --> 00:07:34.567
When you're in the manual,
00:07:34.567 --> 00:07:37.140
"inform" makes a huge difference.
07:37.140 --> 00:07:39.367
You can see here that you've got colorized items,
00:07:39.367 --> 00:07:42.000
and also in the middle you've got that
07:42.000 --> 00:07:45.000
"read" part that's green and bold.
07:45.000 --> 00:07:49.333
In "info" it's not a specific object,
00:07:49.333 --> 00:07:52.200
it's just a string. In "inform" it's actually
00:07:52.200 --> 00:07:53.800
a link that you can click,
00:07:53.800 --> 00:07:58.320
and actually go to that "read" manual page.
NOTE Package: which-key
07:58.320 --> 00:08:01.300
Now, we've got "which-key".
08:01.300 --> 00:08:03.400
"which-key" is a savior for beginners too.
08:03.400 --> 00:08:04.867
Just wait half a second or something,
00:08:04.867 --> 00:08:06.500
and Emacs will show you all the keys
00:08:06.500 --> 00:08:08.433
that you can access from the prefix combination
00:08:08.433 --> 00:08:09.920
that you just typed.
08:09.920 --> 00:08:13.200
So, it's really helpful for discovering new functions
00:08:13.200 --> 00:08:19.160
and learning them, getting used to them.
NOTE It all started with this message…
08:19.160 --> 00:08:21.500
And so that whole process started
00:08:21.500 --> 00:08:26.533
on May 23, 2017,
00:08:26.533 --> 00:08:30.440
with that thread, when I found the bug.
08:30.440 --> 00:08:32.800
I just bumped into an English/code bug
00:08:32.800 --> 00:08:36.920
this morning. In package.el, when one package
08:36.920 --> 00:08:39.033
is not needed anymore, the message is:
00:08:39.033 --> 00:08:41.300
"Package menu: Operation finished.
00:08:41.300 --> 00:08:44.880
1 packages are no longer needed", etc.
08:44.880 --> 00:08:49.633
So, I was asking whether we had best practices
00:08:49.633 --> 00:08:53.800
for using messages, and we had a whole thread
08:53.800 --> 00:08:57.867
about that. And while I was taking part in that
00:08:57.867 --> 00:09:01.240
thread, I started a new one, which is:
09:01.240 --> 00:09:02.867
"package.el strings".
00:09:02.867 --> 00:09:09.900
The whole thing actually ended on June 27, 2018.
00:09:09.900 --> 00:09:15.400
So, a year later, with that message from Noam
00:09:15.400 --> 00:09:18.567
telling me that "Yes I can close the bug,"
00:09:18.567 --> 00:09:22.040
and that was it.
09:22.040 --> 00:09:24.000
So, it took about a year to finish that.
00:09:24.000 --> 00:09:28.133
What I learned, basically, is that
00:09:28.133 --> 00:09:32.160
helping with Emacs is not that difficult.
09:32.160 --> 00:09:36.100
It takes time when you're not fluent with the code,
00:09:36.100 --> 00:09:37.100
but that's okay because the reference
09:37.100 --> 00:09:39.300
is excellent, and there are lots of people
00:09:39.300 --> 00:09:41.520
who are here to help.
NOTE Conclusion
09:41.520 --> 00:09:45.700
Basically, the solution to all our problems is
00:09:45.700 --> 00:09:47.733
"Keep It Simple and Straightforward".
00:09:47.733 --> 00:09:51.033
As you can see in that patch,
00:09:51.033 --> 00:09:53.233
even if it's a beginner's patch,
00:09:53.233 --> 00:09:57.733
what I did shows what can be done by Emacs Lisp
00:09:57.733 --> 00:09:59.533
beginners to help with "straightening" the strings
00:09:59.533 --> 00:10:02.267
to reduce the number of potential English bugs.
00:10:02.267 --> 00:10:04.533
And also to make Emacs strings easier
00:10:04.533 --> 00:10:07.233
for real localization processes to handle one day.
00:10:07.233 --> 00:10:09.067
But it doesn't have to be about strings.
00:10:09.067 --> 00:10:12.767
Strings can be an easy entry point to Emacs,
00:10:12.767 --> 00:10:16.720
but it can be any itch that you want to scratch.
10:16.720 --> 00:10:18.267
And my real conclusion is that
00:10:18.267 --> 00:10:22.160
Emacs is free software, and what that means is mostly
10:22.160 --> 00:10:24.067
that it allows you to do things that you would
00:10:24.067 --> 00:10:27.920
never have thought of being able to do before.
10:27.920 --> 00:10:32.000
That's really the biggest lesson to be learned here.
10:32.000 --> 00:10:33.400
So, I want to thank all the people
00:10:33.400 --> 00:10:37.920
who allowed this to happen, allowed me to
10:37.920 --> 00:10:41.267
learn a bit and contribute a bit to that wonderful
00:10:41.267 --> 00:10:42.800
piece of software that Emacs is.
00:10:42.800 --> 00:10:44.533
And thank you everyone for listening,
00:10:44.533 --> 00:10:46.700
and hopefully I'll see you next year
00:10:46.700 --> 00:10:51.520
with a different translation related presentation.
10:51.520 --> 11:13.640
Thank you very much.