WEBVTT captioned by brandelune and bhavin192

NOTE Introduction

00:00.000 --> 00:00:05.400
Hello everyone, I am Jean-Christophe Helary,

00:00:05.400 --> 00:00:09.680
I live in Japan, and I'm a translator.

00:09.680 --> 00:00:12.633
Here is my second presentation on this very

00:00:12.633 --> 00:00:15.300
prestigious stage that is the Emacs conference.

00:00:15.300 --> 00:00:18.367
Following my "Let's Translate the 2 million words

00:00:18.367 --> 00:00:21.767
in the Emacs manual" in 2021, my topic this year,

00:00:21.767 --> 00:00:25.167
always related to translation, is

00:00:25.167 --> 00:00:28.400
pre-localizing Emacs or much less pretentiously,

00:00:28.400 --> 00:00:31.933
"Just make sure that your strings don't mix up plurals".

NOTE Usage of package.el

00:00:31.933 --> 00:00:36.133
So, for some reason I resumed Emacs use

00:00:36.133 --> 00:00:39.940
around 2016, and as I was rediscovering the thing

00:00:39.940 --> 00:00:42.800
I found really old outline-mode files here

00:00:42.800 --> 00:00:44.033
and there on my machine.

00:00:44.033 --> 00:00:45.140
And I started to experiment

00:00:45.140 --> 00:00:47.167
again and write again with Emacs.

00:00:47.167 --> 00:00:48.564
I think that at the time,

00:00:48.564 --> 00:00:50.433
I was coming from Aquamacs and because of

00:00:50.433 --> 00:00:53.400
an integration bug with macOS, I decided

00:00:53.400 --> 00:00:55.440
to check what was going on in the code.

00:55.440 --> 00:00:59.040
That was my first official contribution.

NOTE The bug in strings

00:59.040 --> 00:01:02.233
So as I was happily installing and uninstalling

00:01:02.233 --> 00:01:05.267
things, I noticed something weird one day.

00:01:05.267 --> 00:01:09.080
Let me enlarge that picture.

01:09.080 --> 00:01:12.400
See? And even if I were not a translator,

00:01:12.400 --> 00:01:14.960
I would not like that string, and obviously

01:14.960 --> 00:01:16.833
the same bug bites you when the string

00:01:16.833 --> 00:01:20.520
tells you to erase the package.

01:20.520 --> 00:01:26.720
Boom, so we agree that we have a problem here.

NOTE Natural language engineering

01:26.720 --> 00:01:29.067
So, I started to do some spelunking into the code,

00:01:29.067 --> 00:01:31.067
and at least that was my feeling

00:01:31.067 --> 00:01:33.100
because I really am not a programmer

00:01:33.100 --> 00:01:37.240
by any stretch of the imagination.

01:37.240 --> 00:01:39.467
And what I found was an amazing piece of

00:01:39.467 --> 00:01:41.840
natural language engineering that was mixing code

01:41.840 --> 00:01:44.267
with English suffixes and all that,

00:01:44.267 --> 00:01:46.267
and I could see that the people who had

00:01:46.267 --> 00:01:47.767
written that code were pretty smart,

00:01:47.767 --> 00:01:49.533
but had missed a number of edge cases

00:01:49.533 --> 00:01:51.280
that produced the above bugs.

01:51.280 --> 00:01:53.500
That was my first experience with

00:01:53.500 --> 00:01:55.033
all the message-related functions,

00:01:55.033 --> 00:01:58.360
"format", "concat", "message", etc.

01:58.360 --> 00:02:00.433
But even with my beginner's eyes I could see that

00:02:00.433 --> 00:02:03.040
something was off because when you want

02:03.040 --> 00:02:06.000
to produce natural language strings you never ever

00:02:06.000 --> 00:02:08.600
should use "replace-regexp-in-string" to

02:08.600 --> 00:02:11.067
add an "ing" or an "ed" suffix

00:02:11.067 --> 00:02:12.980
to change the mode of a sentence.

02:12.980 --> 00:02:16.840
But that's what I was seeing happen.

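NOTE Sketch: why suffix rewriting is fragile
A minimal sketch of the problem described above, in Python as a stand-in for Emacs Lisp.
It is not the package.el code: the labels, the regex, and the function names are hypothetical.
Python's re.sub plays the role of "replace-regexp-in-string" here.
```python
import re
# Hypothetical status labels and action names, for illustration only.
def done_by_regex(label):
    """Fragile: derive the 'done' message by swapping an English suffix."""
    return re.sub(r"ing\b", "ed", label)
# Plain alternative: state every complete message once, so each one can be
# read, checked, and eventually translated as a whole string.
MESSAGES = {
    "install": ("Installing", "Installed"),
    "run": ("Running", "Ran"),
}
def done_message(action):
    """Return the complete 'done' string for the given action."""
    return MESSAGES[action][1]
print(done_by_regex("Installing"))  # Installed
print(done_by_regex("Running"))     # Runned
print(done_message("run"))          # Ran
```
The regex happens to work for "Installing" and breaks on "Running"; the lookup table has no such edge case.
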
NOTE More than a missed plural

02:16.840 --> 00:02:20.333
So, what we had to deal with here

00:02:20.333 --> 00:02:22.220
was way more than just a missed plural.

02:22.220 --> 00:02:24.000
It was an attempt at engineering all

00:02:24.000 --> 00:02:26.400
the message strings destined to the user

00:02:26.400 --> 00:02:28.567
with the smart code that was making assumptions

00:02:28.567 --> 00:02:30.067
on the structure of words,

00:02:30.067 --> 00:02:33.220
and in the localization world that's a big no-no.

02:33.220 --> 00:02:36.667
I'm a translator, and such UI string issues

00:02:36.667 --> 00:02:38.433
were sorted out decades ago.

00:02:38.433 --> 00:02:41.320
So I was a bit shocked.

NOTE The final patch

02:41.320 --> 00:02:43.533
The final patch took me about a year to write,

00:02:43.533 --> 00:02:45.380
because I'm slow, because I needed to verify

02:45.380 --> 00:02:47.167
and understand a lot, because there are

00:02:47.167 --> 00:02:49.100
plenty of rules and plenty of people who are

00:02:49.100 --> 00:02:51.433
explaining to you very nicely what the rules are,

00:02:51.433 --> 00:02:53.733
because I have kids, and because the

00:02:53.733 --> 00:02:55.600
Emacs development list is such a cool place to be

00:02:55.600 --> 00:02:58.560
that you often forget why you're there.

02:58.560 --> 00:03:01.800
Anyway, for people who can't click on a video,

00:03:01.800 --> 00:03:03.640
and I can't either, here are the relevant

03:03.640 --> 00:03:05.840
parts with some short comments.

03:05.840 --> 00:03:07.800
I'll be talking with localization in mind,

00:03:07.800 --> 00:03:09.640
knowing full well that Emacs localization

03:09.640 --> 00:03:12.800
is not on the map at the moment.

03:12.800 --> 00:03:14.167
So first, there is this thing

00:03:14.167 --> 00:03:15.520
about "format" and "concat".

03:15.520 --> 00:03:17.800
And if I remember correctly,

00:03:17.800 --> 00:03:20.300
"format" is better for user-facing things,

00:03:20.300 --> 00:03:25.160
and "concat" is better for internal things.

03:25.160 --> 00:03:26.800
Here, there are two things.

03:26.800 --> 00:03:28.800
First, a rule that we have when we prepare

00:03:28.800 --> 00:03:30.700
strings that need to be localized is

00:03:30.700 --> 00:03:33.333
never ever make assumptions on the way

00:03:33.333 --> 00:03:35.780
numbers are expressed in the language.

03:35.780 --> 00:03:37.067
Here, the assumption is that

00:03:37.067 --> 00:03:40.000
we have either a singular or plural form,

00:03:40.000 --> 00:03:42.040
and that's not always the case.

03:42.040 --> 00:03:44.067
That usually means that you should externalize

00:03:44.067 --> 00:03:48.280
numbers and find a generic way to express them.

03:48.280 --> 00:03:50.833
So it makes for slightly less natural

00:03:50.833 --> 00:03:54.400
language strings, but it's better anyway.

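NOTE Sketch: externalizing the number
A minimal sketch of the rule above, in Python as a stand-in for Emacs Lisp.
Python's % operator accepts the same %d directive as the Emacs "format" function.
The function names and the message wording are hypothetical, not the package.el text.
```python
# Hypothetical messages, for illustration only.
def with_plural_logic(count):
    """Fragile: assumes a language has exactly two forms, singular and plural."""
    noun = "package" if count == 1 else "packages"
    verb = "is" if count == 1 else "are"
    return "%d %s %s no longer needed" % (count, noun, verb)
def with_external_number(count):
    """Plain: the number sits outside the grammar of the sentence."""
    return "Packages that are no longer needed: %d" % count
print(with_plural_logic(1))
print(with_external_number(1))
```
The second string is slightly less natural, but it stays correct for any count and needs no grammar in the code.
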
03:54.400 --> 00:03:56.667
Then we have that comma there that's trying

00:03:56.667 --> 00:03:58.167
to be externalized and that's weird,

00:03:58.167 --> 00:04:02.620
so I put it back into the sentence.

04:02.620 --> 00:04:04.967
Here we have another construct, or two rather,

00:04:04.967 --> 00:04:06.960
that really should not be used like this.

04:06.960 --> 00:04:10.033
It's "prin1" that uses quoting characters,

00:04:10.033 --> 00:04:12.480
just like "print", and "princ" that does not.

04:12.480 --> 00:04:15.400
And you see why they were combined together.

04:15.400 --> 00:04:17.133
And they were both trying to be really smart

00:04:17.133 --> 00:04:19.780
about which article to put in front of a vowel.

04:19.780 --> 00:04:20.960
And you just don't do that.

04:20.960 --> 00:04:25.000
You just keep things simple.

04:25.000 --> 00:04:26.633
Here again, the code is trying to be smart,

00:04:26.633 --> 00:04:28.480
but it's really not much more efficient than

04:28.480 --> 00:04:34.940
plainly stating what you want.

04:34.940 --> 00:04:36.500
And here again, we have "concat" things

00:04:36.500 --> 00:04:40.367
that we could just use to plainly state

00:04:40.367 --> 00:04:41.980
what we want to state.

04:41.980 --> 00:04:49.880
So, instead of "concat" I just put a "message".

04:49.880 --> 00:04:52.260
And here we have something that's very cute.

04:52.260 --> 00:04:54.540
It's a computerized plural.

04:54.540 --> 00:04:55.700
Here again, assuming that

00:04:55.700 --> 00:04:58.640
there are only plural or singular forms.

04:58.640 --> 00:05:00.867
But the end string is not that much more natural

00:05:00.867 --> 00:05:02.700
than the fix, the code is less efficient

00:05:02.700 --> 00:05:07.760
and is harder to understand.

05:07.760 --> 00:05:09.433
Here again, the code is trying to do

00:05:09.433 --> 00:05:13.520
smart things where it could be much simpler.

05:13.520 --> 00:05:14.667
That is the part where you get the

00:05:14.667 --> 00:05:19.480
number of packages and their names.

05:19.480 --> 00:05:22.067
Here the whole sentence with the semicolons

00:05:22.067 --> 00:05:26.333
and the question mark is split in parts,

00:05:26.333 --> 00:05:29.180
between which something will be inserted.

05:29.180 --> 00:05:34.240
That's really ugly and difficult to read.

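NOTE Sketch: one template instead of sentence fragments
A minimal sketch of the point above, in Python as a stand-in for Emacs Lisp.
The prompt wording, the package names, and the function names are hypothetical, not the package.el code.
```python
# Hypothetical prompt wording, for illustration only.
def prompt_from_fragments(names):
    """Hard to read: words and punctuation are scattered across fragments."""
    return "Delete " + str(len(names)) + " packages" + ": " + "; ".join(names) + "?"
def prompt_from_template(names):
    """Plain: one complete sentence with placeholders for the variable parts."""
    return "Packages to delete: %d (%s). Proceed?" % (len(names), "; ".join(names))
print(prompt_from_fragments(["helpful", "which-key"]))
print(prompt_from_template(["helpful", "which-key"]))
```
With the template, the whole sentence is visible in one place, which is what a reader or a translator needs.
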
05:34.240 --> 00:05:37.700
Here again, another "ing" waiting to be

00:05:37.700 --> 00:05:44.840
regex-inserted into the code.

05:44.840 --> 00:05:46.633
And here at last, we get to the point

00:05:46.633 --> 00:05:48.760
where everything started.

05:48.760 --> 00:05:50.833
And you can see that unlike in the other spots,

00:05:50.833 --> 00:05:52.400
there is no possibility for the expression

05:52.400 --> 00:05:54.680
to be singular.

05:54.680 --> 00:05:57.600
So, I guess that if it hadn't been for that bug,

00:05:57.600 --> 00:05:59.320
I would not have found the other items,

05:59.320 --> 00:06:01.033
and we would be left with code that works,

00:06:01.033 --> 00:06:02.033
of course, but that is

00:06:02.033 --> 00:06:06.020
harder to understand, and maintain.

06:06.020 --> 00:06:08.333
Last but not least, a last version of

00:06:08.333 --> 00:06:10.920
"just plainly state what you mean to state".

06:10.920 --> 00:06:14.880
Keep it simple.

NOTE "What did I learn, and how did I learn it?"

06:14.880 --> 00:06:19.267
So first, we have this wonderful CONTRIBUTE file

00:06:19.267 --> 00:06:21.267
that is very explicit about

00:06:21.267 --> 00:06:23.520
how we must proceed when contributing code.

06:23.520 --> 00:06:25.233
So, that's really the first place

00:06:25.233 --> 00:06:27.760
that we should all read.

06:27.760 --> 00:06:29.333
The README file is pretty cool too,

00:06:29.333 --> 00:06:30.967
especially at the beginning of the process,

00:06:30.967 --> 00:06:31.867
when you're not sure whether

00:06:31.867 --> 00:06:36.240
you want to fix that bug or just report it.

NOTE Useful packages

06:36.240 --> 00:06:37.920
And then we've got packages.

06:37.920 --> 00:06:39.900
We've got a number of packages that are really

00:06:39.900 --> 00:06:42.600
helpful when it comes to reading

00:06:42.600 --> 00:06:45.880
the information and the manuals.

06:45.880 --> 00:06:48.000
I'm mentioning three of them here,

00:06:48.000 --> 00:06:53.720
and I think they are the most important for us.

NOTE Package: helpful

06:53.720 --> 00:06:55.600
So "helpful" is on the right,

00:06:55.600 --> 00:06:58.667
and it's overflowing the window with

00:06:58.667 --> 00:07:01.900
all the contextualized information it provides,

00:07:01.900 --> 00:07:05.280
and the standard "help" is on the left.

07:05.280 --> 00:07:07.933
I mean, really there are like two or three

00:07:07.933 --> 00:07:11.567
screenfuls of information in the "helpful" output,

00:07:11.567 --> 00:07:13.233
so you really only see a part,

00:07:13.233 --> 00:07:16.320
but I guess if you use it, you know what I'm saying.

07:16.320 --> 00:07:18.867
What I like the most here is the "view in manual"

00:07:18.867 --> 00:07:21.800
part, where you can actually click and even get

00:07:21.800 --> 00:07:23.667
more information that's sometimes

00:07:23.667 --> 00:07:28.400
easier to read and understand.

NOTE Package: inform

07:28.400 --> 00:07:33.640
And then you've got the "info" versus "inform" formats.

07:33.640 --> 00:07:34.567
When you're in the manual,

00:07:34.567 --> 00:07:37.140
"inform" makes a huge difference.

07:37.140 --> 00:07:39.367
You can see here that you've got colorized items,

00:07:39.367 --> 00:07:42.000
and also in the middle you've got that

07:42.000 --> 00:07:45.000
'read' part that's green and bold.

07:45.000 --> 00:07:49.333
In "info" it's not a specific object,

00:07:49.333 --> 00:07:52.200
it's just a string. In 'inform' it's actually

00:07:52.200 --> 00:07:53.800
a link that you can click,

00:07:53.800 --> 00:07:58.320
and actually go to that 'read' manual page.

NOTE Package: which-key

07:58.320 --> 00:08:01.300
Now, we've got "which-key".

08:01.300 --> 00:08:03.400
"which-key" is a savior for beginners too.

08:03.400 --> 00:08:04.867
Just wait half a second or something,

00:08:04.867 --> 00:08:06.500
and Emacs will show you all the keys

00:08:06.500 --> 00:08:08.433
that you can access from the prefix combination

00:08:08.433 --> 00:08:09.920
that you just typed.

08:09.920 --> 00:08:13.200
So, it's really helpful for discovering functions

00:08:13.200 --> 00:08:19.160
and learning new functions, getting used to them.

NOTE It all started with this message…

08:19.160 --> 00:08:21.500
And so that whole process started…,

00:08:21.500 --> 00:08:26.533
it was May 23, 2017,

00:08:26.533 --> 00:08:30.440
with that thread when I found the bug.

08:30.440 --> 00:08:32.800
I just bumped into an English/code bug

00:08:32.800 --> 00:08:36.920
this morning. In package.el, when one package

08:36.920 --> 00:08:39.033
is not needed anymore, the message is:

00:08:39.033 --> 00:08:41.300
"Package menu: Operation finished.

00:08:41.300 --> 00:08:44.880
1 packages are no longer needed", etc.

08:44.880 --> 00:08:49.633
So, I was asking whether we had best practices

00:08:49.633 --> 00:08:53.800
for using messages, and we had a whole thread

08:53.800 --> 00:08:57.867
about that. And while I was discussing on that

00:08:57.867 --> 00:09:01.240
thread, I started that new thread, which is:

09:01.240 --> 00:09:02.867
"package.el strings".

00:09:02.867 --> 00:09:09.900
The whole thing actually ended on June 27, 2018.

00:09:09.900 --> 00:09:15.400
So, a year after, with that message from Noam

00:09:15.400 --> 00:09:18.567
telling me that "Yes I can close the bug,"

00:09:18.567 --> 00:09:22.040
and that was it.

09:22.040 --> 00:09:24.000
So, it took about a year to finish that.

00:09:24.000 --> 00:09:28.133
What I did learn basically is that

00:09:28.133 --> 00:09:32.160
helping with Emacs is not that difficult.

09:32.160 --> 00:09:36.100
It takes time when you're not fluent with the code,

00:09:36.100 --> 00:09:37.100
but that's okay because the reference

09:37.100 --> 00:09:39.300
is excellent, and there are lots of people

00:09:39.300 --> 00:09:41.520
who are here to help.

NOTE Conclusion

09:41.520 --> 00:09:45.700
Basically, the solution to all our problems is

00:09:45.700 --> 00:09:47.733
"Keep It Simple and Straightforward".

00:09:47.733 --> 00:09:51.033
As you can see in that patch,

00:09:51.033 --> 00:09:53.233
even if it's a beginner's patch,

00:09:53.233 --> 00:09:57.733
what I did shows what can be done by Emacs Lisp

00:09:57.733 --> 00:09:59.533
beginners to help with "straightening" the strings

00:09:59.533 --> 00:10:02.267
to reduce the number of potential English bugs.

00:10:02.267 --> 00:10:04.533
And then to make Emacs strings easier

00:10:04.533 --> 00:10:07.233
to be handled by real localization processes one day.

00:10:07.233 --> 00:10:09.067
But it doesn't have to be about strings

00:10:09.067 --> 00:10:12.767
because strings can be an easy entry point to Emacs,

00:10:12.767 --> 00:10:16.720
but it can be any itch that you want to scratch.

10:16.720 --> 00:10:18.267
And my real conclusion is that

00:10:18.267 --> 00:10:22.160
Emacs is free software, and what that means is mostly

10:22.160 --> 00:10:24.067
that it allows you to do things that you would

00:10:24.067 --> 00:10:27.920
never have thought of being able to do before.

10:27.920 --> 00:10:32.000
That's really the biggest lesson to be learned here.

10:32.000 --> 00:10:33.400
So, I want to thank all the people

00:10:33.400 --> 00:10:37.920
who allowed this to happen, allowed me to

10:37.920 --> 00:10:41.267
learn a bit and contribute a bit to that wonderful

00:10:41.267 --> 00:10:42.800
piece of software that Emacs is.

00:10:42.800 --> 00:10:44.533
And thank you everyone for listening,

00:10:44.533 --> 00:10:46.700
and hopefully I'll see you next year

00:10:46.700 --> 00:10:51.520
with a different translation-related presentation.

10:51.520 --> 11:13.640
Thank you very much.