WEBVTT captioned by brandelune and bhavin192

NOTE Introduction

00:00.000 --> 00:00:05.400
Hello everyone, I am Jean-Christophe Helary,

00:00:05.400 --> 00:00:09.680
I live in Japan, and I'm a translator.

00:09.680 --> 00:00:12.633
Here is my second presentation on this very

00:00:12.633 --> 00:00:15.300
prestigious stage that is the Emacs conference.

00:00:15.300 --> 00:00:18.367
Following my "Let's Translate the 2 million words

00:00:18.367 --> 00:00:21.767
in the Emacs manual" in 2021, my topic this year,

00:00:21.767 --> 00:00:25.167
always related to translation, is

00:00:25.167 --> 00:00:28.400
pre-localizing Emacs or much less pretentiously,

00:00:28.400 --> 00:00:31.933
"Just make sure that your strings don't mix up plurals".

NOTE Usage of package.el

00:00:31.933 --> 00:00:36.133
So, for some reason I resumed Emacs use

00:00:36.133 --> 00:00:39.940
around 2016, and as I was rediscovering the thing

00:00:39.940 --> 00:00:42.800
I found really old outline-mode files here

00:00:42.800 --> 00:00:44.033
and there on my machine.

00:00:44.033 --> 00:00:45.140
And I started to experiment

00:00:45.140 --> 00:00:47.167
again and write again with Emacs.

00:00:47.167 --> 00:00:48.564
I think that at the time,

00:00:48.564 --> 00:00:50.433
I was coming from Aquamacs and because of

00:00:50.433 --> 00:00:53.400
an integration bug with macOS, I decided

00:00:53.400 --> 00:00:55.440
to check what was going on in the code.

00:55.440 --> 00:00:59.040
That was my first official contribution.

NOTE The bug in strings

00:59.040 --> 00:01:02.233
So as I was happily installing and uninstalling

00:01:02.233 --> 00:01:05.267
things, I noticed something weird one day.

00:01:05.267 --> 00:01:09.080
Let me enlarge that picture.

01:09.080 --> 00:01:12.400
See? And even if I were not a translator,

00:01:12.400 --> 00:01:14.960
I would not like that string, and obviously

01:14.960 --> 00:01:16.833
the same bug bites you when the string

00:01:16.833 --> 00:01:20.520
tells you to erase the package.

01:20.520 --> 00:01:26.720
Boom, so we agree that we have a problem here.

NOTE Natural language engineering

01:26.720 --> 00:01:29.067
So, I started to do some spelunking into the code,

00:01:29.067 --> 00:01:31.067
and at least that was my feeling

00:01:31.067 --> 00:01:33.100
because I really am not a programmer

00:01:33.100 --> 00:01:37.240
by any stretch of the imagination.

01:37.240 --> 00:01:39.467
And what I found was an amazing piece of

00:01:39.467 --> 00:01:41.840
natural language engineering that was mixing code

01:41.840 --> 00:01:44.267
with English suffixes and all that,

00:01:44.267 --> 00:01:46.267
and I could see that the people who had

00:01:46.267 --> 00:01:47.767
written that code were pretty smart,

00:01:47.767 --> 00:01:49.533
but had missed a number of edge cases

00:01:49.533 --> 00:01:51.280
that produced the above bugs.

01:51.280 --> 00:01:53.500
That was my first experience with

00:01:53.500 --> 00:01:55.033
all the message-related functions,

00:01:55.033 --> 00:01:58.360
"format", "concat", "message", etc.

01:58.360 --> 00:02:00.433
But even with my beginner's eyes I could see that

00:02:00.433 --> 00:02:03.040
something was off because when you want

02:03.040 --> 00:02:06.000
to produce natural language strings you never ever

00:02:06.000 --> 00:02:08.600
should use "replace-regexp-in-string" to

02:08.600 --> 00:02:11.067
add an "ing" or an "ed" suffix

00:02:11.067 --> 00:02:12.980
to change the mode of a sentence.

02:12.980 --> 00:02:16.840
But that's what I was seeing happen.
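
NOTE
A minimal sketch of why regexp-built suffixes are fragile, written for
this transcript. The names my-suffix-verb and my-operation-message are
hypothetical and are not taken from package.el.
```elisp
;; Hypothetical helper: drop a trailing "e" from VERB, then append SUFFIX.
(defun my-suffix-verb (verb suffix)
  "Naively build a verb form from VERB and SUFFIX."
  (concat (replace-regexp-in-string "e\\'" "" verb) suffix))
(my-suffix-verb "delete" "ing")  ; => "deleting"
(my-suffix-verb "see" "ing")     ; => "seing", not an English word
(my-suffix-verb "run" "ing")     ; => "runing", also wrong
;; Plain alternative: one complete, translatable string per operation.
(defconst my-operation-messages
  '((install . "Installing package %s...")
    (delete  . "Deleting package %s..."))
  "Alist mapping an operation symbol to its full message template.")
(defun my-operation-message (operation name)
  "Return the message for OPERATION applied to the package NAME."
  (format (alist-get operation my-operation-messages) name))
(my-operation-message 'delete "magit")  ; => "Deleting package magit..."
```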

NOTE More than a missed plural

02:16.840 --> 00:02:20.333
So, what we had to deal with here

00:02:20.333 --> 00:02:22.220
was way more than just a missed plural.

02:22.220 --> 00:02:24.000
It was an attempt at engineering all

00:02:24.000 --> 00:02:26.400
the message strings destined to the user

00:02:26.400 --> 00:02:28.567
with the smart code that was making assumptions

00:02:28.567 --> 00:02:30.067
on the structure of words,

00:02:30.067 --> 00:02:33.220
and in the localization world that's a big no-no.

02:33.220 --> 00:02:36.667
I'm a translator, and such UI string issues

00:02:36.667 --> 00:02:38.433
were sorted out decades ago.

00:02:38.433 --> 00:02:41.320
So I was a bit shocked.

NOTE The final patch

02:41.320 --> 00:02:43.533
The final patch took me about a year to write,

00:02:43.533 --> 00:02:45.380
because I'm slow, because I needed to verify

02:45.380 --> 00:02:47.167
and understand a lot, because there are

00:02:47.167 --> 00:02:49.100
plenty of rules and plenty of people who are

00:02:49.100 --> 00:02:51.433
explaining to you very nicely what the rules are,

00:02:51.433 --> 00:02:53.733
because I have kids, and because the

00:02:53.733 --> 00:02:55.600
Emacs development list is such a cool place to be

00:02:55.600 --> 00:02:58.560
that you sometimes forget why you're there.

02:58.560 --> 00:03:01.800
Anyway, for people who can't click on a video,

00:03:01.800 --> 00:03:03.640
and I can't either, here are the relevant

03:03.640 --> 00:03:05.840
parts with some short comments.

03:05.840 --> 00:03:07.800
I'll be talking with localization in mind,

00:03:07.800 --> 00:03:09.640
knowing full well that Emacs localization

03:09.640 --> 00:03:12.800
is not on the map at the moment.

03:12.800 --> 00:03:14.167
So first, there is this thing

00:03:14.167 --> 00:03:15.520
about "format" and "concat".

03:15.520 --> 00:03:17.800
And if I remember correctly,

00:03:17.800 --> 00:03:20.300
"format" is better for user-facing things,

00:03:20.300 --> 00:03:25.160
and "concat" is better for internal things.

03:25.160 --> 00:03:26.800
Here, there are two things.

03:26.800 --> 00:03:28.800
First, a rule that we have when we prepare

00:03:28.800 --> 00:03:30.700
strings that need to be localized is

00:03:30.700 --> 00:03:33.333
never ever make assumptions on the way

00:03:33.333 --> 00:03:35.780
numbers are expressed in the language.

03:35.780 --> 00:03:37.067
Here, the assumption is that

00:03:37.067 --> 00:03:40.000
we have either a singular or plural form,

00:03:40.000 --> 00:03:42.040
and that's not always the case.

03:42.040 --> 00:03:44.067
That usually means that you should externalize

00:03:44.067 --> 00:03:48.280
numbers and find a generic way to express them.

03:48.280 --> 00:03:50.833
So it makes for slightly less natural

00:03:50.833 --> 00:03:54.400
language strings, but it's better anyway.
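
NOTE
One way to read "externalize numbers", as a sketch only. Both function
names and both message strings are assumptions made up for illustration
and do not come from the patch.
```elisp
;; Assumes every language has exactly one singular and one plural form.
(defun my-count-with-suffix (n)
  "Return a message for N packages using an English-only plural rule."
  (format "%d package%s can be upgraded" n (if (= n 1) "" "s")))
;; Keeps the number outside the grammar of the sentence.
(defun my-count-externalized (n)
  "Return a message for N packages that needs no plural logic."
  (format "Packages that can be upgraded: %d" n))
(my-count-with-suffix 1)    ; => "1 package can be upgraded"
(my-count-externalized 1)   ; => "Packages that can be upgraded: 1"
(my-count-externalized 12)  ; => "Packages that can be upgraded: 12"
```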

03:54.400 --> 00:03:56.667
Then we have that comma there that's trying

00:03:56.667 --> 00:03:58.167
to be externalized and that's weird,

00:03:58.167 --> 00:04:02.620
so I put it back into the sentence.

04:02.620 --> 00:04:04.967
Here we have another construct, or two rather,

00:04:04.967 --> 00:04:06.960
that really should not be used like this.

04:06.960 --> 00:04:10.033
It's "prin1" that uses quoting characters,

00:04:10.033 --> 00:04:12.480
just like "print", and "princ" that does not.

04:12.480 --> 00:04:15.400
And you see why they were combined together.

04:15.400 --> 00:04:17.133
And they were both trying to be really smart

00:04:17.133 --> 00:04:19.780
about which article to put in front of a vowel.

04:19.780 --> 00:04:20.960
And you just don't do that.

04:20.960 --> 00:04:25.000
You just keep things simple.
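
NOTE
A sketch of the difference between the two printers and of the article
guessing described above. my-article-for is a hypothetical helper, not
the code that the patch replaced.
```elisp
;; "%S" formats like prin1 (with quoting), "%s" like princ (without).
(format "%S" "foo")  ; => "\"foo\""
(format "%s" "foo")  ; => "foo"
;; Hypothetical article picker based on the first letter only.
(defun my-article-for (word)
  "Return \"an\" if WORD starts with a vowel letter, else \"a\"."
  (if (string-match-p "\\`[aeiou]" word) "an" "a"))
(my-article-for "archive")  ; => "an"
(my-article-for "user")     ; => "an", but English says "a user"
(my-article-for "hour")     ; => "a", but English says "an hour"
```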

04:25.000 --> 00:04:26.633
Here again, the code is trying to be smart,

00:04:26.633 --> 00:04:28.480
but it's really not much more efficient than

04:28.480 --> 00:04:34.940
plainly stating what you want.

04:34.940 --> 00:04:36.500
And here again, we have "concat" things

00:04:36.500 --> 00:04:40.367
that we could just use to plainly state

00:04:40.367 --> 00:04:41.980
what we want to state.

04:41.980 --> 00:04:49.880
So, instead of "concat" I just put a "message".

04:49.880 --> 00:04:52.260
And here we have something that's very cute.

04:52.260 --> 00:04:54.540
It's a computerized plural.

04:54.540 --> 00:04:55.700
Here again, assuming that

00:04:55.700 --> 00:04:58.640
there are only plural or singular forms.

04:58.640 --> 00:05:00.867
But the end string is not that much more natural

00:05:00.867 --> 00:05:02.700
than the fix, the code is less efficient

00:05:02.700 --> 00:05:07.760
and is harder to understand.

05:07.760 --> 00:05:09.433
Here again, the code is trying to do

00:05:09.433 --> 00:05:13.520
smart things where it could be much simpler.

05:13.520 --> 00:05:14.667
That is the part where you get the

00:05:14.667 --> 00:05:19.480
number of packages and their names.

05:19.480 --> 00:05:22.067
Here the whole sentence with the semicolons

00:05:22.067 --> 00:05:26.333
and the question mark is split in parts,

00:05:26.333 --> 00:05:29.180
between which something will be inserted.

05:29.180 --> 00:05:34.240
That's really ugly and difficult to read.
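
NOTE
A sketch of the same prompt built in pieces and as one template. The
wording and the function names are invented here; the real strings in
package.el differ.
```elisp
;; Sentence split in parts: punctuation is scattered across fragments.
(defun my-prompt-split (count names)
  "Build a prompt for COUNT packages named NAMES by concatenation."
  (concat "Packages to delete: " (number-to-string count)
          " (" names "); " "proceed" "? "))
;; One template: a translator sees the whole sentence at once.
(defun my-prompt-template (count names)
  "Build the same prompt for COUNT packages named NAMES from one template."
  (format "Packages to delete: %d (%s); proceed? " count names))
(my-prompt-template 2 "foo, bar")
;; => "Packages to delete: 2 (foo, bar); proceed? "
```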

05:34.240 --> 00:05:37.700
Here again, another "ing" waiting to be

00:05:37.700 --> 00:05:44.840
regex-inserted into the code.

05:44.840 --> 00:05:46.633
And here at last, we get to the point

00:05:46.633 --> 00:05:48.760
where everything started.

05:48.760 --> 00:05:50.833
And you can see that unlike in the other spots,

00:05:50.833 --> 00:05:52.400
there is no possibility for the expression

05:52.400 --> 00:05:54.680
to be singular.

05:54.680 --> 00:05:57.600
So, I guess that if it hadn't been for that bug,

00:05:57.600 --> 00:05:59.320
I would not have found the other items,

05:59.320 --> 00:06:01.033
and we would be left with code that works,

00:06:01.033 --> 00:06:02.033
of course, but that is

00:06:02.033 --> 00:06:06.020
harder to understand, and maintain.

06:06.020 --> 00:06:08.333
Last but not least, a last version of

00:06:08.333 --> 00:06:10.920
"just plainly state what you mean to state".

06:10.920 --> 00:06:14.880
Keep it simple.

NOTE "What did I learn, and how did I learn it?"

06:14.880 --> 00:06:19.267
So first, we have this wonderful CONTRIBUTE file

00:06:19.267 --> 00:06:21.267
that is very explicit about

00:06:21.267 --> 00:06:23.520
how we must proceed when contributing code.

06:23.520 --> 00:06:25.233
So, that's really the first place

00:06:25.233 --> 00:06:27.760
that we should all read.

06:27.760 --> 00:06:29.333
The README file is pretty cool too,

00:06:29.333 --> 00:06:30.967
especially at the beginning of the process,

00:06:30.967 --> 00:06:31.867
when you're not sure whether

00:06:31.867 --> 00:06:36.240
you want to fix that bug or just report it.

NOTE Useful packages

06:36.240 --> 00:06:37.920
And then we've got packages.

06:37.920 --> 00:06:39.900
We've got a number of packages that are really

00:06:39.900 --> 00:06:42.600
helpful when it comes to reading

00:06:42.600 --> 00:06:45.880
the information and the manuals.

06:45.880 --> 00:06:48.000
I'm mentioning three of them here,

00:06:48.000 --> 00:06:53.720
and I think they are the most important for us.

NOTE Package: helpful

06:53.720 --> 00:06:55.600
So "helpful" is on the right,

00:06:55.600 --> 00:06:58.667
and it's overflowing the window with

00:06:58.667 --> 00:07:01.900
all the contextualized information it provides,

00:07:01.900 --> 00:07:05.280
and the standard "help" is on the left.

07:05.280 --> 00:07:07.933
I mean, really there are like two or three

00:07:07.933 --> 00:07:11.567
screenfuls of information in the "helpful" output,

00:07:11.567 --> 00:07:13.233
so you really only see a part,

00:07:13.233 --> 00:07:16.320
but I guess if you use it, you know what I'm saying.

07:16.320 --> 00:07:18.867
What I like the most here is the "view in manual"

00:07:18.867 --> 00:07:21.800
part, where you can actually click and even get

00:07:21.800 --> 00:07:23.667
more information that's sometimes

00:07:23.667 --> 00:07:28.400
easier to read and understand.

NOTE Package: inform

07:28.400 --> 00:07:33.640
And then you've got the "info" versus "inform" formats.

07:33.640 --> 00:07:34.567
When you're in the manual,

00:07:34.567 --> 00:07:37.140
"inform" makes a huge difference.

07:37.140 --> 00:07:39.367
You can see here that you've got colorized items,

00:07:39.367 --> 00:07:42.000
and also in the middle you've got that

07:42.000 --> 00:07:45.000
'read' part that's green and bold.

07:45.000 --> 00:07:49.333
In "info" it's not a specific object,

00:07:49.333 --> 00:07:52.200
it's just a string. In 'inform' it's actually

00:07:52.200 --> 00:07:53.800
a link that you can click,

00:07:53.800 --> 00:07:58.320
and actually go to that 'read' manual page.

NOTE Package: which-key

07:58.320 --> 00:08:01.300
Now, we've got "which-key".

08:01.300 --> 00:08:03.400
"which-key" is a savior for beginners too.

08:03.400 --> 00:08:04.867
Just wait half a second or something,

00:08:04.867 --> 00:08:06.500
and Emacs will show you all the keys

00:08:06.500 --> 00:08:08.433
that you can access from the prefix combination

00:08:08.433 --> 00:08:09.920
that you just typed.

08:09.920 --> 00:08:13.200
So, it's really helpful for discovering functions

00:08:13.200 --> 00:08:19.160
and learning new functions, getting used to them.
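
NOTE
A minimal configuration sketch, assuming the which-key package is
installed. The 0.5 second value mirrors the "half a second" mentioned
above and is not necessarily the package default.
```elisp
(require 'which-key)
;; Seconds to wait after a prefix key before showing the popup.
(setq which-key-idle-delay 0.5)
(which-key-mode 1)
```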

NOTE It all started with this message…

08:19.160 --> 00:08:21.500
And so that whole process started…,

00:08:21.500 --> 00:08:26.533
it was May 23, 2017,

00:08:26.533 --> 00:08:30.440
with that thread when I found the bug.

08:30.440 --> 00:08:32.800
I just bumped into an English/code bug

00:08:32.800 --> 00:08:36.920
this morning. In package.el, when one package

08:36.920 --> 00:08:39.033
is not needed anymore, the message is:

00:08:39.033 --> 00:08:41.300
"Package menu: Operation finished.

00:08:41.300 --> 00:08:44.880
1 packages are no longer needed", etc.

08:44.880 --> 00:08:49.633
So, I was asking whether we had best practices

00:08:49.633 --> 00:08:53.800
for using messages, and we had a whole thread

08:53.800 --> 00:08:57.867
about that. And while I was discussing on that

00:08:57.867 --> 00:09:01.240
thread, I started that new thread, which is:

09:01.240 --> 00:09:02.867
"package.el strings".

00:09:02.867 --> 00:09:09.900
The whole thing actually ended on June 27, 2018.

00:09:09.900 --> 00:09:15.400
So, a year after, with that message from Noam

00:09:15.400 --> 00:09:18.567
telling me that "Yes I can close the bug,"

00:09:18.567 --> 00:09:22.040
and that was it.

09:22.040 --> 00:09:24.000
So, it took about a year to finish that.

00:09:24.000 --> 00:09:28.133
What I did learn basically is that

00:09:28.133 --> 00:09:32.160
helping with Emacs is not that difficult.

09:32.160 --> 00:09:36.100
It takes time when you're not fluent with the code,

00:09:36.100 --> 00:09:37.100
but that's okay because the reference

09:37.100 --> 00:09:39.300
is excellent, and there are lots of people

00:09:39.300 --> 00:09:41.520
who are here to help.

NOTE Conclusion

09:41.520 --> 00:09:45.700
Basically, the solution to all our problems is

00:09:45.700 --> 00:09:47.733
"Keep It Simple and Straightforward".

00:09:47.733 --> 00:09:51.033
As you can see in that patch,

00:09:51.033 --> 00:09:53.233
even if it's a beginner's patch,

00:09:53.233 --> 00:09:57.733
what I did shows what can be done by Emacs Lisp

00:09:57.733 --> 00:09:59.533
beginners to help with "straightening" the strings

00:09:59.533 --> 00:10:02.267
to reduce the number of potential English bugs.

00:10:02.267 --> 00:10:04.533
And then to make Emacs strings easier

00:10:04.533 --> 00:10:07.233
to handle in real localization processes one day.

00:10:07.233 --> 00:10:09.067
But it doesn't have to be about strings

00:10:09.067 --> 00:10:12.767
because strings can be an easy entry point to Emacs,

00:10:12.767 --> 00:10:16.720
but it can be any itch that you want to scratch.

10:16.720 --> 00:10:18.267
And my real conclusion is that

00:10:18.267 --> 00:10:22.160
Emacs is free software, and what that means is mostly

10:22.160 --> 00:10:24.067
that it allows you to do things that you would

00:10:24.067 --> 00:10:27.920
never have thought of being able to do before.

10:27.920 --> 00:10:32.000
That's really the biggest lesson to be learned here.

10:32.000 --> 00:10:33.400
So, I want to thank all the people

00:10:33.400 --> 00:10:37.920
who allowed this to happen, allowed me to

10:37.920 --> 00:10:41.267
learn a bit and contribute a bit to that wonderful

00:10:41.267 --> 00:10:42.800
piece of software that Emacs is.

00:10:42.800 --> 00:10:44.533
And thank you everyone for listening,

00:10:44.533 --> 00:10:46.700
and hopefully I'll see you next year

00:10:46.700 --> 00:10:51.520
with a different translation-related presentation.

10:51.520 --> 11:13.640
Thank you very much.