path: root/2022/captions/emacsconf-2022-localizing--prelocalizing-emacs--jeanchristophe-helary--answers.vtt
WEBVTT

00:00.000 --> 00:09.260
Excellent. Thank you for the great talk. As someone whose first language wasn't English

00:09.260 --> 00:14.960
and speaks other languages, I think localization and internationalization is a very important

00:14.960 --> 00:20.920
topic that's near and dear to my heart, and especially when it comes to Emacs. I think

00:20.920 --> 00:26.700
there's a lot that we could do better. So, yeah, thanks so much. Folks, if you have questions,

00:26.700 --> 00:32.880
you can post them on IRC or on the pad, and Jean-Christophe will answer them, and we will also open up

00:32.880 --> 00:37.600
this big blue button for people who would like to join here and ask their questions

00:37.600 --> 00:45.760
directly. Jean-Christophe, please take it away. Okay, thank you. I'm not seeing much activity

00:45.760 --> 00:55.920
on IRC or the pad, so let me add a few things. First, that patch was really interesting in

00:55.920 --> 01:03.680
terms of actually getting into the code and understanding how a beginner can really join

01:03.680 --> 01:11.080
development, even if it's just a few lines. I mentioned in the first part of the presentation

01:11.080 --> 01:17.600
that there was this small integration bug with Mac, and that's the thing that actually

01:17.600 --> 01:22.400
got me started, and that was interesting because at the time I was trying to use Aquamacs because

01:22.400 --> 01:28.280
it looked simpler, and I thought, okay, if I need to fix that, rather than fixing it

01:28.280 --> 01:34.400
in Aquamacs, maybe I should just go to Emacs and fix it there. So, that was the first attempt

01:34.400 --> 01:40.440
for me to actually contribute something serious, and it was really nice to – I mean, this

01:40.440 --> 01:47.160
Emacs development list is really amazing. 99% of the discussion is just way above your

01:47.160 --> 01:54.120
head, but sometimes you grasp something, and the more you grasp it, the more you understand

01:54.120 --> 02:00.600
and the more you feel like you can actually do something, especially since – I mean,

02:00.600 --> 02:06.640
as for all the free software development projects, most of them, I guess, it's really just do

02:06.640 --> 02:13.920
it kind of thing. And if you try to do something, somebody's going to help you, and what I

02:13.920 --> 02:21.200
really enjoy when being there is that the people are always very nice. Sometimes you

02:21.200 --> 02:28.080
feel some tension when there are discussions about a specific topic, but it's – everybody

02:28.080 --> 02:37.520
is really polite, I mean, 99% of the time. And what I like the most is all the people

02:37.520 --> 02:42.680
are very strongly opinionated, so they have a very good idea of what Emacs should be or

02:42.680 --> 02:47.640
should not be, and so it gives you a very good idea of in what direction you should

02:47.640 --> 02:57.400
go. So that experience – I mean, pretty much those 2017, 2018 years were until now

02:57.400 --> 03:02.040
the peak of my Emacs activity. I've had to cut back on that because I was busy with

03:02.040 --> 03:07.160
other things, but I'm really planning to go back to working on maybe not localization

03:07.160 --> 03:13.480
because it's really – it's too big for me right now. And what I was told is that

03:13.480 --> 03:20.520
it involved a bit of C programming and things like this, so I'm not really into that right

03:20.520 --> 03:30.840
now. But I think eventually one day – I just turned 53, so I guess in a few years

03:30.840 --> 03:36.800
from now when I have more time, I guess I'll just dive in and just work on those localization

03:36.800 --> 03:43.800
issues and really to bring Emacs to a different world because I think it's – if we were

03:43.800 --> 03:49.920
able to have – it's a big job. I mean, it's really – if you check the threads

03:49.920 --> 03:55.400
on emacs-devel, check my name, you will see that I mostly post on translation or localization

03:55.400 --> 04:01.360
issues at least at the time. And I did an estimate of the sheer volume of strings to

04:01.360 --> 04:10.360
translate. For example, the manuals were about 2 million words. That's big. That's big.

04:10.360 --> 04:14.040
But it's okay. I mean, it's not something that's impossible. And if you check the strings

04:14.040 --> 04:20.160
– that was a really rough estimate. If you check the strings for Emacs proper, not even

04:20.160 --> 04:29.120
talking about the packages and things, I think that would add probably like 500,000 words.

04:29.120 --> 04:34.360
I mean, I have no idea, but my very rough estimate would be that. So it's not something

04:34.360 --> 04:41.120
that's impossible to do. And we'd have to ensure that we have a good process for people

04:41.120 --> 04:46.200
who review the strings and contribute new strings and things like this and also best

04:46.200 --> 04:53.560
practices like what I tried to show in this video. And I was really not trying to be dismissive

04:53.560 --> 04:58.680
about the people who worked on package.el because they did a wonderful job at actually helping

04:58.680 --> 05:02.840
people like me access all those packages. So it's – I mean, the point of the video

05:02.840 --> 05:10.840
is not really to dismiss the code. But I was kind of scared because I was like, if they

05:10.840 --> 05:18.720
write code like this for strings, then what about the rest of the code? Is it – so it

05:18.720 --> 05:25.560
was kind of – I mean, something that I really can't evaluate. But I'm like – I mean,

05:25.560 --> 05:30.600
those guys obviously are really smart and they're trying to make intelligent things

05:30.600 --> 05:37.400
about how they want to factor their code, et cetera. But if they do that for strings,

05:37.400 --> 05:44.400
which is quite simple actually – I mean, it's simple to mess up strings. So I was

05:44.400 --> 05:50.320
like, what about the rest of the code? Is it that complex or that difficult to understand?

05:50.320 --> 05:56.000
So that's kind of a put-off for me. I'm like, I really don't want to try to investigate

05:56.000 --> 06:01.760
that more because – plus it's not – it's really not my area at all. So anyway, that's

06:01.760 --> 06:04.400
what I wanted to add. Yeah.

06:04.400 --> 06:11.680
Awesome. Yeah, I think I pretty much agree with all of what you said.

06:11.680 --> 06:17.360
Yeah, yeah, yeah. I have a question – I see a question on the pad. I use Emacs on

06:17.360 --> 06:23.520
English, but my mother language is – no, no, no. Okay. So the answer is that Emacs

06:23.520 --> 06:33.760
is not localized. And my understanding is that right now it's not localizable. And

06:33.760 --> 06:40.840
those discussions took place about four or five years ago. So check on the emacs-devel list and

06:40.840 --> 06:46.280
you'll see the state of the discussion because there is only a discussion at the moment.

06:46.280 --> 06:57.480
What I did for package.el, I think it was really just a one-time attempt at fixing one package.

06:57.480 --> 07:05.640
And I did check the other – a number of other packages in core Emacs. And not a lot

07:05.640 --> 07:12.280
of them had – I mean, as far as I checked. And I really did not check everything. But

07:12.280 --> 07:20.840
basically what you have to do is check all the functions that impact strings. And some

07:20.840 --> 07:28.600
are really not user-facing strings, so they're not really interesting for us. And actually,

07:28.600 --> 07:34.640
that's really interesting to do that. So if you just take one Lisp package, Lisp code

07:34.640 --> 07:40.480
and just go through the thing and just check all of prin1, princ, message, format, concat

07:40.480 --> 07:43.520
and stuff and just see how it goes.
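
NOTE
Editor's sketch, not part of the talk: one rough way to do the survey just
described, listing where an Emacs Lisp file calls prin1, princ, message,
format or concat. It is written in Python for illustration only. The names
find_string_calls and SAMPLE are hypothetical, and the regular expression is
a naive text match, not a Lisp parser, so its hits still need a human look
to decide which strings are user-facing.
```python
import re
# Output and string-building functions named in the talk (Emacs Lisp built-ins).
STRING_FUNCTIONS = ("prin1", "princ", "message", "format", "concat")
# An opening paren, then one of the names, then whitespace, ")" or end of line.
CALL_PATTERN = re.compile(r"\((" + "|".join(STRING_FUNCTIONS) + r")(?=[\s)]|$)")
def find_string_calls(lisp_source):
    """Return (line_number, function_name) pairs for each call that matches."""
    hits = []
    for number, line in enumerate(lisp_source.splitlines(), start=1):
        # Crude comment strip: this also cuts at a ';' inside a string literal.
        code = line.split(";", 1)[0]
        for match in CALL_PATTERN.finditer(code):
            hits.append((number, match.group(1)))
    return hits
# Hypothetical snippet for demonstration, not taken from any real package.
SAMPLE = '''(defun demo-report (n)
  (message (concat "Deleted " (format "%d" n) " files")))
; (princ "ignored comment")
'''
print(find_string_calls(SAMPLE))
```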

07:43.520 --> 07:50.240
So basically right now there is no infrastructure to localize the thing. There is no process

07:50.240 --> 07:56.720
to extract the strings. And there is no way to actually import them back into the code.

07:56.720 --> 08:02.800
So what we can do right now is really just what I did, make sure that it's eventually

08:02.800 --> 08:10.760
possible one day. And as I just showed, it's really not such a big deal. If you're very

08:10.760 --> 08:19.800
careful about understanding the way that the strings are handled, it's just a few rewrites

08:19.800 --> 08:24.560
away. I mean, it's really not much. So there's – I mean, there's not a lot to be proud

08:24.560 --> 08:31.140
about in my patch. But it was really fun. And I think it's a very good entry point

08:31.140 --> 08:39.480
for people like us. I suppose – I mean, I suppose the first person question. I mean,

08:39.480 --> 08:44.240
I don't know. Maybe I'm just – I should not suppose that. But people who really enjoy

08:44.240 --> 08:51.320
working in Emacs and just sometimes would like to contribute something and are not programmers

08:51.320 --> 08:56.320
or anything or maybe even programmers. I mean, I'm not excluding them. But that's really

08:56.320 --> 09:02.280
a good way to just start doing something. And eventually from there, you can – I mean,

09:02.280 --> 09:07.020
you just use a package that you like and that you think is important and just check the

09:07.020 --> 09:10.200
strings and do things like this. And then eventually, you'll find other parts of the

09:10.200 --> 09:18.840
code that you want to improve or add functions. So yeah, actually, the patch that I did, this

09:18.840 --> 09:26.840
patch is actually in the process of the thing that I started with Aquamacs. So I did one

09:26.840 --> 09:35.600
little thing regarding those that were not fully integrated in macOS. And then I did

09:35.600 --> 09:41.880
something about a small function. I think I added the possibility to add an option.

09:41.880 --> 09:48.960
I did documentation improvement as well. So really just little things. And then the deeper

09:48.960 --> 09:53.000
you dive, the more interesting it gets. And then you find something that you really want

09:53.000 --> 10:07.160
to do. So just use that entry point as a way to have fun in Emacs.

10:07.160 --> 10:15.240
Well, so I mentioned Regex on strings. Well, it's not really a red flag for localization.

10:15.240 --> 10:28.080
But the way it's used, I mean, I guess there are ways to properly use it. But I think really

10:28.080 --> 10:38.400
basically, using that means that you're making assumptions on the way language is

10:38.400 --> 10:45.800
structured. And I made exactly the same mistake on a different project that I'm working on.

10:45.800 --> 10:51.280
Actually, I'm in charge of rewriting a manual. And we were using DocBook. And I just thought

10:51.280 --> 10:57.240
it would be smart to have automated links to parts of the chapters, et cetera. And the

10:57.240 --> 11:01.240
thing is that depending on the language, you've got different ways to introduce chapters.

11:01.240 --> 11:10.540
So I should know that. I should know that. You should not automatically insert strings

11:10.540 --> 11:20.720
in code because it's going to produce something that can't be handled by the translator. So

11:20.720 --> 11:28.840
basically Regex on strings is something that probably you might use. But if you see, I

11:28.840 --> 11:33.320
mean, you can see the way it was used in the original code. So if you see something like

11:33.320 --> 11:39.360
that, I mean, just don't run and just fix the thing because there is no way these can

11:39.360 --> 11:44.920
be localized, I mean, extracted properly and then localized. And that's the reason too

11:44.920 --> 11:50.480
why numbers are a big problem because, for example, in English but in French too, we

11:50.480 --> 11:56.920
have only singular forms and plural forms. But some languages have zero forms. Some languages

11:56.920 --> 12:03.720
have two forms like pair forms. Some languages don't have a different form for anything.

12:03.720 --> 12:09.920
For example, I live in Japan. I work in Japanese. And in Japanese, you don't have a form. You

12:09.920 --> 12:16.640
don't have different inflections for words based on their number. So saying one whatever

12:16.640 --> 12:23.400
or two whatevers or an infinity of whatevers or even zero whatever, it's just the same

12:23.400 --> 12:28.480
form. So making assumptions on the number of things and the way it's expressed in the language

12:28.480 --> 12:34.640
is usually, and that's something that we already know in free software. I mean, if you check

12:34.640 --> 12:40.060
the gettext library, they've got everything sorted out. And that's something that was

12:40.060 --> 12:46.880
created in the 90s at Sun Microsystems. And then it was freed, et cetera. But when you

12:46.880 --> 12:52.560
see the work that it did at the time, you would kind of expect that people understand

12:52.560 --> 12:58.920
that. But no. And that's OK because developers develop and localizers localize. So we kind

12:58.920 --> 13:04.820
of split. But everything has been done already. So we just have to be aware of what's being

13:04.820 --> 13:11.720
done. And we have to be aware of the rules. And I think of one very good set of rules

13:11.720 --> 13:19.880
that's been online for a while. It's the World Wide Web Consortium. They have a really good internationalization

13:19.880 --> 13:26.640
page where everything is pretty much black on white on paper, on the web at least. And

13:26.640 --> 13:31.960
if you read that, you can see exactly what should be done for localization, what should

13:31.960 --> 13:35.880
not be done, what should be avoided at all costs, et cetera, et cetera.
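
NOTE
Editor's sketch, not part of the talk: the earlier point about number forms,
shown in Python for illustration only. The plural rules follow the
conventional gettext Plural-Forms entries for English, French and Japanese.
The names PLURAL_RULES, CATALOG and deleted_message, and the sample
sentences, are hypothetical. Each translation is a whole sentence chosen by
count, so the code never glues an "s" onto a word.
```python
# Plural index rules in the style of gettext "Plural-Forms" headers.
PLURAL_RULES = {
    "en": lambda n: 0 if n == 1 else 1,  # singular only for exactly 1
    "fr": lambda n: 1 if n > 1 else 0,   # 0 and 1 both take the singular
    "ja": lambda n: 0,                   # one form, whatever the count
}
# Hypothetical catalog: whole sentences per language, indexed by plural form.
CATALOG = {
    "en": ["{n} file deleted", "{n} files deleted"],
    "fr": ["{n} fichier supprimé", "{n} fichiers supprimés"],
    "ja": ["{n} 個のファイルを削除しました"],
}
def deleted_message(lang, n):
    """Pick the translated template for this count, then fill in the number."""
    form = PLURAL_RULES[lang](n)
    return CATALOG[lang][form].format(n=n)
for count in (0, 1, 2):
    print([deleted_message(lang, count) for lang in ("en", "fr", "ja")])
```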

13:35.880 --> 13:44.440
So there are plenty of references here and there. And in terms of software localization,

13:44.440 --> 13:49.980
it's the same. If you check the gettext page, you should be able to get an idea of what

13:49.980 --> 13:59.240
should be good. So is my project to localize all of Emacs? I wish it were. Eventually I'll

13:59.240 --> 14:05.160
be rich. Hopefully. I don't know. I'm working on that. It's not working well. But the day

14:05.160 --> 14:11.540
I can take just one year off totally and focus on that, I think that's something I would

14:11.540 --> 14:18.760
love to work on and just get up to speed with the process of programming all the things,

14:18.760 --> 14:23.080
checking all the things, and organizing the infrastructure. But seriously, I don't think

14:23.080 --> 14:31.240
that will ever happen because I'm a poor translator. And I still have, what, like 20 years to go

14:31.240 --> 14:40.560
before I can't work anymore. And we don't have savings or anything with the corona shit.

14:40.560 --> 14:47.560
So I don't think that's ever going to happen. But I would love to help. And yes, yes. How

14:47.560 --> 14:53.480
deep would useful localization go? Because the core of Emacs are docstrings and localization.

14:53.480 --> 15:00.280
Yes, yes, yes. I mean, all those discussions have been made. I mean, no conclusion reached.

15:00.280 --> 15:07.880
But we have addressed those things on the discussions. And so just, I mean, it's really

15:07.880 --> 15:13.560
pretentious to say, check my name on the emacs-devel list because I've talked about that.

15:13.560 --> 15:18.680
It's really pretentious. But that's not what I'm saying. I mean, there has been a lot of

15:18.680 --> 15:24.400
discussion on the development list. So if you check for localization, translation, stuff

15:24.400 --> 15:30.800
like that, you'll see keywords, and you'll see the discussion. And people are aware of

15:30.800 --> 15:36.440
the issues. So I mean, we just need to have a framework for that.

15:36.440 --> 15:40.120
Thank you. Just to quickly chime in to say, I think we have about two more minutes of

15:40.120 --> 15:45.800
on stream Q&A. And then you're welcome to either stay here, Jean-Christophe, or continue

15:45.800 --> 15:48.800
taking questions on the pad or on IRC.

15:48.800 --> 15:57.120
I think, well, I got to go to work. So I need to get ready. But I think, unless we have

15:57.120 --> 16:08.760
something on IRC, I think we're good. If you find something else that I've not addressed,

16:08.760 --> 16:19.840
I'm good. Otherwise, yes, yes, yeah, we need to take all the C code. But I mean, you can

16:19.840 --> 16:29.160
decide the level down to which you want to work. So you can go all the way to the C code.

16:29.160 --> 16:32.920
But actually, the C code is actually easier to extract because there are all these

16:32.920 --> 16:40.280
gettext things that work on the C code already. So the issue is pretty much the Emacs Lisp

16:40.280 --> 16:47.760
code, as far as I can understand. So that would be the process that we need to address.

16:47.760 --> 16:56.800
Doc strings, indeed. But then the doc strings and the manual, they are very close. And actually,

16:56.800 --> 17:03.560
yeah, my estimate of the 500,000 words, I think it was based on doc strings. So yeah, we need

17:03.560 --> 17:09.760
to take all that. And that's an ongoing project that's not going to go away anyway. So we'll

17:09.760 --> 17:12.760
be here 10 years from now, I'm sure.

17:12.760 --> 17:17.680
OK, cool. And yeah, I think that's about all the time that we have on the stream. I guess

17:17.680 --> 17:21.720
if folks have further questions, they could maybe reach out to you later on IRC or via

17:21.720 --> 17:22.720
email.

17:22.720 --> 17:29.640
And I'll be back on the development list shortly, maybe six months from now. So yeah, I can

17:29.640 --> 17:30.640
take it from there.

17:30.640 --> 17:31.640
Sounds great.

17:31.640 --> 17:32.640
Thank you very much.

17:32.640 --> 17:33.640
Thank you very much.

17:33.640 --> 17:34.640
Yeah, thanks again for your great talk. Cheers.

17:34.640 --> 17:35.640
Cheers.

17:35.640 --> 17:56.640
OK, bye.