WEBVTT
00:00.000 --> 00:09.260
Excellent. Thank you for the great talk. As someone whose first language wasn't English
00:09.260 --> 00:14.960
and who speaks other languages, I think localization and internationalization is a very important
00:14.960 --> 00:20.920
topic that's near and dear to my heart, and especially when it comes to Emacs. I think
00:20.920 --> 00:26.700
there's a lot that we could do better. So, yeah, thanks so much. Folks, if you have questions,
00:26.700 --> 00:32.880
you can post them on IRC or on the pad, and Jean-Christophe will answer them, and we will also open up
00:32.880 --> 00:37.600
this BigBlueButton room for people who would like to join here and ask their questions
00:37.600 --> 00:45.760
directly. Jean-Christophe, please take it away. Okay, thank you. I'm not seeing much activity
00:45.760 --> 00:55.920
on IRC or the pad, so let me add a few things. First, that patch was really interesting in
00:55.920 --> 01:03.680
terms of actually getting into the code and understanding how a beginner can really join
01:03.680 --> 01:11.080
development, even if it's just a few lines. I mentioned in the first part of the presentation
01:11.080 --> 01:17.600
that there was this small integration bug with Mac, and that's the thing that actually
01:17.600 --> 01:22.400
got me started, and that was interesting because at the time I was trying to use Aquamacs because
01:22.400 --> 01:28.280
it looked simpler, and I thought, okay, if I need to fix that, rather than fixing it
01:28.280 --> 01:34.400
in Aquamacs, maybe I should just go to Emacs and fix it there. So, that was the first attempt
01:34.400 --> 01:40.440
for me to actually contribute something serious, and it was really nice to – I mean, this
01:40.440 --> 01:47.160
Emacs development list is really amazing. 99% of the discussion is just way above your
01:47.160 --> 01:54.120
head, but sometimes you grasp something, and the more you grasp it, the more you understand
01:54.120 --> 02:00.600
and the more you feel like you can actually do something, especially since – I mean,
02:00.600 --> 02:06.640
as for all the free software development projects, most of them, I guess, it's really just do
02:06.640 --> 02:13.920
it kind of thing. And if you try to do something, somebody's going to help you, and what I
02:13.920 --> 02:21.200
really enjoy when being there is that the people are always very nice. Sometimes you
02:21.200 --> 02:28.080
feel some tension when there are discussions about a specific topic, but it's – everybody
02:28.080 --> 02:37.520
is really polite, I mean, 99% of the time. And what I like the most is all the people
02:37.520 --> 02:42.680
are very strongly opinionated, so they have a very good idea of what Emacs should be or
02:42.680 --> 02:47.640
should not be, and so it gives you a very good idea of in what direction you should
02:47.640 --> 02:57.400
go. So that experience – I mean, pretty much those 2017, 2018 years were until now
02:57.400 --> 03:02.040
the peak of my Emacs activity. I've had to cut back on that because I was busy with
03:02.040 --> 03:07.160
other things, but I'm really planning to go back to working on maybe not localization
03:07.160 --> 03:13.480
because it's really – it's too big for me right now. And what I was told is that
03:13.480 --> 03:20.520
it involved a bit of C programming and things like this, so I'm not really into that right
03:20.520 --> 03:30.840
now. But I think eventually one day – I just turned 53, so I guess in a few years
03:30.840 --> 03:36.800
from now when I have more time, I guess I'll just dive in and just work on those localization
03:36.800 --> 03:43.800
issues and really to bring Emacs to a different world because I think it's – if we were
03:43.800 --> 03:49.920
able to have – it's a big job. I mean, it's really – if you check the threads
03:49.920 --> 03:55.400
on emacs-devel, check my name, you will see that I mostly post on translation or localization
03:55.400 --> 04:01.360
issues at least at the time. And I did an estimate of the sheer volume of strings to
04:01.360 --> 04:10.360
translate. For example, the manuals were about 2 million words. That's big. That's big.
04:10.360 --> 04:14.040
But it's okay. I mean, it's not something that's impossible. And if you check the strings
04:14.040 --> 04:20.160
– that was a really rough estimate. If you check the strings for Emacs proper, not even
04:20.160 --> 04:29.120
talking about the packages and things, I think that would add probably like 500,000 words.
04:29.120 --> 04:34.360
I mean, I have no idea, but my very rough estimate would be that. So it's not something
04:34.360 --> 04:41.120
that's impossible to do. And we'd have to ensure that we have a good process for people
04:41.120 --> 04:46.200
who review the strings and contribute new strings and things like this and also best
04:46.200 --> 04:53.560
practices like what I tried to show in this video. And I was really not trying to be dismissive
04:53.560 --> 04:58.680
about the people who worked on package.el because they did a wonderful job at actually helping
04:58.680 --> 05:02.840
people like me access all those packages. So it's – I mean, the point of the video
05:02.840 --> 05:10.840
is not really to dismiss the code. But I was kind of scared because I was like, if they
05:10.840 --> 05:18.720
write code like this for strings, then what about the rest of the code? Is it – so it
05:18.720 --> 05:25.560
was kind of – I mean, something that I really can't evaluate. But I'm like – I mean,
05:25.560 --> 05:30.600
those guys obviously are really smart and they're trying to make intelligent things
05:30.600 --> 05:37.400
about how they want to factor their code, et cetera. But if they do that for strings,
05:37.400 --> 05:44.400
which is quite simple actually – I mean, it's simple to mess up strings. So I was
05:44.400 --> 05:50.320
like, what about the rest of the code? Is it that complex or that difficult to understand?
05:50.320 --> 05:56.000
So that's kind of a put-off for me. I'm like, I really don't want to try to investigate
05:56.000 --> 06:01.760
that more because – plus it's not – it's really not my area at all. So anyway, that's
06:01.760 --> 06:04.400
what I wanted to add. Yeah.
06:04.400 --> 06:11.680
Awesome. Yeah, I think I pretty much agree with all of what you said.
06:11.680 --> 06:17.360
Yeah, yeah, yeah. I have a question – I see a question on the pad. I use Emacs on
06:17.360 --> 06:23.520
English, but my mother language is – no, no, no. Okay. So the answer is that Emacs
06:23.520 --> 06:33.760
is not localized. And my understanding is that right now it's not localizable. And
06:33.760 --> 06:40.840
those discussions took place about four or five years ago. So check on the emacs-devel list and
06:40.840 --> 06:46.280
you'll see the state of the discussion because there is only a discussion at the moment.
06:46.280 --> 06:57.480
What I did for package.el, I think it was really just a one-time attempt at fixing one package.
06:57.480 --> 07:05.640
And I did check the other – a number of other packages in core Emacs. And not a lot
07:05.640 --> 07:12.280
of them had – I mean, as far as I checked. And I really did not check everything. But
07:12.280 --> 07:20.840
basically what you have to do is check all the functions that impact strings. And some
07:20.840 --> 07:28.600
are really not user-facing strings, so they're not really interesting for us. And actually,
07:28.600 --> 07:34.640
that's really interesting to do that. So if you just take one Lisp package, Lisp code
07:34.640 --> 07:40.480
and just go through the thing and just check all of prin1, princ, message, format, concat
07:40.480 --> 07:43.520
and stuff and just see how it goes.
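
NOTE
Editor's sketch, not part of the talk: one possible way to do the walk-through
described above, listing every call to prin1, princ, message, format and concat
in a piece of Emacs Lisp so the strings can be reviewed by hand. It is a plain
textual scan in Python, not a Lisp reader, so it will also report matches inside
comments and string literals. The names STRING_FUNCTIONS, find_string_calls and
SAMPLE are made up for this illustration.
```python
import re
# Functions named in the talk as places where user-facing strings may be built.
STRING_FUNCTIONS = ("prin1", "princ", "message", "format", "concat")
# An opening paren followed by one of the names as a whole symbol
# (so "format-message" is not reported as "format").
CALL_RE = re.compile(r"\((" + "|".join(STRING_FUNCTIONS) + r")(?=[\s)]|$)")
def find_string_calls(lisp_source: str) -> list[tuple[int, str]]:
    """Return (line_number, function_name) for each textual match, in order."""
    hits = []
    for line_number, line in enumerate(lisp_source.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            hits.append((line_number, match.group(1)))
    return hits
# Hypothetical Lisp snippet that glues a sentence together from fragments.
SAMPLE = '''(defun demo-report (n)
  (message (concat "Found " (format "%d" n) " package(s)")))
(princ "done")
'''
for line_number, name in find_string_calls(SAMPLE):
    print(f"line {line_number}: {name}")
```
Each reported line is only a candidate: many of these calls build strings that
are not user-facing, so a human still has to decide which ones matter.
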
07:43.520 --> 07:50.240
So basically right now there is no infrastructure to localize the thing. There is no process
07:50.240 --> 07:56.720
to extract the strings. And there is no way to actually import them back into the code.
07:56.720 --> 08:02.800
So what we can do right now is really just what I did, make sure that it's eventually
08:02.800 --> 08:10.760
possible one day. And as I just showed, it's really not such a big deal. If you're very
08:10.760 --> 08:19.800
careful about understanding the way that the strings are handled, it's just a few rewrites
08:19.800 --> 08:24.560
away. I mean, it's really not much. So there's – I mean, there's not a lot to be proud
08:24.560 --> 08:31.140
about in my patch. But it was really fun. And I think it's a very good entry point
08:31.140 --> 08:39.480
for people like us. I suppose – I mean, I suppose the first person question. I mean,
08:39.480 --> 08:44.240
I don't know. Maybe I'm just – I should not suppose that. But people who really enjoy
08:44.240 --> 08:51.320
working in Emacs and just sometimes would like to contribute something and are not programmers
08:51.320 --> 08:56.320
or anything or maybe even programmers. I mean, I'm not excluding them. But that's really
08:56.320 --> 09:02.280
a good way to just start doing something. And eventually from there, you can – I mean,
09:02.280 --> 09:07.020
you just use a package that you like and that you think is important and just check the
09:07.020 --> 09:10.200
strings and do things like this. And then eventually, you'll find other parts of the
09:10.200 --> 09:18.840
code that you want to improve or add functions. So yeah, actually, the patch that I did, this
09:18.840 --> 09:26.840
patch is actually in the process of the thing that I started with Aquamacs. So I did one
09:26.840 --> 09:35.600
little thing regarding those that were not fully integrated in macOS. And then I did
09:35.600 --> 09:41.880
something about a small function. I think I added the possibility to add an option.
09:41.880 --> 09:48.960
I did documentation improvement as well. So really just little things. And then the deeper
09:48.960 --> 09:53.000
you dive, the more interesting it gets. And then you find something that you really want
09:53.000 --> 10:07.160
to do. So just use that entry point as a way to have fun in Emacs.
10:07.160 --> 10:15.240
Well, so I mentioned Regex on strings. Well, it's not really a red flag for localization.
10:15.240 --> 10:28.080
But the way it's used, I mean, I guess there are ways to properly use it. But I think really
10:28.080 --> 10:38.400
basically, using that means that you're making assumptions on the way language is
10:38.400 --> 10:45.800
structured. And I made exactly the same mistake on a different project that I'm working on.
10:45.800 --> 10:51.280
Actually, I'm in charge of rewriting a manual. And we were using DocBook. And I just thought
10:51.280 --> 10:57.240
it would be smart to have automated links to parts of the chapters, et cetera. And the
10:57.240 --> 11:01.240
thing is that depending on the language, you've got different ways to introduce chapters.
11:01.240 --> 11:10.540
So I should know that. I should know that. You should not automatically insert strings
11:10.540 --> 11:20.720
in code because it's going to produce something that can't be handled by the translator. So
11:20.720 --> 11:28.840
basically Regex on strings is something that probably you might use. But if you see, I
11:28.840 --> 11:33.320
mean, you can see the way it was used in the original code. So if you see something like
11:33.320 --> 11:39.360
that, I mean, just don't run and just fix the thing because there is no way these can
11:39.360 --> 11:44.920
be localized, I mean, extracted properly and then localized. And that's the reason too
11:44.920 --> 11:50.480
why numbers are a big problem because, for example, in English but in French too, we
11:50.480 --> 11:56.920
have only singular forms and plural forms. But some languages have zero forms. Some languages
11:56.920 --> 12:03.720
have two forms like pair forms. Some languages don't have a different form for anything.
12:03.720 --> 12:09.920
For example, I live in Japan. I work in Japanese. And in Japanese, you don't have a form. You
12:09.920 --> 12:16.640
don't have different inflections for words based on their number. So saying one whatever
12:16.640 --> 12:23.400
or two whatevers or an infinity of whatevers or even zero whatever, it's just the same
12:23.400 --> 12:28.480
form. So making assumptions on the number of things and the way it's expressed in the language
12:28.480 --> 12:34.640
is usually, and that's something that we already know in free software. I mean, if you check
12:34.640 --> 12:40.060
the gettext library, they've got everything sorted out. And that's something that was
12:40.060 --> 12:46.880
created in the 90s at Sun Microsystems. And then it was freed, et cetera. But when you
12:46.880 --> 12:52.560
see the work that they did at the time, you would kind of expect that people understand
12:52.560 --> 12:58.920
that. But no. And that's OK because developers develop and localizers localize. So we kind
12:58.920 --> 13:04.820
of split. But everything has been done already. So we just have to be aware of what's being
13:04.820 --> 13:11.720
done. And we have to be aware of the rules. And I think of one very good set of rules
13:11.720 --> 13:19.880
that's been online for a while. It's the World Wide Web Consortium. They have a really good internationalization
13:19.880 --> 13:26.640
page where everything is pretty much black on white on paper, on the web at least. And
13:26.640 --> 13:31.960
if you read that, you can see exactly what should be done for localization, what should
13:31.960 --> 13:35.880
not be done, what should be avoided at all costs, et cetera, et cetera.
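
NOTE
Editor's sketch, not part of the talk: a small Python illustration of the
plural-form point made above. The three rules mirror the usual gettext
Plural-Forms headers for English, French and Japanese, but this is a hand-rolled
table, not the gettext library itself. The catalog entries and the names
PLURAL_RULES, CATALOG and format_count are invented for the example.
```python
from typing import Callable
# Each rule maps a count n to an index into that language's list of forms.
PLURAL_RULES: dict[str, Callable[[int], int]] = {
    "en": lambda n: int(n != 1),  # nplurals=2; plural=(n != 1)
    "fr": lambda n: int(n > 1),   # nplurals=2; plural=(n > 1)
    "ja": lambda n: 0,            # nplurals=1; plural=0
}
# Hypothetical catalog: whole message templates, one per plural form,
# so a translator never has to work with glued-together fragments.
CATALOG: dict[str, list[str]] = {
    "en": ["{n} package", "{n} packages"],
    "fr": ["{n} paquet", "{n} paquets"],
    "ja": ["{n} 個のパッケージ"],
}
def format_count(lang: str, n: int) -> str:
    """Pick the template for this language and count, then fill in n."""
    index = PLURAL_RULES[lang](n)
    return CATALOG[lang][index].format(n=n)
for lang in ("en", "fr", "ja"):
    print(lang, [format_count(lang, n) for n in (0, 1, 2)])
```
Code that appends "s" when n is not 1 bakes the English rule into the program;
keeping the rule and the full templates in the catalog lets each language
choose how many forms it needs.
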
13:35.880 --> 13:44.440
So there are plenty of references here and there. And in terms of software localization,
13:44.440 --> 13:49.980
it's the same. If you check the gettext page, you should be able to get an idea of what
13:49.980 --> 13:59.240
should be good. So is my project to localize all of Emacs? I wish it were. Eventually I'll
13:59.240 --> 14:05.160
be rich. Hopefully. I don't know. I'm working on that. It's not working well. But the day
14:05.160 --> 14:11.540
I can take just one year off totally and focus on that, I think that's something I would
14:11.540 --> 14:18.760
love to work on and just get up to speed with the process of programming all the things,
14:18.760 --> 14:23.080
checking all the things, and organizing the infrastructure. But seriously, I don't think
14:23.080 --> 14:31.240
that will ever happen because I'm a poor translator. And I still have, what, like 20 years to go
14:31.240 --> 14:40.560
before I can't work anymore. And we don't have savings or anything with the corona shit.
14:40.560 --> 14:47.560
So I don't think that's ever going to happen. But I would love to help. And yes, yes. How
14:47.560 --> 14:53.480
deep would useful localization go? Because the core of Emacs are doc strings and localization.
14:53.480 --> 15:00.280
Yes, yes, yes. I mean, all those discussions have been made. I mean, no conclusion reached.
15:00.280 --> 15:07.880
But we have addressed those things on the discussions. And so just, I mean, it's really
15:07.880 --> 15:13.560
pretentious to say, check my name on the emacs-devel list because I've talked about that.
15:13.560 --> 15:18.680
It's really pretentious. But that's not what I'm saying. I mean, there has been a lot of
15:18.680 --> 15:24.400
discussion on the development list. So if you check for localization, translation, stuff
15:24.400 --> 15:30.800
like that, you'll see keywords, and you'll see the discussion. And people are aware of
15:30.800 --> 15:36.440
the issues. So I mean, we just need to have a framework for that.
15:36.440 --> 15:40.120
Thank you. Just to quickly chime in to say, I think we have about two more minutes of
15:40.120 --> 15:45.800
on stream Q&A. And then you're welcome to either stay here, Jean-Christophe, or continue
15:45.800 --> 15:48.800
taking questions on the pad or on IRC.
15:48.800 --> 15:57.120
I think, well, I got to go to work. So I need to get ready. But I think, unless we have
15:57.120 --> 16:08.760
something on IRC, I think we're good. If you find something else that I've not addressed,
16:08.760 --> 16:19.840
I'm good. Otherwise, yes, yes, yeah, we need to take all the C code. But I mean, you can
16:19.840 --> 16:29.160
decide the level down to which you want to work. So you can go all the way to the C code.
16:29.160 --> 16:32.920
But actually, the C code is actually easier to extract because there are all these
16:32.920 --> 16:40.280
gettext things that work on the C code already. So the issue is pretty much the Emacs Lisp
16:40.280 --> 16:47.760
code, as far as I can understand. So that would be the process that we need to address.
16:47.760 --> 16:56.800
Doc strings, indeed. But then the doc strings and the manual, they are very close. And actually,
16:56.800 --> 17:03.560
yeah, my estimate of the 500,000 words, I think it was based on doc strings. So yeah, we need
17:03.560 --> 17:09.760
to take all that. And that's an ongoing project that's not going to go away anyway. So we'll
17:09.760 --> 17:12.760
be here 10 years from now, I'm sure.
17:12.760 --> 17:17.680
OK, cool. And yeah, I think that's about all the time that we have on the stream. I guess
17:17.680 --> 17:21.720
if folks have further questions, they could maybe reach out to you later on IRC or via
17:21.720 --> 17:22.720
email.
17:22.720 --> 17:29.640
And I'll be back on the development list shortly, maybe six months from now. So yeah, I can
17:29.640 --> 17:30.640
take it from there.
17:30.640 --> 17:31.640
Sounds great.
17:31.640 --> 17:32.640
Thank you very much.
17:32.640 --> 17:33.640
Thank you very much.
17:33.640 --> 17:34.640
Yeah, thanks again for your great talk. Cheers.
17:34.640 --> 17:35.640
Cheers.
17:35.640 --> 17:56.640
OK, bye.